📚 Papers by Domain (Total Summary)

1. Foundation Models & VLA (The Robot's Brain: Connecting Vision, Language, and Action)

This is the core area: how large language models (LLMs) come to control a robot's eyes (Vision) and hands (Action).

| Paper | Institution | Key idea | Notes |
| --- | --- | --- | --- |
| RT-2 | Google DeepMind | Origin of the VLA (Vision-Language-Action) concept. The language model outputs robot control tokens. | Must-read (foundational) |
| OpenVLA | Stanford/Berkeley | Open-source counterpart to RT-2. Combines Llama 2 with SigLIP (plus DINOv2) vision features. Much lighter (7B parameters) and faster. | Must-read for practitioners |
| Octo | Berkeley/Stanford | Open-source generalist robot policy that combines a transformer with diffusion. | Standard baseline |
| MobileVLM / VILA | Meituan / NVIDIA | Lightweight VLMs designed to run on edge devices (onboard the robot). | On-device AI |
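To make "the language model outputs robot control tokens" concrete, here is a minimal sketch of action tokenization. RT-2 reports discretizing each action dimension into 256 bins; everything else here (the function names, the [-1, 1] range, and the bin-centre decoding) is an assumption made for illustration, not the paper's code.

```python
from typing import List, Sequence

NUM_BINS = 256  # bins per action dimension, as reported for RT-2


def tokenize_action(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> List[int]:
    """Map each continuous action value to an integer bin id in [0, NUM_BINS - 1].

    The [low, high] range is a hypothetical normalisation chosen for this sketch.
    """
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)       # keep out-of-range commands inside the bins
        fraction = (clipped - low) / (high - low)  # rescale to [0, 1]
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def detokenize_action(tokens: Sequence[int], low: float = -1.0, high: float = 1.0) -> List[float]:
    """Map bin ids back to continuous values, using the centre of each bin."""
    width = (high - low) / NUM_BINS
    return [low + (token + 0.5) * width for token in tokens]
```

With this scheme the round trip loses at most half a bin width per dimension, which is the price of letting a token-based model emit actions the same way it emits words.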

2. Action Generation & Policy (The Robot's Body: How Motion Is Generated)

This area answers the question "How should the robot move?" It traces the shift from conventional control methods to generative-AI methods (diffusion).

| Paper | Institution | Key idea | Notes |
| --- | --- | --- | --- |
| Diffusion Policy | Columbia/MIT | Applies the diffusion principle from image generation to robot action generation. Excels at learning multimodal action distributions. | Current standard |
| Open X-Embodiment | Google et al. | "The ImageNet of robotics." Pools data from different robots (RT-X) to demonstrate generality. | Data scaling |
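The idea behind Diffusion Policy, generating an action sequence by iteratively denoising random noise, can be sketched with a standard DDPM reverse loop. This is a toy under stated assumptions, not the paper's implementation: the learned, observation-conditioned noise-prediction network is replaced by a hypothetical oracle that knows a single demonstration, observation conditioning is left out, and the schedule values are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

NoisePredictor = Callable[[Sequence[float], int, float], List[float]]


def make_schedule(num_steps: int, beta_start: float = 1e-4,
                  beta_end: float = 0.02) -> Tuple[List[float], List[float]]:
    """Linear beta schedule plus the cumulative products (alpha_bar) DDPM needs."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def sample_actions(predict_noise: NoisePredictor, horizon: int,
                   num_steps: int = 50, seed: int = 0) -> List[float]:
    """Start from Gaussian noise and denoise it into an action sequence."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    actions = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
    for t in reversed(range(num_steps)):
        eps = predict_noise(actions, t, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(a - coef * e) / math.sqrt(1.0 - betas[t]) for a, e in zip(actions, eps)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the final step
        actions = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return actions


def make_oracle(target: Sequence[float]) -> NoisePredictor:
    """Hypothetical stand-in for the trained network: exact noise for one known demo."""
    def predict_noise(noisy: Sequence[float], t: int, alpha_bar: float) -> List[float]:
        return [(x - math.sqrt(alpha_bar) * a) / math.sqrt(1.0 - alpha_bar)
                for x, a in zip(noisy, target)]
    return predict_noise
```

For example, `sample_actions(make_oracle([0.1, 0.2, 0.3, 0.4]), horizon=4)` recovers the demonstration up to floating-point error. With a trained network in place of the oracle, different noise seeds can land on different valid action sequences, which is where the multimodality comes from.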

3. Data Collection & Sim2Real (Fuel for Learning: Data and Simulation)

These approaches tackle the difficulty of collecting real-world data with either clever hardware or simulation.

| Paper | Institution | Key idea | Notes |
| --- | --- | --- | --- |
| UMI | Stanford | Data can be collected anywhere in the world with just a GoPro and a gripper. | Data revolution |
| DexCap | Stanford | Collects dexterous hand-motion data using motion-capture gloves. | Humanoid hand control |
| Eureka | NVIDIA | An LLM (GPT-4) writes the reinforcement-learning reward function code itself. | Automated training design |
| DrEureka | NVIDIA | The LLM also tunes the simulation's physics parameters, achieving sim-to-real transfer. | Successor to Eureka |
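The Eureka-style loop (generate several candidate reward functions as code, train with each, keep the one whose policy scores best on the real task metric) can be sketched as follows. This is a toy under loud assumptions: the candidate sources are hard-coded instead of coming from an LLM, "training" is greedy hill climbing on a 1-D line rather than reinforcement learning in a simulator, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
RewardFn = Callable[[State], float]

# Hard-coded stand-ins for LLM output (assumption: a real system would sample these from the model).
CANDIDATE_SOURCES: List[str] = [
    "def reward(state):\n    return -state['distance_to_goal']\n",                        # dense shaping
    "def reward(state):\n    return 1.0 if state['distance_to_goal'] < 0.5 else 0.0\n",  # sparse success
]


def compile_reward(source: str) -> RewardFn:
    """Turn generated source text into a callable. Only exec code you trust or sandbox."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["reward"]


def train_and_score(reward: RewardFn, start: int = 0, goal: int = 7, max_steps: int = 20) -> float:
    """Toy stand-in for RL training: greedy hill climbing on the candidate reward.

    Returns the task fitness (negative final distance to the goal), which is kept
    separate from the candidate reward being evaluated.
    """
    pos = start
    for _ in range(max_steps):
        # The current position is listed first so that ties leave the agent in place.
        neighbours = (pos, pos - 1, pos + 1)
        best = max(neighbours, key=lambda p: reward({"distance_to_goal": float(abs(goal - p))}))
        if best == pos:
            break
        pos = best
    return -float(abs(goal - pos))


def select_best_reward(sources: List[str]) -> Tuple[int, float]:
    """Score every candidate and return (index of the best one, its fitness)."""
    scores = [train_and_score(compile_reward(src)) for src in sources]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]
```

In this toy the dense candidate wins because the sparse one gives the greedy learner no signal away from the goal. A fuller loop would feed the scores back into the next round of generation; that step is omitted here.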

4. Future Trends (Next-Generation Technology: World Models & Humanoids)

Recent research that goes beyond simple control, either understanding and predicting the world or moving closer to human-like behavior.

| Paper | Institution | Key idea | Notes |
| --- | --- | --- | --- |
| Genie | Google DeepMind | Learns from video alone and turns a static image into a "playable game." | The start of world models |
| V-JEPA | Meta FAIR | Yann LeCun's vision. Predicts features in representation space rather than the pixels of a video. | Efficient learning |
| HumanPlus | Stanford | Learns humanoid motion by shadowing a person with a single camera. | Humanoid control |
| Pi0 / GR00T | Physical Intelligence / NVIDIA | (Paper/report) General-purpose robot foundation model projects that span diverse hardware. | Industry frontier |

🚀 Recommended Learning Roadmap (Reading Order)

A suggested order for working through this field as efficiently as possible.

Step 1. Getting the Concepts (The Basics)