# 🤖 Physical AI Learning Roadmap (v2)
### MuJoCo-Based VLA Research: A 3-6 Month Plan

> **Environment**: A100 GPU (freely available) · robot hardware budget of about 1,000,000 KRW  
> **Background**: RL theory only, no hands-on experience · fluent in PyTorch · interested in humanoid / whole-body control  
> **Goal**: Explore Physical AI broadly (reproducing VLA papers, data generation, Sim-to-Real)  
> **Secondary interests**: World Models (planning based on video prediction), LLM-based task planning

---

## Phase 1: Building the Fundamentals (Weeks 1-4)

The goal of this phase is to get comfortable with the MuJoCo simulator and to run RL algorithms yourself.

### 1-1. MuJoCo Installation & Basic Operation (Week 1)

```bash
pip install mujoco
pip install "gymnasium[mujoco]"
```

Key resources:

### 1-2. First RL Practice: Walking with PPO (Weeks 2-3)

If you only know the theory, running the algorithm yourself is the fastest way to learn.
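As a bridge from theory to practice, it helps to see that the heart of PPO (Schulman et al., 2017) is a single formula, the clipped surrogate objective. The following is a minimal pure-Python sketch of that formula only, assuming you already have per-sample probability ratios and advantage estimates; the function name and the sample numbers are illustrative and do not come from any library.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate for sample i
    """
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Keep the ratio inside [1 - eps, 1 + eps] ...
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # ... and take the more pessimistic of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)


# Three made-up samples: one ratio too high, one too low, one unchanged.
objective = ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0])
print(objective)
```

In a real training loop this value is computed on tensors and negated to form a loss; libraries such as CleanRL wrap it with value and entropy terms.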

Key resources:

### 1-3. Getting Started with MuJoCo Playground (Weeks 3-4)

MuJoCo Playground is a framework that uses MJX (MuJoCo's JAX backend) to run thousands of environments in parallel on a GPU. With an A100 available, there is no reason not to use it.

```bash
pip install playground
```

Key resources:

**Phase 1 milestone**: You can train, by yourself, a PPO policy that makes a humanoid walk in MuJoCo.


## Phase 2: Core VLA Concepts & Paper Study (Weeks 5-8)

In this phase you learn the architecture and training methods of VLA (Vision-Language-Action) models, which sit at the core of Physical AI.

### 2-1. VLA Architecture Theory (Weeks 5-6)

A VLA model follows a "see (Vision) → understand (Language) → act (Action)" structure.

Required paper reading order:

| Order | Paper | Why it matters |
|---|---|---|
| 1 | RT-2 (Google, 2023) | The starting point for VLAs; the first to connect a VLM to robot control |
| 2 | Octo (UC Berkeley, 2024) | An early open-source generalist robot policy |
| 3 | OpenVLA (Stanford, 2024) | The most accessible open-source VLA |
| 4 | π₀ (Physical Intelligence, 2024) | Flow-matching-based VLA; SOTA-class at the time of writing |
| 5 | π₀.5 (Physical Intelligence, 2025) | The frontier of open-world generalization |
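One concrete detail worth knowing before reading these papers: RT-2 and OpenVLA emit actions as discrete tokens, quantizing each action dimension into 256 bins. The sketch below shows that round trip in plain Python, assuming uniform bins over a fixed range. The function names and the [-1, 1] range are illustrative assumptions; real models derive the range from dataset statistics.

```python
def action_to_token(value, low=-1.0, high=1.0, n_bins=256):
    """Quantize one continuous action dimension into a bin index in [0, n_bins - 1]."""
    clipped = max(low, min(high, value))
    idx = int((clipped - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # value == high falls into the last bin


def token_to_action(idx, low=-1.0, high=1.0, n_bins=256):
    """Map a bin index back to the center of its bin."""
    return low + (idx + 0.5) * (high - low) / n_bins


action = [0.10, -0.73, 1.0]  # hypothetical 3-DoF action
tokens = [action_to_token(a) for a in action]
decoded = [token_to_action(t) for t in tokens]
print(tokens, decoded)
```

The decoded value differs from the original by at most half a bin width, which is the precision cost of treating control as next-token prediction. Flow-matching models such as π₀ avoid this by generating continuous actions.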

Background knowledge to study in parallel:

### 2-2. Simulation Benchmark Environment Setup (Weeks 6-7)

Set up the evaluation environments while you read the papers.

These environments are MuJoCo-based, so what you learned in Phase 1 carries over directly.

### 2-3. Running OpenVLA Yourself (Weeks 7-8)

```bash
git clone https://github.com/openvla/openvla.git
cd openvla && pip install -e .
```

Key resources:

**Phase 2 milestone**: You can explain the architecture of a VLA model and run OpenVLA in simulation.


## Phase 3: π₀ Reproduction & Advanced Experiments (Weeks 9-14)

This phase is the core stretch of the roadmap: you work directly with the π₀ family, currently among the strongest VLAs.

### 3-1. JAX Basics (Week 9, in parallel)

JAX is becoming mainstream in Physical AI research. Core tools such as MJX, the original OpenPI implementation, and Brax are all JAX-based, so with a PyTorch background, 1-2 weeks of investment is enough.

Key PyTorch ↔ JAX correspondences:

| PyTorch | JAX | Description |
|---|---|---|
| `torch.compile` | `jax.jit` | Automatic optimization through JIT compilation |
| `DataParallel` | `jax.pmap` | Parallelization across devices |
| `torch.vmap` | `jax.vmap` | Automatic batch vectorization |
| `loss.backward()` | `jax.grad` | Functional automatic differentiation |

### 3-2. OpenPI Setup & Inference (Weeks 9-10)

OpenPI is the official open-source implementation released by Physical Intelligence.

```bash
git clone --recurse-submodules https://github.com/Physical-Intelligence/openpi.git
```

Required GPU: one A100 80GB (inference) / multi-GPU with FSDP (training)

Practical tips:

### 3-3. Fine-Tuning on Custom Data (Weeks 11-12)

```bash
# π₀.5 LIBERO fine-tuning example
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_libero \
  --exp-name=my_experiment --overwrite
```

### 3-4. Generating Training Data in MuJoCo (Weeks 12-14)

The key bottleneck for VLAs is data. Use MuJoCo Playground's GPU parallelization to generate it.

**Phase 3 milestone**: You can fine-tune π₀ yourself and generate simulation data to use for training.


## Phase 4: Sim-to-Real on a Physical Robot (Weeks 15-24)

In this phase you transfer what was learned in simulation to a real robot.

### 4-1. Hardware Selection (Budget ~1,000,000 KRW)

With a budget of 1,000,000 KRW, there are two realistic directions.

**Option A: LeRobot SO-101 dual arm (recommended starting point)**

Recommended equipment for Option A:

**Option B: Unitree R1 (~$5,900, roughly 8,000,000 KRW, which is well over the 1,000,000 KRW budget)**

Note: The Unitree G1 EDU (SDK included) starts at $21,600 and far exceeds the budget, but it is becoming the de facto standard platform in university labs. It also maps one-to-one to the G1 model in MuJoCo Menagerie, which makes it very well suited to Sim-to-Real work. If more budget becomes available later, it is the first option to consider.

### 4-2. Using the LeRobot Framework (if you choose Option A)

```bash
pip install lerobot
```

LeRobot is a robot learning framework built by Hugging Face, and it works particularly well together with the SO-101.

### 4-3. Sim-to-Real Transfer Experiments (Advanced)

Sim-to-Real is not simply "moving a simulation policy over"; it is a systematic process of shrinking the domain gap.

**Key technique 1: Domain Randomization (DR)**

DR changes the simulator's physical parameters randomly on every training episode, which makes the policy more robust. According to a recent survey that analyzed more than 250 papers, DR is the most dominant approach, and successful transfer usually requires combining several techniques.
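The per-episode resampling described above can be sketched in a few lines of plain Python. Everything in this example is an assumption made for illustration: the parameter names, their ranges, and the helper `sample_physics` are hypothetical and would need to be matched to the robot and simulator actually in use.

```python
import random

# Hypothetical randomization ranges; tune them to the real robot.
DR_RANGES = {
    "friction": (0.4, 1.2),
    "payload_mass_kg": (0.0, 2.0),
    "motor_strength_scale": (0.8, 1.2),
    "action_delay_steps": (0, 3),  # integer range: whole control steps
}


def sample_physics(rng):
    """Draw one set of physics parameters for a single training episode."""
    params = {}
    for name, (low, high) in DR_RANGES.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)   # inclusive on both ends
        else:
            params[name] = rng.uniform(low, high)
    return params


rng = random.Random(0)  # fixed seed so runs are reproducible
episodes = [sample_physics(rng) for _ in range(5)]
for params in episodes:
    print(params)
```

In a GPU-parallel setup the same idea is applied as one draw per environment instance, so thousands of differently perturbed robots train side by side.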

Parameters to randomize:

**Key technique 2: Teacher-Student architecture**

First train a teacher policy that has access to the simulator's privileged state (perfect state information), then distill it into a student policy that runs on real sensor inputs alone. MuJoCo Playground supports this pipeline well.

```
[Teacher training]  full simulator state → teacher policy (privileged)
       ↓ distillation (DAgger, etc.)
[Student training]  sensor observations + history → student policy (deployable)
       ↓ ONNX conversion
[Real deployment]   robot sensors → ONNX Runtime (50 Hz) → joint commands
```
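The distillation arrow in the diagram can be made concrete with a toy DAgger loop. This is a deliberately tiny sketch under stated assumptions, not Playground's implementation: the state is one number, the "sensor" simply rescales it, and both policies are linear. All names and constants are hypothetical.

```python
import random

K = 2.0             # teacher gain on the true state
SENSOR_SCALE = 2.0  # hypothetical sensor model: observation = 2 * state
DT = 0.1            # integration step of the toy dynamics


def teacher(state):
    """Privileged policy: acts on the true simulator state."""
    return -K * state


def observe(state):
    """What the deployed robot sees instead of the true state."""
    return SENSOR_SCALE * state


def rollout(weight, start, steps=20):
    """Run the student (action = weight * obs) and label each visited state with the teacher."""
    state, data = start, []
    for _ in range(steps):
        obs = observe(state)
        data.append((obs, teacher(state)))  # the teacher relabels the student's states
        state += DT * weight * obs          # but the student's own action is executed
    return data


def fit(dataset):
    """Least-squares fit of action = weight * obs."""
    numerator = sum(obs * act for obs, act in dataset)
    denominator = sum(obs * obs for obs, _ in dataset)
    return numerator / denominator


rng = random.Random(0)
weight, dataset = 0.0, []
for _ in range(5):  # DAgger iterations: roll out, aggregate, refit
    dataset += rollout(weight, start=rng.uniform(0.5, 1.5))
    weight = fit(dataset)
print(weight)
```

The defining feature of DAgger is visible in `rollout`: data is collected on the states the student actually reaches, while the labels come from the teacher. Here the student recovers the teacher's behavior expressed in sensor units.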

**Key technique 3: ONNX conversion & deployment**

Playground's locomotion policies run inference at 50 Hz with ONNX Runtime and are deployed to the real robot through a ROS2-based C++ interface.

Deployment pipeline:

  1. Convert the trained policy to ONNX format
  2. Run inference with ONNX Runtime inside a ROS2 node
  3. A model-based state estimator processes sensor data at 500-2000 Hz
  4. A PD controller converts the policy output into joint torques
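Step 4 of the pipeline above can be sketched as follows. The example assumes the policy outputs a target joint angle and that a faster inner loop tracks it with a PD law; the gains, torque limit, inner-loop rate, and the unit-inertia toy joint are all illustrative assumptions rather than values from Playground or any robot.

```python
POLICY_HZ = 50    # policy rate from the pipeline above
CONTROL_HZ = 500  # assumed inner-loop rate (low end of the 500-2000 Hz range)
STEPS_PER_ACTION = CONTROL_HZ // POLICY_HZ


def pd_torque(q_target, q, qd, kp=20.0, kd=0.5, tau_max=10.0):
    """PD law toward a target joint angle, with torque saturation."""
    tau = kp * (q_target - q) - kd * qd
    return max(-tau_max, min(tau_max, tau))


def hold_policy_action(q_target, q, qd, inertia=1.0):
    """Track one policy output for a full policy period on a toy 1-DoF joint."""
    dt = 1.0 / CONTROL_HZ
    for _ in range(STEPS_PER_ACTION):
        tau = pd_torque(q_target, q, qd)
        qd += (tau / inertia) * dt  # semi-implicit Euler integration
        q += qd * dt
    return q, qd


# One 50 Hz policy step: the joint starts at rest at 0.2 rad, target is 0.5 rad.
q, qd = hold_policy_action(q_target=0.5, q=0.2, qd=0.0)
print(q, qd)
```

The split matters in practice: the learned policy only has to be fast enough for 50 Hz, while stiffness and damping are handled by a simple loop that can run at the motor driver's rate.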

Sim-to-Real key resources:

**Phase 4 milestone**: A policy trained in simulation works on a real robot.


## Phase 5: Supplementary Study, World Models (Can Run in Parallel)

World models are currently one of the hottest areas in Physical AI. The core idea is that "a video model that predicts the future can be used as a robot policy." Start reading the papers during Phases 2-3 and add hands-on work during Phases 3-4.

### 5-1. Building the Theoretical Foundation

**Video generation model basics (in parallel with Phase 2)**

Understanding world models requires the basics of video generation:

Required papers:

| Order | Paper | Key content |
|---|---|---|
| 1 | Dreamer v3 (Hafner et al., 2023) | RL with a learned world model; the basic paradigm |
| 2 | COSMOS 1.0 (NVIDIA, 2025) | Design and training of a large-scale video WFM |
| 3 | Cosmos-Predict2.5 (NVIDIA, 2025) | Unified Text/Image/Video → World model |
| 4 | Cosmos Policy (NVIDIA, 2026) | Using a world model directly as a robot policy |

### 5-2. Hands-On: Working with Cosmos (in parallel with Phases 3-4)

NVIDIA Cosmos is released as open source, so you can experiment with it on an A100.

```bash
# Install Cosmos-Predict2.5
pip install cosmos-predict2
# Or clone directly from GitHub
git clone https://github.com/nvidia-cosmos/cosmos-predict2.5.git
```

Hands-on path:

  1. Try Text2World / Video2World inference with the Cosmos-Predict2.5 2B model
  2. Follow the action-conditioned post-training tutorial on the Bridge dataset
  3. Post-train for the robot domain with DROID or LIBERO data
  4. Experiment with Sim2Real image translation using Cosmos-Transfer2.5

Key resources:

### 5-3. Integrating World Models and VLAs (after Phase 4)

The convergence of world models and VLAs is the future direction of Physical AI:


## Phase 6: Supplementary Study, LLM-Based Task Planning (Can Run in Parallel)

If the VLA handles "how to move," LLM task planning handles "what to do." The two are combined hierarchically. Read the papers during Phases 2-3 and connect them to hands-on work in Phase 4.

### 6-1. Core Paper Reading Order

| Order | Paper | Key idea |
|---|---|---|
| 1 | SayCan (Google, 2022) | LLM probability × affordance probability = executable plan |
| 2 | Inner Monologue (Google, 2022) | Closed-loop planning with environment feedback (extends SayCan) |
| 3 | Code as Policies (Google, 2022) | The LLM generates code that controls the robot directly |
| 4 | VoxPoser (Stanford, 2023) | Manipulation planning with an LLM + 3D value maps |
| 5 | SPCA Framework (2026) | Hybrid of LLM proposals + symbolic PDDL verification |
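The SayCan row in the table compresses the method into one product, and a tiny example makes it concrete. In this sketch the skill names and all probabilities are made up for illustration; in the paper the first score comes from a language model and the second from learned value functions.

```python
def saycan_select(llm_scores, affordances):
    """Pick the skill that maximizes p_LLM(skill | instruction) * p_affordance(skill | state)."""
    combined = {
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical scores for the instruction "put the plate in the sink".
llm_scores = {"pick up the plate": 0.6, "go to the sink": 0.3, "open the drawer": 0.1}
# Hypothetical affordances: the plate is out of reach, the sink is reachable.
affordances = {"pick up the plate": 0.2, "go to the sink": 0.9, "open the drawer": 0.5}

best, combined = saycan_select(llm_scores, affordances)
print(best, combined)
```

The language model alone would choose "pick up the plate", but the affordance term vetoes a skill that cannot succeed from the current state. That grounding is the contribution Inner Monologue later extends with feedback.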

Understanding the paradigm:

```
[System 2: LLM Task Planner]  "Tidy up the kitchen"
       ↓ subtask decomposition
"1. Find a plate → 2. Pick up the plate → 3. Move it to the sink → 4. Next plate..."
       ↓ at each step
[System 1: VLA Policy]  vision input → joint commands (50 Hz)
       ↑ success/failure feedback
[Inner Monologue]  "Plate grasp failed → retry at a different angle"
```

This hierarchical structure is the direction pursued by Figure AI's Helix, NVIDIA GR00T, and π₀.5.

### 6-2. Hands-On: Building a Two-Tier LLM + VLA System

You can build this yourself in a MuJoCo environment.

Mini-project idea:

  1. Generate high-level plans with the GPT-4o/Claude API in the LIBERO environment
  2. Have OpenVLA or π₀ execute each subtask
  3. Detect success/failure and feed it back to the LLM → replan (the Inner Monologue pattern)
  4. Implement the whole pipeline in code

```python
# Sketch: LLM + VLA integration loop. Every object passed in is a hypothetical interface.
def run_task(llm, vla_policy, env, success_detector, describe_scene,
             task, scene_description, max_steps=200, max_replans=3):
    plan = list(llm.generate_plan(task, scene_description))
    obs, replans = env.reset(), 0
    while plan:
        step, success = plan.pop(0), False
        for _ in range(max_steps):  # per-step timeout
            action = vla_policy.infer(obs, step.instruction)
            obs, reward, done = env.step(action)
            success = success_detector(obs, step.goal)
            if success or done:
                break
        if not success:
            if replans == max_replans:
                return False  # give up after too many replans
            replans += 1
            feedback = describe_scene(obs)
            plan = list(llm.replan(step, feedback))  # Inner Monologue
    return True
```

Key resources:


## Foundational Tech Stack: Parallel Study Guide

The technologies below are not tied to a specific phase; study them whenever they are needed throughout the roadmap.

### Flow Matching (required; start in Phase 2)

Flow matching is the core of π₀ and a foundational technique behind the Cosmos world models.
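To make the idea tangible, here is a one-dimensional sketch of conditional flow matching in the style of Lipman et al. (2023). It assumes the straight-line path x_t = (1 - t)·x0 + t·x1 from a noise sample x0 to a data sample x1, and it uses the closed-form conditional velocity where a trained network would normally sit. The function names and the target value are illustrative, and π₀'s own time convention may differ from the one used here.

```python
import random


def conditional_velocity(x, t, x1):
    """Target velocity u_t(x | x1) for the straight path x_t = (1 - t) * x0 + t * x1."""
    return (x1 - x) / (1.0 - t)


def euler_sample(x0, x1, n_steps=10):
    """Integrate dx/dt = u_t(x | x1) from t = 0 to t = 1 with Euler steps."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t never reaches 1, so the division is safe
        x += dt * conditional_velocity(x, t, x1)
    return x


rng = random.Random(0)
x0 = rng.gauss(0.0, 1.0)  # noise sample
x1 = 0.7                  # hypothetical target action

# Training-time view: at a random time the regression target is simply x1 - x0.
t = rng.uniform(0.0, 0.9)
x_t = (1.0 - t) * x0 + t * x1
target = x1 - x0

sample = euler_sample(x0, x1)
print(sample)
```

During training, a network v(x_t, t, observation) is regressed onto `target`; at inference, that network replaces `conditional_velocity` and a handful of Euler steps turn noise into an action chunk.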

### JAX (recommended; start in Phase 3)

JAX is becoming the de facto standard for Physical AI research.

### ROS2 (needed in Phase 4)

ROS2 is the essential middleware for deploying to real robots.

### URDF/MJCF Modeling (start in Phase 1, ongoing)

Understanding the XML structure of robot models deepens your grasp of the entire pipeline.


## Recommended Learning Resources

### Essential GitHub Repos

| Repo | Purpose |
|---|---|
| google-deepmind/mujoco_menagerie | Robot model library |
| google-deepmind/mujoco + Playground | GPU-parallel simulation |
| Physical-Intelligence/openpi | Official π₀/π₀.5 implementation |
| openvla/openvla | Official OpenVLA implementation |
| huggingface/lerobot | Unified robot learning framework |
| nvidia-cosmos/cosmos-predict2.5 | World Foundation Model |
| nvidia-cosmos/cosmos-predict2 | World model (includes post-training guides) |
| keon/awesome-physical-ai | Curated list of VLA papers |
| vwxyzjn/cleanrl | Educational implementations of RL algorithms |
| allenzren/open-pi-zero | Community reimplementation of π₀ (educational) |

### Full List of Core Papers (Reading Order)

Fundamentals (Phases 1-2):

  1. Original PPO paper (Schulman et al., 2017): RL fundamentals
  2. RT-2 (Google, 2023): the start of VLAs
  3. Octo (UC Berkeley, 2024): open-source generalist robot policy
  4. Flow Matching (Lipman et al., 2023): the mathematical foundation of π₀

Core (Phases 2-3):

  5. OpenVLA (Stanford, 2024): open-source VLA
  6. π₀ (Physical Intelligence, 2024): flow-matching VLA
  7. π₀.5 (Physical Intelligence, 2025): open-world generalization
  8. ACT (Zhao et al., 2023): Action Chunking with Transformers

Supplementary, World Models (Phase 5):

  9. Dreamer v3 (Hafner et al., 2023): RL with a learned world model
  10. COSMOS (NVIDIA, 2025): large-scale world foundation model
  11. Cosmos Policy (NVIDIA, 2026): using a world model as a policy

Supplementary, LLM Planning (Phase 6):

  12. SayCan (Google, 2022): LLM + affordances
  13. Inner Monologue (Google, 2022): closed-loop feedback
  14. Code as Policies (Google, 2022): control through code generation

### Community


## Overall Timeline Summary

```
Week  Phase 1        Phase 2        Phase 3            Phase 4              Phase 5         Phase 6
      Fundamentals   VLA theory     π₀ advanced        Sim-to-Real          World Model     LLM Planning
 1    ██ MuJoCo
 2    ██ PPO
 3    ██ PPO
 4    ██ Playground
 5                   ██ VLA papers                                          ░░ Papers
 6                   ██ VLA papers                                          ░░ Papers       ░░ Papers
 7                   ██ Benchmarks                                                          ░░ Papers
 8                   ██ OpenVLA
 9                                  ██ JAX+OpenPI                           ░░ Cosmos
10                                  ██ OpenPI                               ░░ Cosmos
11                                  ██ Fine-tuning
12                                  ██ Data generation                      ░░ Post-training
13                                  ██ Data generation
14                                  ██ Data generation
15                                                     ██ Robot setup                       ░░ Hands-on prep
16                                                     ██ LeRobot
17                                                     ██ Data collection
18                                                     ██ Transfer experiments
19                                                     ██ DR experiments    ░░ Sim2Real
20                                                     ██ Teacher-Student
21-24                                                  ██ Iterate/deepen    ░░ Integration  ░░ LLM+VLA
```

██ = main study    ░░ = parallel study (light)


## Final Integration: The Full Physical AI Pipeline

This is what the final target system looks like, the point where all of the study above converges.

```
User: "Tidy up the kitchen"
       ↓
[LLM Task Planner]: high-level plan generation (Phase 6)
  "1. Find the plate on the table  2. Pick up the plate  3. Move to the sink  4. Put it down"
       ↓
[World Model]: simulate the outcome of each action (Phase 5)
  "Grasp the plate at this angle → slip predicted → choose another angle"
       ↓
[VLA Policy (π₀)]: low-level motor command generation @ 50 Hz (Phases 2-3)
  camera image + language command → joint torques
       ↓
[Real robot]: runs the policy deployed via Sim-to-Real (Phase 4)
  ONNX Runtime + ROS2 + SO-101/G1
       ↑ feedback
[Inner Monologue]: failure detection → ask the LLM to replan (Phase 6)
```

## Suggested Next Steps

Directions to consider after finishing this roadmap: