Add DROID action-policy SFT recipe (Cosmos3-Nano, joint_pos) by fwd4 · Pull Request #24 · NVIDIA/cosmos-framework

fwd4 · 2026-06-08T08:41:14Z

Summary

Adds a DROID action-policy SFT recipe for nvidia/Cosmos3-Nano, mirroring the internal droid_lerobot_8b policy run, so users can post-train the action-generation + action heads on DROID (LeRobot v3.0) data.

What's included

data/vfm/action/datasets/droid_lerobot_dataset.py — DROID LeRobot dataset: compact columnar load + episode-aware windowing (replaces an eager full-table materialization), plus joint_pos (8D: 7 joints + gripper) and use_state support.
data/vfm/action/datasets/action_sft_dataset.py (new) — get_action_droid_sft_dataset(...) wrapping the dataset through ActionTransformPipeline.
configs/.../action/posttrain_config/action_policy_droid_nano.py (new) — registered action_policy_droid_nano experiment (Cosmos3-Nano / 8B MoT): optimizer trains gen+action heads (5× LR on action heads), LambdaLinear schedule, count-based batch, res480, encode_exact_durations=[33] (chunk 32 → 33 frames).
checkpoint/dcp.py — EMA warm-start: when keys_to_skip_loading excludes net_ema., initialize net_ema = net from the base weights so EMA starts from the init rather than zeros.
examples/toml/sft_config/action_policy_droid_{nano,repro}.toml — 1-GPU smoke + scaled (res480) configs.
examples/launch_sft_action_policy_droid.sh + docs/action_policy_droid_posttraining.md — runnable launcher and walkthrough.
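
The episode-aware windowing mentioned for droid_lerobot_dataset.py can be illustrated with a minimal sketch. It is not the code in this PR: `build_window_index` and `locate_window` are hypothetical names, and the assumption that each window spans chunk_length + 1 frames is inferred from "chunk 32 → 33 frames". Only per-episode window counts are stored, so a flat sample index resolves to an episode and a start frame without materializing every window.

```python
from bisect import bisect_right
from itertools import accumulate


def build_window_index(episode_lengths, chunk_length):
    """Return cumulative window counts, one entry per episode.

    Assumption: a window needs chunk_length + 1 frames, so an episode with
    n frames contributes max(0, n - chunk_length) windows and no window
    ever crosses an episode boundary.
    """
    counts = [max(0, n - chunk_length) for n in episode_lengths]
    return list(accumulate(counts))


def locate_window(cumulative, idx):
    """Map a flat window index to (episode_id, start_frame)."""
    total = cumulative[-1] if cumulative else 0
    if not 0 <= idx < total:
        raise IndexError(idx)
    # First episode whose cumulative count exceeds idx; episodes that are
    # too short contribute zero windows and are skipped automatically.
    episode_id = bisect_right(cumulative, idx)
    previous = cumulative[episode_id - 1] if episode_id > 0 else 0
    return episode_id, idx - previous


cumulative = build_window_index([40, 10, 35], chunk_length=32)
first_in_third_episode = locate_window(cumulative, 8)
```

With these lengths, the 10-frame episode yields no windows, so flat index 8 falls in the third episode.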

Validation

End-to-end on H200:

1 node / 8×H200: dry-run + training at res480 with max_samples_per_batch=32 (64 runs out of memory at 139 GiB; the internal run used 128 on GB200).
2 nodes / 16 ranks — HSDP shard 8 × replicate 2, TRAIN_EXIT=0.
Recipe faithful to internal droid_lerobot_8b: lr 1e-4 / betas / wd, 5× action-head LR, LambdaLinear, shift {256:3,480:5,720:10}, concat_view, chunk_length=32.
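
The "5× action-head LR" setting can be sketched as two optimizer parameter groups. This is an illustration only: `build_param_groups` and the rule of matching parameter names on the substring "action_head" are assumptions, not the selection logic used in the experiment config. The returned list has the shape PyTorch optimizers accept for per-group learning rates.

```python
def build_param_groups(named_params, base_lr=1e-4, action_head_mult=5.0,
                       action_key="action_head"):
    """Split (name, param) pairs into a base group and an action-head group.

    Assumption: action-head parameters are recognisable by a substring of
    their name; the real recipe may select them differently.
    """
    base, heads = [], []
    for name, param in named_params:
        (heads if action_key in name else base).append(param)
    groups = [{"params": base, "lr": base_lr}]
    if heads:
        # Action heads start from fresh weights, so they get a larger step.
        groups.append({"params": heads, "lr": base_lr * action_head_mult})
    return groups


# Placeholder strings stand in for tensors to keep the sketch dependency-free.
groups = build_param_groups([
    ("net.blocks.0.weight", "p0"),
    ("net.action_head.proj.weight", "p1"),
])
```

A scheduler that scales every group by the same multiplier keeps the 5× ratio between the two groups throughout training.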

Notes

Count-based batch (max_samples_per_batch, max_sequence_length=None) lives in the experiment Python — TOML cannot express null, and the loader only overrides keys present in the TOML.
Base checkpoint: convert nvidia/Cosmos3-Nano → DCP and pass via BASE_CHECKPOINT_PATH; action heads init fresh (skipped on load).
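
The first note can be made concrete with a small sketch of an "override only keys present" merge. The function `apply_overrides` and the `dataloader` nesting are hypothetical; the dictionaries stand in for the experiment defaults and for the parsed TOML file. Because TOML has no null literal, a `None` default can only come from the Python side and survives as long as the TOML omits that key.

```python
def apply_overrides(defaults, overrides):
    """Recursively overlay `overrides` on `defaults`, touching only keys
    that are present in `overrides`."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Defaults as they might be declared in the experiment Python (assumed layout).
experiment_defaults = {
    "dataloader": {"max_samples_per_batch": 32, "max_sequence_length": None},
}
# What a TOML file could express: a number, but never a null.
toml_overrides = {"dataloader": {"max_samples_per_batch": 16}}

merged = apply_overrides(experiment_defaults, toml_overrides)
```

Here `max_samples_per_batch` is overridden while `max_sequence_length` stays `None`, which is what keeps the batch count-based.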

Adds a DROID action-policy post-training recipe for nvidia/Cosmos3-Nano:
- DROID LeRobot dataset: compact columnar load + episode-aware windowing, joint_pos (8D) actions + use_state proprioception.
- ActionTransformPipeline wrapper (get_action_droid_sft_dataset).
- Registered action_policy_droid_nano experiment + res480 repro TOML (lr 2e-4, lambdalinear, grad_clip 1.0, count-based batch, fresh action heads).
- EMA warm-start in checkpoint/dcp.py (net_ema := net on warm-start init).
- Docs + example launcher.
Validated end-to-end on H200 (1 node/8 GPU and 2 nodes/16 ranks, HSDP).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Hao Liang <haolia@nvidia.com>

foreverlms

LGTM.

Per-sample random crop+rescale (5% spatial jitter) + color jitter, applied to all camera views with shared params (temporally and cross-view consistent) before the concat-view assembly. Off by default; enabled in the action_policy_droid_nano experiment to match the reference recipe. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
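
A minimal sketch of the shared-parameter idea in this commit: crop parameters are sampled once per sample and then applied to every frame of every camera view, which keeps the result temporally and cross-view consistent. The helper names are hypothetical, frames are plain nested lists, all views are assumed to share one resolution, "5% spatial jitter" is read as shrinking each side by up to 5%, and the rescale and color-jitter steps are left out.

```python
import random


def sample_crop_box(height, width, max_jitter=0.05, rng=random):
    """Sample one crop box (top, left, crop_h, crop_w) for a whole sample."""
    scale = 1.0 - rng.uniform(0.0, max_jitter)
    crop_h = max(1, round(height * scale))
    crop_w = max(1, round(width * scale))
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


def crop_views(views, box):
    """Apply the same box to every frame of every view.

    `views` maps a view name to a list of frames; each frame is a list of
    pixel rows.
    """
    top, left, crop_h, crop_w = box
    return {
        name: [[row[left:left + crop_w] for row in frame[top:top + crop_h]]
               for frame in frames]
        for name, frames in views.items()
    }


rng = random.Random(0)
box = sample_crop_box(20, 40, rng=rng)
views = {
    name: [[[0] * 40 for _ in range(20)] for _ in range(2)]
    for name in ("exterior", "wrist")
}
cropped = crop_views(views, box)
```

Sampling once outside the per-view loop is the design point: sampling inside it would give each camera a different crop and break the geometry of the concatenated view.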

The warm-start branch that copies net.* -> net_ema.* (so EMA starts from the loaded weights instead of random construction-time values) was gated on net_ema.* being present in the model state dict, which is always true — so it fired on every warm start and overwrote a genuinely loaded, trained net_ema (e.g. when initializing from a training checkpoint that carries its own EMA). Gate the copy on net_ema.* being explicitly listed in keys_to_skip_loading instead: only reset net_ema = net when net_ema was actually skipped on load (the HF->DCP init case). When net_ema.* is loaded, leave it intact. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
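
The corrected gate can be sketched on a plain dictionary. This is not the code in checkpoint/dcp.py: `warm_start_ema` is a hypothetical helper, `keys_to_skip_loading` is assumed to be a list of key prefixes, and floats stand in for tensors (real tensors would be cloned rather than aliased).

```python
def warm_start_ema(state, keys_to_skip_loading,
                   net_prefix="net.", ema_prefix="net_ema."):
    """Copy net.* into net_ema.* only when net_ema.* was skipped on load."""
    # Gate on the skip list, not on whether net_ema.* exists in the model
    # state dict, since those keys are always present.
    ema_skipped = any(ema_prefix.startswith(p) for p in keys_to_skip_loading)
    if not ema_skipped:
        return state  # a loaded EMA is left intact
    out = dict(state)
    for key, value in state.items():
        if key.startswith(net_prefix):
            out[ema_prefix + key[len(net_prefix):]] = value
    return out


state = {"net.w": 1.0, "net_ema.w": 0.5}
from_base_weights = warm_start_ema(state, ["net_ema."])
from_training_ckpt = warm_start_ema(state, ["net.action_head."])
```

In the first call the EMA is reset to the loaded weights; in the second the EMA value carried by the checkpoint is preserved.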

mli0603 · 2026-06-09T18:31:27Z

+        fps=fps,
+        chunk_length=chunk_length,
+        viewpoint=viewpoint,
+        action_space=action_space,


I see that the released Droid dataset doesn't accept action_space as an arg. Have you tried running this script with cosmos-framework?

fwd4 · Jun 10, 2026

Yes, it's runnable. The dataset is updated; check droid_lerobot_dataset.py.

Ports the internal use_filter_dict from the DROID policy recipe: restrict training windows to curated keep-ranges (drops idle/non-task frames), matching the reference distribution (kept 74.42% = 12.54M of 16.85M windows). Off by default; enabled via filter_dict_path (an internal data artifact, not shipped). Additive branch — the unfiltered path is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Hao Liang <haolia@nvidia.com>
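
A minimal sketch of restricting windows to keep-ranges, under stated assumptions: the format of the internal filter artifact is not described here, so `keep_ranges` is taken to be a list of half-open `(lo, hi)` frame ranges per episode, and a window is kept when its start frame falls inside one of them. `filter_window_starts` is a hypothetical name.

```python
def filter_window_starts(num_frames, chunk_length, keep_ranges):
    """Return the window start frames of one episode that survive filtering.

    Assumptions: each window spans chunk_length + 1 frames, and a window is
    kept when its start frame lies inside some half-open range [lo, hi).
    """
    kept = []
    for start in range(max(0, num_frames - chunk_length)):
        if any(lo <= start < hi for lo, hi in keep_ranges):
            kept.append(start)
    return kept


kept = filter_window_starts(num_frames=40, chunk_length=32, keep_ranges=[(2, 5)])
unfiltered = filter_window_starts(40, 32, keep_ranges=[(0, 40)])
```

Passing a range that covers the whole episode reproduces the unfiltered window set, which mirrors the additive design of the branch.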

…ternal The OSS recipe used torch AdamW (bf16 params) + eps 1e-6, which under-steps the small 5x-lr action heads (sub-ULP updates swamped in bf16) and leaves the action loss on a noisy high plateau. Switch to FusedAdam with fp32 master_weights + eps 1e-8 to match the internal droid_lerobot_8b_policy optimizer. An exact-match forward+optimizer test (identical batch/seed, 1-GPU eager) confirmed the forward is byte-identical across repos and the convergence gap was solely the optimizer. Signed-off-by: Hao Liang <haolia@nvidia.com>
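
The sub-ULP effect described above can be reproduced without a GPU by emulating bfloat16 rounding in pure Python. This is an illustration, not the exact-match test from the commit: the fixed step of 5e-4 on a weight of 1.0 is an arbitrary stand-in for a small optimizer update, not an actual Adam step, and `to_bf16` is a hypothetical helper.

```python
import struct


def to_bf16(x):
    """Round a float to the nearest bfloat16 value (round-to-nearest-even)
    by keeping the upper 16 bits of its float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


step = 5e-4          # illustrative update size, well below bf16 spacing near 1.0
bf16_param = 1.0     # parameter stored and updated directly in bf16
master = 1.0         # higher-precision master copy (a Python float here)

for _ in range(100):
    bf16_param = to_bf16(bf16_param - step)  # rounds straight back to 1.0
    master -= step                           # accumulates every step

bf16_from_master = to_bf16(master)           # cast down only for the forward pass
```

Just below 1.0 adjacent bfloat16 values are 2^-8 ≈ 0.0039 apart, so each 5e-4 step rounds away and the bf16-only parameter never moves, while the master copy reaches about 0.95 and its bf16 cast changes accordingly.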

fwd4 force-pushed the action-policy-droid-sft branch from e20fbb6 to b8a1cad on June 8, 2026 09:33

fwd4 force-pushed the action-policy-droid-sft branch from b8a1cad to 5a9d716 on June 8, 2026 10:10

Xuanmeng-Zhang requested review from lfengad and ychao-nvidia June 8, 2026 10:28

fwd4 requested review from foreverlms and yuzhudong June 8, 2026 10:59

foreverlms previously approved these changes Jun 8, 2026

lfengad previously approved these changes Jun 8, 2026

lfengad and others added 2 commits June 8, 2026 23:04

Merge branch 'main' into action-policy-droid-sft

e40c96a

fwd4 dismissed stale reviews from lfengad and foreverlms via f830833 June 9, 2026 04:17

mli0603 reviewed Jun 9, 2026

ychao-nvidia reviewed Jun 10, 2026

Comment thread docs/action_policy_droid_posttrain.md

Comment thread docs/action_policy_droid_posttraining.md Outdated

Comment thread docs/action_policy_droid_posttraining.md Outdated

Comment thread docs/action_policy_droid_posttraining.md Outdated

fwd4 and others added 4 commits June 11, 2026 06:32

Update docs/action_policy_droid_posttraining.md

90c214b

Co-authored-by: Yu-Wei Chao <82182961+ychao-nvidia@users.noreply.github.com>

Update docs/action_policy_droid_posttraining.md

6cede45

Co-authored-by: Yu-Wei Chao <82182961+ychao-nvidia@users.noreply.github.com>

Update docs/action_policy_droid_posttraining.md

8e3e704

Co-authored-by: Yu-Wei Chao <82182961+ychao-nvidia@users.noreply.github.com>

Update docs/action_policy_droid_posttraining.md

6270916

Co-authored-by: Yu-Wei Chao <82182961+ychao-nvidia@users.noreply.github.com>

ychao-nvidia reviewed Jun 10, 2026

Comment thread docs/action_policy_droid_posttrain.md

fwd4 and others added 4 commits June 10, 2026 16:24

rename doc

31d3486

Signed-off-by: Hao Liang <haolia@nvidia.com>

work: opt-in file_system DataLoader IPC (avoid /dev/shm worker crash …

7493d7d

…on large video batches) Signed-off-by: Hao Liang <haolia@nvidia.com>

fwd4 removed the request for review from yuzhudong June 10, 2026 23:43

Merge branch 'main' into action-policy-droid-sft

069661d

ychao-nvidia approved these changes Jun 11, 2026

lfengad approved these changes Jun 11, 2026

lfengad merged commit bbda321 into NVIDIA:main Jun 11, 2026
7 checks passed

fwd4 mentioned this pull request Jun 12, 2026

action dataloader: episode-shuffle stream (fix DROID grad-norm instability) #37

Open

lfengad merged 13 commits into NVIDIA:main from fwd4:action-policy-droid-sft

5 participants
