
Add DROID action-policy SFT recipe (Cosmos3-Nano, joint_pos) #24

Merged
lfengad merged 13 commits into NVIDIA:main from fwd4:action-policy-droid-sft on Jun 11, 2026

fwd4 commented Jun 8, 2026

Collaborator

Summary

Adds a DROID action-policy SFT recipe for nvidia/Cosmos3-Nano, mirroring the internal droid_lerobot_8b policy run, so users can post-train the action-generation + action heads on DROID (LeRobot v3.0) data.

What's included

  • data/vfm/action/datasets/droid_lerobot_dataset.py — DROID LeRobot dataset: compact columnar load + episode-aware windowing (replaces an eager full-table materialization), plus joint_pos (8D: 7 joints + gripper) and use_state support.
  • data/vfm/action/datasets/action_sft_dataset.py (new) — get_action_droid_sft_dataset(...) wrapping the dataset through ActionTransformPipeline.
  • configs/.../action/posttrain_config/action_policy_droid_nano.py (new) — registered action_policy_droid_nano experiment (Cosmos3-Nano / 8B MoT): optimizer trains gen+action heads (5× LR on action heads), LambdaLinear schedule, count-based batch, res480, encode_exact_durations=[33] (chunk 32 → 33 frames).
  • checkpoint/dcp.py — EMA warm-start: when keys_to_skip_loading excludes net_ema., initialize net_ema = net from the base weights so EMA starts from the init rather than zeros.
  • examples/toml/sft_config/action_policy_droid_{nano,repro}.toml — 1-GPU smoke + scaled (res480) configs.
  • examples/launch_sft_action_policy_droid.sh + docs/action_policy_droid_posttraining.md — runnable launcher and walkthrough.
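
The episode-aware windowing named in the first bullet can be illustrated with a minimal sketch. It assumes a columnar table in which every frame row carries an episode id and the rows of one episode are stored contiguously, and it enumerates only those windows that stay inside a single episode. The function name `episode_windows` and the windowing rule (stride 1, full windows only) are assumptions for illustration, not the code in droid_lerobot_dataset.py.

```python
from itertools import groupby


def episode_windows(episode_index, window):
    """Return (start, end) row ranges of length `window` that stay inside one episode.

    episode_index: per-row episode ids, with each episode stored contiguously.
    """
    windows = []
    row = 0
    for _, group in groupby(episode_index):
        length = sum(1 for _ in group)
        # Keep only starts that leave room for a full window in this episode.
        for start in range(row, row + length - window + 1):
            windows.append((start, start + window))
        row += length
    return windows


# Two episodes of 5 and 3 rows, windows of 3 rows.
index = [0, 0, 0, 0, 0, 1, 1, 1]
result = episode_windows(index, window=3)
```

Because only the episode-id column is scanned, a sketch like this never needs the full table in memory, which is the general idea behind avoiding an eager materialization.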

Validation

End-to-end on H200:

  • 1 node / 8×H200 — dry-run + training at res480, max_samples_per_batch=32 (64 OOMs at 139 GiB; internal used 128 on GB200).
  • 2 nodes / 16 ranks — HSDP shard 8 × replicate 2, TRAIN_EXIT=0.
  • Recipe faithful to internal droid_lerobot_8b: lr 1e-4 / betas / wd, 5× action-head LR, LambdaLinear, shift {256:3,480:5,720:10}, concat_view, chunk_length=32.
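
The 5× action-head learning rate could be expressed as two optimizer parameter groups. The sketch below builds them as plain dicts in the list-of-dicts shape that torch.optim optimizers accept (`"params"` plus per-group options such as `"lr"`). The `action_head` name prefix, the function name, and the string placeholders standing in for tensors are hypothetical.

```python
def build_param_groups(named_params, base_lr, action_prefix="action_head", action_lr_mult=5.0):
    """Split parameters into a base group and an action-head group with a scaled LR."""
    base, action = [], []
    for name, param in named_params:
        # Hypothetical rule: action-head parameters are identified by name prefix.
        (action if name.startswith(action_prefix) else base).append(param)
    return [
        {"params": base, "lr": base_lr},
        {"params": action, "lr": base_lr * action_lr_mult},
    ]


# Placeholder strings stand in for parameter tensors.
named = [("gen.block0.weight", "w0"), ("action_head.proj.weight", "w1")]
groups = build_param_groups(named, base_lr=1e-4)
```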

Notes

  • Count-based batch (max_samples_per_batch, max_sequence_length=None) lives in the experiment Python — TOML cannot express null, and the loader only overrides keys present in the TOML.
  • Base checkpoint: convert nvidia/Cosmos3-Nano → DCP and pass via BASE_CHECKPOINT_PATH; action heads init fresh (skipped on load).
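
The first note can be made concrete with a small sketch. TOML has no null value, so a default of `None` has to live in Python, and a loader that overrides only the keys present in the TOML leaves that default untouched. The `BatchConfig` class and `apply_overrides` helper are hypothetical stand-ins for the experiment config and loader, and the dict stands in for an already parsed TOML table.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class BatchConfig:
    max_samples_per_batch: int = 32
    # None selects count-based batching; TOML has no way to spell this value.
    max_sequence_length: Optional[int] = None


def apply_overrides(config, toml_table):
    """Override only the keys that appear in the parsed TOML table."""
    known = {key: value for key, value in toml_table.items() if hasattr(config, key)}
    return replace(config, **known)


# A parsed TOML table that sets one key and omits the other.
toml_table = {"max_samples_per_batch": 16}
config = apply_overrides(BatchConfig(), toml_table)
```

Here `max_sequence_length` stays `None` because the table never mentions it.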

fwd4 force-pushed the action-policy-droid-sft branch from e20fbb6 to b8a1cad on June 8, 2026 09:33
Adds a DROID action-policy post-training recipe for nvidia/Cosmos3-Nano:
- DROID LeRobot dataset: compact columnar load + episode-aware windowing,
  joint_pos (8D) actions + use_state proprioception.
- ActionTransformPipeline wrapper (get_action_droid_sft_dataset).
- Registered action_policy_droid_nano experiment + res480 repro TOML
  (lr 2e-4, lambdalinear, grad_clip 1.0, count-based batch, fresh action heads).
- EMA warm-start in checkpoint/dcp.py (net_ema := net on warm-start init).
- Docs + example launcher.

Validated end-to-end on H200 (1 node/8 GPU and 2 nodes/16 ranks, HSDP).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Hao Liang <haolia@nvidia.com>
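
For the 2-node HSDP run (shard 8 × replicate 2), the mapping from global rank to mesh coordinates can be sketched as follows. It assumes a row-major layout with the replicate dimension outermost, so that each shard group of 8 ranks sits on one node; the real device mesh may be built differently, and `hsdp_layout` is a hypothetical helper.

```python
def hsdp_layout(world_size, shard_size):
    """Map each global rank to (replica_index, shard_index) in a 2D HSDP mesh."""
    if world_size % shard_size != 0:
        raise ValueError("world_size must be a multiple of shard_size")
    layout = {}
    for rank in range(world_size):
        # Assumed row-major order: replicate is the outer dim, shard the inner dim.
        replica, shard = divmod(rank, shard_size)
        layout[rank] = (replica, shard)
    return layout


layout = hsdp_layout(world_size=16, shard_size=8)
```

Under this assumption, parameters are sharded across the 8 ranks that share a replica index and replicated across the 2 ranks that share a shard index.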
fwd4 force-pushed the action-policy-droid-sft branch from b8a1cad to 5a9d716 on June 8, 2026 10:10
fwd4 requested review from foreverlms and yuzhudong June 8, 2026 10:59
foreverlms previously approved these changes Jun 8, 2026

foreverlms left a comment

Collaborator


LGTM.

lfengad previously approved these changes Jun 8, 2026
lfengad and others added 2 commits June 8, 2026 23:04
Per-sample random crop+rescale (5% spatial jitter) + color jitter, applied
to all camera views with shared params (temporally and cross-view consistent)
before the concat-view assembly. Off by default; enabled in the
action_policy_droid_nano experiment to match the reference recipe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
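
A minimal sketch of the shared-parameter idea in the commit above: one crop box is sampled per sample and then applied to every frame of every camera view, which keeps the augmentation temporally and cross-view consistent. Frames are nested lists here to keep the example dependency-free; the rescale back to the original size and the color jitter are omitted, and all names are hypothetical.

```python
import random


def sample_crop(height, width, max_jitter=0.05, rng=random):
    """Sample one crop box that removes up to `max_jitter` of each dimension."""
    scale = 1.0 - rng.uniform(0.0, max_jitter)
    crop_h = max(1, round(height * scale))
    crop_w = max(1, round(width * scale))
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


def crop_views(views, box):
    """Apply the same crop box to every frame of every view."""
    top, left, crop_h, crop_w = box
    return {
        name: [[row[left:left + crop_w] for row in frame[top:top + crop_h]] for frame in frames]
        for name, frames in views.items()
    }


height, width = 20, 40
frame = [[0] * width for _ in range(height)]
views = {"wrist": [frame, frame], "exterior": [frame, frame]}
box = sample_crop(height, width, rng=random.Random(0))  # sampled once per sample
cropped = crop_views(views, box)
```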
fwd4 dismissed stale reviews from lfengad and foreverlms via f830833 June 9, 2026 04:17
The warm-start branch that copies net.* -> net_ema.* (so EMA starts from the
loaded weights instead of random construction-time values) was gated on
net_ema.* being present in the model state dict, which is always true — so it
fired on every warm start and overwrote a genuinely loaded, trained net_ema
(e.g. when initializing from a training checkpoint that carries its own EMA).

Gate the copy on net_ema.* being explicitly listed in keys_to_skip_loading
instead: only reset net_ema = net when net_ema was actually skipped on load
(the HF->DCP init case). When net_ema.* is loaded, leave it intact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
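
The gating described in this commit message can be sketched with plain dicts standing in for state dicts. The sketch assumes that entries of `keys_to_skip_loading` are key prefixes such as `net_ema.`; the function name and the matching rule are illustrative and are not taken from checkpoint/dcp.py.

```python
def warm_start_ema(state, keys_to_skip_loading):
    """Copy net.* into net_ema.* only when net_ema.* was skipped on load."""
    ema_skipped = any(key.startswith("net_ema.") for key in keys_to_skip_loading)
    if not ema_skipped:
        # net_ema.* was loaded from the checkpoint, so leave it intact.
        return state
    out = dict(state)
    for key, value in state.items():
        if key.startswith("net."):
            out["net_ema." + key[len("net."):]] = value
    return out


state = {"net.w": 1.0, "net_ema.w": 0.5}
# HF->DCP init case: net_ema.* was skipped, so EMA is reset to the loaded weights.
reset = warm_start_ema(state, ["net_ema.", "net.action_head."])
# Training-checkpoint case: net_ema.* was loaded and must survive.
kept = warm_start_ema(state, ["net.action_head."])
```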
fps=fps,
chunk_length=chunk_length,
viewpoint=viewpoint,
action_space=action_space,

Collaborator


I see that the released DROID dataset doesn't accept action_space as an argument. Have you tried running this script with cosmos-framework?

Collaborator Author


Yes, it runs. The dataset is updated in this PR; see droid_lerobot_dataset.py.

Comment thread docs/action_policy_droid_posttrain.md
Comment thread docs/action_policy_droid_posttraining.md Outdated
Comment thread docs/action_policy_droid_posttraining.md Outdated
Comment thread docs/action_policy_droid_posttraining.md Outdated
fwd4 and others added 4 commits June 11, 2026 06:32
Co-authored-by: Yu-Wei Chao <82182961+ychao-nvidia@users.noreply.github.com>
Co-authored-by: Yu-Wei Chao <82182961+ychao-nvidia@users.noreply.github.com>
Co-authored-by: Yu-Wei Chao <82182961+ychao-nvidia@users.noreply.github.com>
Co-authored-by: Yu-Wei Chao <82182961+ychao-nvidia@users.noreply.github.com>
Comment thread docs/action_policy_droid_posttrain.md
fwd4 and others added 4 commits June 10, 2026 16:24
Signed-off-by: Hao Liang <haolia@nvidia.com>
…on large video batches)

Signed-off-by: Hao Liang <haolia@nvidia.com>
Ports the internal use_filter_dict from the DROID policy recipe: restrict training
windows to curated keep-ranges (drops idle/non-task frames), matching the reference
distribution (kept 74.42% = 12.54M of 16.85M windows). Off by default; enabled via
filter_dict_path (an internal data artifact, not shipped). Additive branch — the
unfiltered path is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Hao Liang <haolia@nvidia.com>
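
As an illustration of the keep-range filter, the sketch below retains a training window only if its start frame falls inside one of the curated ranges for its episode. The filter-dict layout (episode id mapped to a list of half-open `(lo, hi)` ranges), the start-frame membership test, and the choice to drop episodes missing from the dict are all assumptions, since the internal artifact is not shipped.

```python
def filter_windows(window_starts, keep_ranges):
    """Keep (episode, start) windows whose start lies in a keep-range of that episode."""
    kept = []
    for episode, start in window_starts:
        # Assumed format: half-open [lo, hi) frame ranges per episode.
        ranges = keep_ranges.get(episode, [])
        if any(lo <= start < hi for lo, hi in ranges):
            kept.append((episode, start))
    return kept


windows = [("ep0", start) for start in range(10)]
keep = {"ep0": [(2, 5), (7, 9)]}
kept = filter_windows(windows, keep)
kept_fraction = len(kept) / len(windows)
```

The figure reported in the commit message is the same kind of ratio: 12.54M / 16.85M ≈ 0.7442.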
…ternal

The OSS recipe used torch AdamW (bf16 params) + eps 1e-6, which under-steps the
small 5x-lr action heads (sub-ULP updates swamped in bf16) and leaves the action
loss on a noisy high plateau. Switch to FusedAdam with fp32 master_weights + eps
1e-8 to match the internal droid_lerobot_8b_policy optimizer. An exact-match
forward+optimizer test (identical batch/seed, 1-GPU eager) confirmed the forward
is byte-identical across repos and the convergence gap was solely the optimizer.

Signed-off-by: Hao Liang <haolia@nvidia.com>
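
The sub-ULP effect described above can be reproduced without any ML library. The sketch emulates bf16 rounding (round-to-nearest-even on the top 16 bits of a float32) and applies 100 updates of 1e-4 to a weight of 1.0, once directly in bf16 and once through an fp32 master copy. Near 1.0 the bf16 spacing is 2^-7 = 0.0078125, so each individual update rounds away. This is a toy accumulation rather than AdamW or FusedAdam, and the step size is an arbitrary illustrative value.

```python
import struct


def to_f32(x):
    """Round a Python float to float32 precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x):
    """Round to bfloat16 precision: nearest-even on the top 16 bits of float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


step = 1e-4
steps = 100

# Weight stored and updated directly in bf16: every update is lost to rounding.
bf16_weight = 1.0
for _ in range(steps):
    bf16_weight = to_bf16(bf16_weight + step)

# fp32 master weight accumulates the updates; bf16 is only a view of it.
master = 1.0
for _ in range(steps):
    master = to_f32(master + step)
bf16_view = to_bf16(master)
```

The bf16-only weight never leaves 1.0, while the master weight reaches roughly 1.01 and its bf16 view moves to the next representable value, 1.0078125.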
fwd4 removed the request for review from yuzhudong June 10, 2026 23:43
lfengad merged commit bbda321 into NVIDIA:main on Jun 11, 2026
7 checks passed