Add DROID action-policy SFT recipe (Cosmos3-Nano, joint_pos) #24
Merged
Force-pushed from e20fbb6 to b8a1cad
Adds a DROID action-policy post-training recipe for nvidia/Cosmos3-Nano:
- DROID LeRobot dataset: compact columnar load + episode-aware windowing, joint_pos (8D) actions + use_state proprioception.
- ActionTransformPipeline wrapper (get_action_droid_sft_dataset).
- Registered action_policy_droid_nano experiment + res480 repro TOML (lr 2e-4, lambdalinear, grad_clip 1.0, count-based batch, fresh action heads).
- EMA warm-start in checkpoint/dcp.py (net_ema := net on warm-start init).
- Docs + example launcher.

Validated end-to-end on H200 (1 node/8 GPU and 2 nodes/16 ranks, HSDP).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Hao Liang <haolia@nvidia.com>
Force-pushed from b8a1cad to 5a9d716
lfengad
previously approved these changes
Jun 8, 2026
Per-sample random crop+rescale (5% spatial jitter) + color jitter, applied to all camera views with shared params (temporally and cross-view consistent) before the concat-view assembly. Off by default; enabled in the action_policy_droid_nano experiment to match the reference recipe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
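For readers unfamiliar with "shared params" augmentation, here is a minimal sketch of the idea, not the code from this PR: one set of crop and color parameters is drawn per sample and reused for every frame of every camera view. The function names, the grayscale nested-list frame layout, the nearest-neighbour rescale, and the 0.9 to 1.1 brightness range are all assumptions made for illustration; only the 5% spatial jitter comes from the commit message. The sketch also assumes all views share one resolution.

```python
import random


def sample_aug_params(rng, height, width, max_jitter=0.05):
    """Draw ONE set of crop + brightness parameters for a whole sample."""
    # The crop keeps between (1 - max_jitter) and 100% of each side.
    scale = 1.0 - rng.uniform(0.0, max_jitter)
    crop_h = max(1, round(height * scale))
    crop_w = max(1, round(width * scale))
    top = rng.randint(0, height - crop_h)   # randint is inclusive on both ends
    left = rng.randint(0, width - crop_w)
    gain = rng.uniform(0.9, 1.1)            # hypothetical color-jitter range
    return top, left, crop_h, crop_w, gain


def apply_aug(frames, params):
    """Crop, rescale back to the input size (nearest neighbour), and scale brightness.

    `frames` is a list of frames; each frame is a list of rows of 0..255 ints.
    """
    top, left, crop_h, crop_w, gain = params
    height, width = len(frames[0]), len(frames[0][0])
    # Source row/column for every output pixel, identical for all frames.
    rows = [top + (y * crop_h) // height for y in range(height)]
    cols = [left + (x * crop_w) // width for x in range(width)]
    return [
        [[min(255, max(0, round(frame[r][c] * gain))) for c in cols] for r in rows]
        for frame in frames
    ]


def augment_views(views, rng):
    """Augment every camera view with the same parameters (cross-view consistent)."""
    first = next(iter(views.values()))
    params = sample_aug_params(rng, len(first[0]), len(first[0][0]))
    return {name: apply_aug(frames, params) for name, frames in views.items()}


# Usage: two views holding identical content stay identical after augmentation,
# because the parameters are sampled once and shared.
rng = random.Random(0)
clip = [[[(t + y + x) % 256 for x in range(16)] for y in range(12)] for t in range(3)]
views = {"exterior": clip, "wrist": [[row[:] for row in f] for f in clip]}
out = augment_views(views, rng)
```

Sampling per view or per frame instead would break the temporal and cross-view consistency the commit message calls out.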
The warm-start branch that copies net.* -> net_ema.* (so EMA starts from the loaded weights instead of random construction-time values) was gated on net_ema.* being present in the model state dict, which is always true, so it fired on every warm start and overwrote a genuinely loaded, trained net_ema (e.g. when initializing from a training checkpoint that carries its own EMA).

Gate the copy on net_ema.* being explicitly listed in keys_to_skip_loading instead: only reset net_ema = net when net_ema was actually skipped on load (the HF->DCP init case). When net_ema.* is loaded, leave it intact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
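The gating described above can be sketched with plain dictionaries standing in for state dicts. This is an illustration under assumptions, not the code in checkpoint/dcp.py: the function name `warm_start` is hypothetical, and treating `keys_to_skip_loading` entries as key prefixes is a guess about how they are matched.

```python
def warm_start(model_state, loaded_state, keys_to_skip_loading):
    """Merge a loaded checkpoint into model_state; reset EMA only if it was skipped.

    Assumption: entries of keys_to_skip_loading are key prefixes such as "net_ema.".
    """
    def is_skipped(key):
        return any(key.startswith(prefix) for prefix in keys_to_skip_loading)

    state = dict(model_state)
    for key, value in loaded_state.items():
        if not is_skipped(key):
            state[key] = value

    # New gate: copy net.* -> net_ema.* only when net_ema.* was explicitly skipped.
    # The old gate ("net_ema.* exists in the model state dict") was always true.
    ema_was_skipped = any(p.startswith("net_ema.") for p in keys_to_skip_loading)
    if ema_was_skipped:
        for key in list(state):
            if key.startswith("net."):
                state["net_ema." + key[len("net."):]] = state[key]
    return state


model_state = {"net.w": 0.0, "net_ema.w": 0.37}   # construction-time values
checkpoint = {"net.w": 1.0, "net_ema.w": 2.0}     # checkpoint with a trained EMA

# HF->DCP style init: EMA is skipped on load, so it is reset to the loaded net.
hf_init = warm_start(model_state, checkpoint, ["net_ema."])
# Resuming from a training checkpoint: EMA is loaded and left intact.
resumed = warm_start(model_state, checkpoint, [])
```

Here `hf_init["net_ema.w"]` follows `net.w`, while `resumed["net_ema.w"]` keeps the checkpoint's own EMA value.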
mli0603
reviewed
Jun 9, 2026
    fps=fps,
    chunk_length=chunk_length,
    viewpoint=viewpoint,
    action_space=action_space,
Collaborator
I see that the released Droid dataset doesn't accept action_space as an arg. Have you tried running this script with cosmos-framework?
Collaborator
Author
yes, it's runnable, dataset is updated, check droid_lerobot_dataset.py
Co-authored-by: Yu-Wei Chao <82182961+ychao-nvidia@users.noreply.github.com>
Signed-off-by: Hao Liang <haolia@nvidia.com>
…on large video batches)
Signed-off-by: Hao Liang <haolia@nvidia.com>
Ports the internal use_filter_dict from the DROID policy recipe: restrict training windows to curated keep-ranges (drops idle/non-task frames), matching the reference distribution (kept 74.42% = 12.54M of 16.85M windows). Off by default; enabled via filter_dict_path (an internal data artifact, not shipped). Additive branch: the unfiltered path is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Hao Liang <haolia@nvidia.com>
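A rough sketch of how keep-range filtering interacts with episode-aware windowing is shown below. It is illustrative only: the function name, the half-open `(start, end)` range format, the rule that a window is kept when its start frame falls inside a keep-range, and the rule that windows must fit entirely inside the episode are all assumptions, since the internal filter dict and its exact semantics are not part of this PR.

```python
def episode_window_starts(episode_length, chunk_length, keep_ranges=None):
    """Return window start frames for one episode.

    keep_ranges: optional list of half-open (start, end) frame ranges.
    Assumption: a window is kept when its START frame lies in a keep-range.
    """
    # Episode-aware: a window never runs past the end of its own episode.
    starts = range(0, episode_length - chunk_length + 1)
    if keep_ranges is None:
        return list(starts)  # unfiltered path, unchanged by the filter branch
    return [s for s in starts if any(lo <= s < hi for lo, hi in keep_ranges)]


# Toy episode of 100 frames with chunk_length=32 and two curated keep-ranges.
all_windows = episode_window_starts(100, 32)
kept_windows = episode_window_starts(100, 32, keep_ranges=[(10, 40), (60, 90)])
kept_fraction = len(kept_windows) / len(all_windows)
```

Summing kept and total windows over all episodes gives the kind of ratio quoted in the commit message (12.54M / 16.85M ≈ 74.4%, matching the stated 74.42% up to rounding of the counts).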
…ternal

The OSS recipe used torch AdamW (bf16 params) + eps 1e-6, which under-steps the small 5x-lr action heads (sub-ULP updates swamped in bf16) and leaves the action loss on a noisy high plateau. Switch to FusedAdam with fp32 master_weights + eps 1e-8 to match the internal droid_lerobot_8b_policy optimizer.

An exact-match forward+optimizer test (identical batch/seed, 1-GPU eager) confirmed the forward is byte-identical across repos and the convergence gap was solely the optimizer.

Signed-off-by: Hao Liang <haolia@nvidia.com>
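The "sub-ULP updates swamped in bf16" effect can be reproduced without any ML framework. bfloat16 keeps the top 16 bits of a float32, so values just below 1.0 are spaced 2^-8 ≈ 0.0039 apart, and an update of 1e-3 rounds away every step. The sketch below emulates bf16 rounding with `struct`; it is a toy with a constant update, not the recipe's optimizer, and the "master" copy is a Python float (float64) standing in for fp32 master weights.

```python
import struct


def to_bf16(x):
    """Round a number to the nearest bfloat16 value (ties to even), returned as a float.

    Toy helper: does not handle NaN/inf or values that overflow on rounding.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)   # round-to-nearest-even on the low 16 bits
    bits &= 0xFFFF0000                    # keep only the bfloat16 part
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def train(steps, update, use_master_weights):
    """Apply a constant update `steps` times and return the final bf16 weight."""
    weight = 1.0   # bf16 parameter
    master = 1.0   # higher-precision master copy (stand-in for fp32)
    for _ in range(steps):
        if use_master_weights:
            master += update            # accumulate in high precision
            weight = to_bf16(master)    # cast down only for the forward pass
        else:
            weight = to_bf16(weight + update)  # update rounded away each step
    return weight


bf16_only = train(100, -1e-3, use_master_weights=False)    # stays at 1.0
with_master = train(100, -1e-3, use_master_weights=True)   # about 0.898
```

With bf16-only parameters the weight never leaves 1.0 after 100 steps, while the master-weight variant ends within one bf16 step of the intended 0.9, which is the behavior the switch to fp32 master weights relies on.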
ychao-nvidia
approved these changes
Jun 11, 2026
lfengad
approved these changes
Jun 11, 2026
Summary
Adds a DROID action-policy SFT recipe for nvidia/Cosmos3-Nano, mirroring the internal droid_lerobot_8b policy run, so users can post-train the action-generation + action heads on DROID (LeRobot v3.0) data.

What's included
- data/vfm/action/datasets/droid_lerobot_dataset.py: DROID LeRobot dataset with compact columnar load + episode-aware windowing (replaces an eager full-table materialization), plus joint_pos (8D: 7 joints + gripper) and use_state support.
- data/vfm/action/datasets/action_sft_dataset.py (new): get_action_droid_sft_dataset(...) wrapping the dataset through ActionTransformPipeline.
- configs/.../action/posttrain_config/action_policy_droid_nano.py (new): registers the action_policy_droid_nano experiment (Cosmos3-Nano / 8B MoT). The optimizer trains gen+action heads (5× LR on action heads), with a LambdaLinear schedule, count-based batch, res480, and encode_exact_durations=[33] (chunk 32 → 33 frames).
- checkpoint/dcp.py: EMA warm-start. When keys_to_skip_loading lists net_ema.* (so the EMA weights are skipped on load), initialize net_ema = net from the base weights so EMA starts from the init rather than zeros.
- examples/toml/sft_config/action_policy_droid_{nano,repro}.toml: 1-GPU smoke + scaled (res480) configs.
- examples/launch_sft_action_policy_droid.sh + docs/action_policy_droid_posttraining.md: runnable launcher and walkthrough.

Validation
End-to-end on H200:
- 1 node / 8 GPUs: max_samples_per_batch=32 (64 OOMs at 139 GiB; internal used 128 on GB200).
- 2 nodes / 16 ranks (HSDP): shard 8 × replicate 2, TRAIN_EXIT=0.
- Mirrors internal droid_lerobot_8b: lr 1e-4 / betas / wd, 5× action-head LR, LambdaLinear, shift {256:3,480:5,720:10}, concat_view, chunk_length=32.

Notes
- Count-based batching (max_samples_per_batch, max_sequence_length=None) lives in the experiment Python: TOML cannot express null, and the loader only overrides keys present in the TOML.
- Convert nvidia/Cosmos3-Nano → DCP and pass it via BASE_CHECKPOINT_PATH; action heads init fresh (skipped on load).