Support reasoner video input. by foreverlms · Pull Request #25 · NVIDIA/cosmos-framework

foreverlms · 2026-06-08T10:31:12Z

No description provided.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…l-exclusion validator Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…pass-throughs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…te_reasoner_text stub Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ate_reasoner_text Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nce engine Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… to video_fps

…decord/pkl_to_media dep) The repo Qwen3VLProcessor runs do_sample_frames=False and expects a pre-decoded frame list; decode with the inference-canonical torchvision.io.read_video (no undeclared decord dep) and sample toward video_fps via Qwen smart_nframes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…/plan superseded Always emit reasoner_videos=[video_or_None] (like reasoner_images) so the batch homogeneity check aligns positionally and reliably rejects an image/video/text mix. Add superseded banners to the spec/plan docs (frame-decode + video_fps-only is final). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… untracked) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-commit) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t for reasoner_videos key - _make_reasoner_sample_args gains video_fps - text-only / with-image get_sample_data tests assert the always-present reasoner_videos:[None] - add test_get_sample_data_reasoner_with_video (monkeypatched decoder) - drop redundant lower-level _get_reasoner_sample_data duplicates (public get_sample_data set covers them) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

foreverlms and others added 17 commits June 8, 2026 03:32

Add design spec: video input for reasoner model-mode inference

985d803

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add implementation plan: video input for reasoner model-mode

2d2490f

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(reasoner): add video_* sampling fields + mutual-exclusion valida…

6ece15f

…tion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(reasoner): video_fps PositiveFloat + construction-time mutua…

78a25a2

…l-exclusion validator Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(reasoner): add video_* defaults (null) to reasoner sample_args

6859a70

feat(reasoner): video branch in prepare_multimodal_reasoner_inputs

5626909

feat(reasoner): accept video tensors in _impl_generate_reasoner_text

86a7e98

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(reasoner): forward video tensors through generate_reasoner_text …

3f41015

…pass-throughs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(reasoner): revert out-of-scope param additions to Nemotron genera…

c7d9874

…te_reasoner_text stub Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(reasoner): videos param + video chat block in OmniMoTModel.gener…

ff1d67d

…ate_reasoner_text Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(reasoner): update generate_reasoner_text docstring for video path

dbd7e86

feat(reasoner): route mp4 vision_path to video conditioning in infere…

c31261c

…nce engine Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(reasoner): document video input + add reasoner_video example

769f465

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(reasoner): clarify vision_path comment covers video too

663112e

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(reasoner): decode video frames for Qwen3VLProcessor; reduce knobs…

1d957bb

… to video_fps

foreverlms force-pushed the maoshengl/video_reasoner_inference branch from 1b7b175 to 19bd716 Compare June 8, 2026 10:37

foreverlms and others added 2 commits June 8, 2026 03:38

chore(reasoner): untrack video-reasoner spec/plan docs (keep in-repo,…

92e3491

… untracked) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs(reasoner): regenerate inference.md TOC for Reasoner section (pre…

eb20347

…-commit) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

foreverlms marked this pull request as ready for review June 8, 2026 12:03

foreverlms requested review from lfengad, tylin and yy-code-nv June 8, 2026 12:03

foreverlms marked this pull request as draft June 8, 2026 12:20

foreverlms marked this pull request as ready for review June 9, 2026 14:09

Merge branch 'main' into maoshengl/video_reasoner_inference

7b71889

foreverlms mentioned this pull request Jun 11, 2026

Question: Is Reasoner expected to support MP4 video input in cosmos-framework inference? #8

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support reasoner video input.#25

Support reasoner video input.#25
foreverlms wants to merge 21 commits into
mainfrom
maoshengl/video_reasoner_inference

foreverlms commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

foreverlms commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant