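Chengyi Yang<sup>1,2</sup>, Pengzhen Li<sup>1</sup>, Jiayin Qi<sup>3</sup>, Aimin Zhou<sup>2</sup>, Ji Wu<sup>4</sup>, Ji Liu<sup>1†</sup>

<sup>1</sup> HiThink Research  <sup>2</sup> East China Normal University  <sup>3</sup> Guangzhou University  <sup>4</sup> Tsinghua University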
†Corresponding Author: jiliuwork@gmail.com
### Abstract

Text-to-Video (T2V) generation has benefited from recent advances in diffusion models, yet current systems still struggle in complex scenarios, a difficulty generally exacerbated by ambiguous and underspecified text prompts. In this work, we formulate complex-scenario prompt refinement as a stage-wise multi-agent refinement process and propose SCMAPR, a Scenario-aware and Self-Correcting Multi-Agent Prompt Refinement framework for T2V prompting. SCMAPR coordinates specialized agents to (i) route each prompt to a taxonomy-grounded scenario for strategy selection, (ii) synthesize scenario-aware rewriting policies and perform policy-conditioned refinement, and (iii) conduct structured semantic verification that triggers conditional revision when violations are detected. To clarify what constitutes a complex scenario in T2V prompting, provide representative examples, and enable rigorous evaluation under such challenging conditions, we further introduce T2V-Complexity, a T2V benchmark consisting exclusively of complex-scenario prompts. Extensive experiments on 3 existing benchmarks and T2V-Complexity demonstrate that SCMAPR consistently improves text-video alignment and overall generation quality in complex scenarios, achieving average-score gains of up to 2.67% on VBench and 3.28 on EvalCrafter, and up to 0.028 on T2V-CompBench, over 3 state-of-the-art baselines.
### Framework

SCMAPR organizes prompt refinement as a stage-wise collaboration among six specialized agents and proceeds through five functional stages:

- (I) Scenario Routing: a Scenario Router assigns a scenario tag to the input prompt.
- (II) Policy Synthesis: a Policy Generator produces a scenario-conditioned rewriting policy.
- (III) Policy-Conditioned Refinement: a Prompt Refiner rewrites the prompt according to that policy.
- (IV) Semantic Verification: an Atomizer and a Validator jointly verify semantic fidelity through atomic extraction and entailment judgment.
- (V) Conditional Revision: verification feedback triggers targeted revision only when needed, making the refinement self-correcting.
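The five stages can be read as a single control flow. The sketch below is a minimal illustration, not the repository's implementation: each agent is a hypothetical plain Python callable (in SCMAPR they are LLM-backed agents), and the revision cap `max_rounds` is an assumed parameter that the description above does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical agent signatures; in SCMAPR each role is played by an LLM-backed agent.
Router = Callable[[str], str]                   # prompt -> scenario tag
PolicyGenerator = Callable[[str, str], str]     # (prompt, scenario) -> rewriting policy
PromptRefiner = Callable[[str, str], str]       # (prompt, policy) -> refined prompt
Verifier = Callable[[str, str], List[str]]      # (prompt, refined) -> detected violations
Reviser = Callable[[str, str, List[str]], str]  # (prompt, refined, violations) -> revised prompt


@dataclass
class RefinementResult:
    scenario: str
    policy: str
    refined: str
    rounds: int = 0                                      # revision rounds performed
    violations: List[str] = field(default_factory=list)  # violations left after the last check


def refine(prompt: str, route: Router, generate_policy: PolicyGenerator,
           refine_prompt: PromptRefiner, verify: Verifier, revise: Reviser,
           max_rounds: int = 2) -> RefinementResult:
    scenario = route(prompt)                      # (I) scenario routing
    policy = generate_policy(prompt, scenario)    # (II) policy synthesis
    refined = refine_prompt(prompt, policy)       # (III) policy-conditioned refinement
    violations = verify(prompt, refined)          # (IV) semantic verification
    rounds = 0
    while violations and rounds < max_rounds:     # (V) revise only when violations exist
        refined = revise(prompt, refined, violations)
        rounds += 1
        violations = verify(prompt, refined)
    return RefinementResult(scenario, policy, refined, rounds, violations)


# Toy stand-ins for the agents, just to exercise the control flow.
result = refine(
    "a cat and a dog",
    route=lambda p: "multi-object",
    generate_policy=lambda p, s: f"keep every object mentioned ({s})",
    refine_prompt=lambda p, pol: "A fluffy cat sits on a sofa.",
    verify=lambda p, r: [] if "dog" in r else ["missing: dog"],
    revise=lambda p, r, v: r + " A small dog lies beside it.",
)
print(result.rounds, result.violations)  # 1 []
```

Capping the number of rounds keeps a validator that never converges from looping forever, and the returned `violations` field exposes whatever remains unresolved.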
Given a user input and its refined prompt, semantic verification proceeds in four steps:

1. *Atomic Extraction* decomposes the user input into atomic elements (atoms).
2. *Chunking* segments the refined prompt into semantically coherent evidence units (chunks).
3. *Atom-Chunk Matching* retrieves the most relevant evidence chunk for each atom.
4. *Entailment Validation* judges the semantic relation between each atom and its matched evidence chunk.

This design allows missing semantics and contradictions in the refined prompt to be detected and then used to trigger downstream revision.
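A toy, purely lexical version of the four steps makes the data flow concrete. This is a sketch under strong assumptions, not SCMAPR's method: SCMAPR uses LLM agents for atomization and entailment, whereas here atoms come from splitting on commas and "and", chunks are sentences, matching uses Jaccard token overlap, and the stand-in validator only distinguishes `entailed` from `missing` (detecting contradictions would need an NLI model or an LLM). All function names are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

STOPWORDS = {"a", "an", "the", "and", "of", "in", "on", "with", "is", "are"}


def content_tokens(text: str) -> Set[str]:
    """Lower-cased word tokens minus a tiny stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def extract_atoms(user_input: str) -> List[str]:
    # (1) Atomic extraction: naive split on commas and the word "and".
    return [p.strip() for p in re.split(r",|\band\b", user_input) if p.strip()]


def chunk(refined_prompt: str) -> List[str]:
    # (2) Chunking: one evidence chunk per sentence.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", refined_prompt) if s.strip()]


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def match(atom: str, chunks: List[str]) -> Tuple[str, float]:
    # (3) Atom-chunk matching: keep the chunk with the highest token overlap.
    if not chunks:
        return "", 0.0
    atom_tokens = content_tokens(atom)
    scored = [(c, jaccard(atom_tokens, content_tokens(c))) for c in chunks]
    return max(scored, key=lambda pair: pair[1])


def verify(user_input: str, refined_prompt: str) -> Dict[str, str]:
    chunks = chunk(refined_prompt)
    labels = {}
    for atom in extract_atoms(user_input):
        evidence, _ = match(atom, chunks)
        # (4) Entailment validation, lexical stand-in: every content word must appear in the evidence.
        covered = content_tokens(atom) <= content_tokens(evidence)
        labels[atom] = "entailed" if covered else "missing"
    return labels


print(verify("a red car, a blue bike", "A red car speeds down a road at dusk."))
# {'a red car': 'entailed', 'a blue bike': 'missing'}
```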
Given a user input, the framework performs scenario routing, policy synthesis, policy-conditioned prompt refinement, atom-level verification, and targeted revision. The Entailment Validator labels each atom-evidence pair and conditionally triggers targeted revision, producing a verified refined prompt for downstream video generation.
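The conditional trigger can be sketched as a pure function from validator labels to revision feedback. Assumptions: the label names `missing` and `contradiction` and the feedback wording are hypothetical. The point is that revision runs only when some atom fails, and that the feedback names just the failing atoms, which is what makes the revision targeted.

```python
from typing import Dict, Optional

# Hypothetical label names for the two failure modes (missing content, contradiction).
VIOLATION_LABELS = ("missing", "contradiction")


def revision_feedback(labels: Dict[str, str]) -> Optional[str]:
    """Return targeted feedback for the reviser, or None when no revision is needed."""
    failed = [f"- [{label}] {atom}" for atom, label in labels.items()
              if label in VIOLATION_LABELS]
    if not failed:
        return None  # every atom is entailed: the refined prompt is accepted as verified
    return "Revise the prompt to fix only these atoms:\n" + "\n".join(failed)


print(revision_feedback({"a red car": "entailed", "a blue bike": "missing"}))
# Revise the prompt to fix only these atoms:
# - [missing] a blue bike
```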
### Installation

```bash
conda create -n SCMPR python=3.10.18
conda activate SCMPR
pip install -r requirements.txt
```

Remember to write your API key in `utils/config.json`.
### Usage

Our code supports running the entire pipeline end to end as well as executing each stage step by step.
#### Step 1: Scenario routing (`refinement.classifier`)

Assign a scenario tag to every prompt in each benchmark's prompt file:

```bash
python -m refinement.classifier \
  --output_dir results \
  --input_txt data/vbench_full_info.txt \
  --output_name category_vbench946.jsonl \
  --include_non_difficult

python -m refinement.classifier \
  --output_dir results \
  --input_txt data/evalcrafter700.txt \
  --output_name category_evalcrafter700.jsonl \
  --include_non_difficult

python -m refinement.classifier \
  --output_dir results \
  --input_txt data/compbench1400.txt \
  --output_name category_compbench1400.jsonl \
  --include_non_difficult
```
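To sanity-check the routing output before moving on, one can tally the scenario tags in each JSONL file. A minimal sketch, assuming each line is a JSON object whose tag sits under a `category` key; that field name is a guess, so pass `key=` if the actual output uses another one.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_tags(lines: Iterable[str], key: str = "category") -> Counter:
    """Tally scenario tags over JSONL lines; `key` is an assumed field name."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tag = json.loads(line).get(key, "<missing>")
        # Tags that are lists or dicts are serialized so they can be counted.
        counts[tag if isinstance(tag, str) else json.dumps(tag)] += 1
    return counts


def scenario_counts(path: str, key: str = "category") -> Counter:
    with Path(path).open(encoding="utf-8") as f:
        return count_tags(f, key)
```

For example, `scenario_counts("results/category_vbench946.jsonl").most_common()` lists the tags by frequency.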
#### Step 2: Policy synthesis (`refinement.policy`)

Generate a scenario-conditioned rewriting policy for each prompt. The T2V-Complexity prompts are read directly from `benchmark/prompts.jsonl`:

```bash
python -m refinement.policy \
  --input_jsonl results/category_vbench946.jsonl \
  --output_jsonl results/policy_vbench946.jsonl \
  --log_every 20

python -m refinement.policy \
  --input_jsonl results/category_evalcrafter700.jsonl \
  --output_jsonl results/policy_evalcrafter700.jsonl \
  --log_every 20

python -m refinement.policy \
  --input_jsonl results/category_compbench1400.jsonl \
  --output_jsonl results/policy_compbench1400.jsonl \
  --log_every 20

python -m refinement.policy \
  --input_jsonl benchmark/prompts.jsonl \
  --output_jsonl results/policy_t2vcomplexity1000.jsonl \
  --log_every 20
```
#### Step 3: Policy-conditioned refinement (`refinement.refiner`)

Rewrite each prompt according to its policy:

```bash
python -m refinement.refiner \
  --input_jsonl results/policy_vbench946.jsonl \
  --output_jsonl results/refined_vbench946.jsonl \
  --log_every 20

python -m refinement.refiner \
  --input_jsonl results/policy_evalcrafter700.jsonl \
  --output_jsonl results/refined_evalcrafter700.jsonl \
  --log_every 20

python -m refinement.refiner \
  --input_jsonl results/policy_compbench1400.jsonl \
  --output_jsonl results/refined_compbench1400.jsonl \
  --log_every 20

python -m refinement.refiner \
  --input_jsonl results/policy_t2vcomplexity1000.jsonl \
  --output_jsonl results/refined_t2vcomplexity1000.jsonl \
  --log_every 20
```
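Because steps 1 to 3 use the same file-naming pattern for every benchmark, the commands can be generated rather than copied. The helper below is hypothetical and not part of the repository; it only assembles the module invocations and flags shown above, and by default prints them instead of executing them.

```python
import shlex
import subprocess
import sys
from typing import List


def step_commands(name: str, input_txt: str, results: str = "results") -> List[List[str]]:
    """Classifier, policy, and refiner invocations, following the README's naming scheme."""
    py = sys.executable  # the interpreter of the active environment
    return [
        [py, "-m", "refinement.classifier", "--output_dir", results,
         "--input_txt", input_txt, "--output_name", f"category_{name}.jsonl",
         "--include_non_difficult"],
        [py, "-m", "refinement.policy",
         "--input_jsonl", f"{results}/category_{name}.jsonl",
         "--output_jsonl", f"{results}/policy_{name}.jsonl", "--log_every", "20"],
        [py, "-m", "refinement.refiner",
         "--input_jsonl", f"{results}/policy_{name}.jsonl",
         "--output_jsonl", f"{results}/refined_{name}.jsonl", "--log_every", "20"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # abort at the first stage that fails
```

For example, `run(step_commands("vbench946", "data/vbench_full_info.txt"))` prints the three VBench commands, and `dry_run=False` executes them. T2V-Complexity does not fit this pattern, since its policy step reads `benchmark/prompts.jsonl` instead of a classifier output.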
#### Step 4: Semantic verification and conditional revision (`run_batch_flow.py`)

With the category, policy, and refined-prompt files in place, resume the batch flow from the verification stage:

```bash
python3 run_batch_flow.py \
  --input data/vbench_full_info.txt \
  --output_txt results/verified_vbench946.txt \
  --output_jsonl results/verified_vbench946.jsonl \
  --category_jsonl results/category_vbench946.jsonl \
  --policy_jsonl results/policy_vbench946.jsonl \
  --refined_jsonl results/refined_vbench946.jsonl \
  --resume_from verify

python3 run_batch_flow.py \
  --input data/evalcrafter700.txt \
  --output_txt results/verified_evalcrafter700.txt \
  --output_jsonl results/verified_evalcrafter700.jsonl \
  --category_jsonl results/category_evalcrafter700.jsonl \
  --policy_jsonl results/policy_evalcrafter700.jsonl \
  --refined_jsonl results/refined_evalcrafter700.jsonl \
  --resume_from verify

python3 run_batch_flow.py \
  --input data/compbench1400.txt \
  --output_txt results/verified_compbench1400.txt \
  --output_jsonl results/verified_compbench1400.jsonl \
  --category_jsonl results/category_compbench1400.jsonl \
  --policy_jsonl results/policy_compbench1400.jsonl \
  --refined_jsonl results/refined_compbench1400.jsonl \
  --resume_from verify
```
`--resume_from` selects the stage from which the batch flow resumes: `classifier`, `policy`, `refiner`, or `verify`. For example, on T2V-Complexity:

```bash
python3 run_batch_flow.py \
  --input data/t2v_complexity1000.txt \
  --output_txt results/verified_t2vcomplexity1000.txt \
  --output_jsonl results/verified_t2vcomplexity1000.jsonl \
  --category_jsonl benchmark/prompts.jsonl \
  --policy_jsonl results/policy_t2vcomplexity1000.jsonl \
  --refined_jsonl results/refined_t2vcomplexity1000.jsonl \
  --resume_from verify
```
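One way to read `--resume_from` is as an index into the stage order: every stage before it is skipped, so the files those stages would have written must already exist. The sketch below encodes that reading. It is an assumption inferred from the commands above, not taken from `run_batch_flow.py`; the stage names come from the option list, and the stage-to-flag mapping in `PRODUCES` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Stage order assumed from the --resume_from options listed above.
STAGES = ["classifier", "policy", "refiner", "verify"]
# Hypothetical mapping: the run_batch_flow.py flag naming each stage's output file.
PRODUCES = {"classifier": "category_jsonl", "policy": "policy_jsonl", "refiner": "refined_jsonl"}


def stages_to_run(resume_from: Optional[str]) -> List[str]:
    """Stages executed when resuming from `resume_from`; None means the full pipeline."""
    if resume_from in (None, "None"):
        return list(STAGES)
    if resume_from not in STAGES:
        raise ValueError(f"unknown stage {resume_from!r}; expected one of {STAGES}")
    return STAGES[STAGES.index(resume_from):]


def required_files(resume_from: Optional[str], paths: Dict[str, str]) -> List[str]:
    """Files that must already exist because the stage producing them is skipped."""
    running = stages_to_run(resume_from)
    return [paths[PRODUCES[s]] for s in STAGES if s not in running]


def missing_files(resume_from: Optional[str], paths: Dict[str, str]) -> List[str]:
    """Required files that are not on disk; an empty list means resuming is safe."""
    return [p for p in required_files(resume_from, paths) if not Path(p).is_file()]
```

Run as a preflight check, `missing_files("verify", {...})` with the three JSONL paths from a command above would flag any intermediate file that an earlier step did not produce.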
#### End-to-end run

Setting `--resume_from None` runs the entire pipeline end to end:

```bash
python3 run_batch_flow.py \
  --input data/vbench_full_info.txt \
  --output_txt results/verified_vbench946.txt \
  --output_jsonl results/verified_vbench946.jsonl \
  --category_jsonl results/category_vbench946.jsonl \
  --policy_jsonl results/policy_vbench946.jsonl \
  --refined_jsonl results/refined_vbench946.jsonl \
  --resume_from None

python3 run_batch_flow.py \
  --input data/t2v_complexity1000.txt \
  --output_txt results/verified_t2vcomplexity1000.txt \
  --output_jsonl results/verified_t2vcomplexity1000.jsonl \
  --category_jsonl benchmark/prompts.jsonl \
  --policy_jsonl results/policy_t2vcomplexity1000.jsonl \
  --refined_jsonl results/refined_t2vcomplexity1000.jsonl \
  --resume_from None
```



