HiThink-Research/SCMAPR

SCMAPR: Self-Correcting Multi-Agent Prompt Refinement for Complex-Scenario Text-to-Video Generation

   

Chengyi Yang1,2, Pengzhen Li1, Jiayin Qi3, Aimin Zhou2, Ji Wu4, Ji Liu1†

1 HiThink Research     2 East China Normal University     3 Guangzhou University     4 Tsinghua University

† Corresponding Author: jiliuwork@gmail.com

### Abstract

Text-to-Video (T2V) generation has benefited from recent advances in diffusion models, yet current systems still struggle under complex scenarios, a difficulty generally exacerbated by the ambiguity and underspecification of text prompts. In this work, we formulate complex-scenario prompt refinement as a stage-wise multi-agent refinement process and propose SCMAPR, a scenario-aware and Self-Correcting Multi-Agent Prompt Refinement framework for T2V prompting. SCMAPR coordinates specialized agents to (i) route each prompt to a taxonomy-grounded scenario for strategy selection, (ii) synthesize scenario-aware rewriting policies and perform policy-conditioned refinement, and (iii) conduct structured semantic verification that triggers conditional revision when violations are detected. To clarify what constitutes a complex scenario in T2V prompting, provide representative examples, and enable rigorous evaluation under such challenging conditions, we further introduce T2V-Complexity, a T2V benchmark consisting exclusively of complex-scenario prompts. Extensive experiments on three existing benchmarks and our T2V-Complexity benchmark demonstrate that SCMAPR consistently improves text-video alignment and overall generation quality under complex scenarios, achieving gains in average score of up to 2.67% on VBench and 3.28 on EvalCrafter, and an improvement of up to 0.028 on T2V-CompBench, over three state-of-the-art baselines.

Framework

Self-Correcting Multi-Agent Prompt Refinement Framework (SCMAPR)

SCMAPR organizes prompt refinement as a stage-wise multi-agent collaboration involving six specialized agents. The framework proceeds through five functional stages: (I) Scenario Routing, where a Scenario Router assigns a scenario tag to the input prompt; (II) Policy Synthesis, where a Policy Generator produces a scenario-conditioned rewriting policy; (III) Policy-Conditioned Refinement, where a Prompt Refiner rewrites the prompt under that policy; (IV) Semantic Verification, where an Atomizer and a Validator jointly verify semantic fidelity through atomic extraction and entailment judgment; and (V) Conditional Revision, where verification feedback conditionally triggers targeted revision, enabling self-correcting refinement.
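
The stage order above can be sketched as a small driver. This is a minimal illustration, not the repository's implementation: `RefinementState`, `run_pipeline`, and the callables passed to it are hypothetical names, the Atomizer and Validator are folded into a single `verify` callable, and the toy agents in `demo` are plain string functions standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RefinementState:
    """Carries one prompt through the five stages (field names are illustrative)."""
    user_prompt: str
    scenario: str = ""
    policy: str = ""
    refined_prompt: str = ""
    violations: List[str] = field(default_factory=list)


def run_pipeline(
    prompt: str,
    route: Callable[[str], str],
    make_policy: Callable[[str, str], str],
    refine: Callable[[str, str], str],
    verify: Callable[[str, str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_revisions: int = 1,
) -> RefinementState:
    state = RefinementState(user_prompt=prompt)
    state.scenario = route(prompt)                            # (I) Scenario Routing
    state.policy = make_policy(prompt, state.scenario)        # (II) Policy Synthesis
    state.refined_prompt = refine(prompt, state.policy)       # (III) Refinement
    state.violations = verify(prompt, state.refined_prompt)   # (IV) Verification
    revisions = 0
    # (V) Conditional Revision: only runs when verification reports violations.
    while state.violations and revisions < max_revisions:
        state.refined_prompt = revise(state.refined_prompt, state.violations)
        state.violations = verify(prompt, state.refined_prompt)
        revisions += 1
    return state


def demo() -> RefinementState:
    """Toy agents: the refiner drops the cat so that the revision stage is exercised."""
    def route(p: str) -> str:
        return "multi-object" if " and " in p else "simple"

    def make_policy(p: str, scenario: str) -> str:
        return f"Describe each entity explicitly ({scenario})."

    def refine(p: str, policy: str) -> str:
        return "A dog runs across a sunny park."

    def verify(p: str, refined: str) -> List[str]:
        return [w for w in ("dog", "cat") if w in p and w not in refined]

    def revise(refined: str, violations: List[str]) -> str:
        return refined.rstrip(".") + ", followed by a " + " and a ".join(violations) + "."

    return run_pipeline("a dog and a cat running", route, make_policy, refine, verify, revise)
```

Calling `demo()` returns a state whose `violations` list is empty after one revision pass. Capping the loop with `max_revisions` is a design choice of this sketch, so that a persistent violation cannot cause unbounded rewriting.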

pipeline

Illustration of the Semantic Verification Stage in SCMAPR

Given a user input and the corresponding refined prompt, semantic verification is performed in four steps. (1) Atomic Extraction decomposes the user input into atomic elements (atoms). (2) Chunking segments the refined prompt into semantically coherent evidence units. (3) Atom-Chunk Matching retrieves the most relevant evidence chunk for each atom. (4) Entailment Validation assesses the semantic relation between each atom and its evidence chunk. Through this design, semantic omissions and contradictions in the refined prompt can be detected and then used to trigger downstream revision.
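
The four steps can be illustrated with a deliberately simple lexical stand-in. The sketch below is an assumption-laden toy, not the method used in SCMAPR: all function names are hypothetical, atoms are obtained by splitting on commas and "and", chunks are sentences, matching uses token overlap, and "entailment" is approximated by token coverage against a threshold. It can flag missing atoms only; detecting contradictions would require a real entailment model.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_atoms(user_input: str) -> List[str]:
    """Step 1 stand-in: split the input on commas and the word 'and'."""
    parts = re.split(r",|\band\b", user_input)
    return [p.strip() for p in parts if p.strip()]


def chunk(refined_prompt: str) -> List[str]:
    """Step 2 stand-in: one chunk per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", refined_prompt)
    return [s.strip() for s in parts if s.strip()]


def match(atom: str, chunks: List[str]) -> Tuple[str, float]:
    """Step 3 stand-in: pick the chunk sharing the most tokens with the atom."""
    if not chunks:
        return "", 0.0
    atom_tokens = tokens(atom)
    best = max(chunks, key=lambda c: len(atom_tokens & tokens(c)))
    coverage = len(atom_tokens & tokens(best)) / max(len(atom_tokens), 1)
    return best, coverage


def validate(user_input: str, refined_prompt: str, threshold: float = 0.6) -> Dict[str, str]:
    """Step 4 stand-in: label each atom by how well its best chunk covers it."""
    chunks = chunk(refined_prompt)
    report = {}
    for atom in extract_atoms(user_input):
        _, coverage = match(atom, chunks)
        report[atom] = "entailed" if coverage >= threshold else "missing"
    return report
```

For example, `validate("a red car, a barking dog", "A red car drives down the street. Birds fly overhead.")` labels the car atom as entailed and the dog atom as missing, and the missing label is the kind of signal that would trigger revision.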

verification_steps

Results

Comparison of Complex-Scenario Text-to-Video Generation Before and After Prompt Refinement

example_scene1n4

End-to-End Case Study of SCMAPR with Self-Correction

Given a user input, the framework performs scenario routing, policy generation, policy-conditioned prompt refinement, atom-level verification, and targeted revision. The Entailment Validator labels each atom-evidence pair and conditionally triggers targeted revision, producing a verified refined prompt for downstream video generation.

case_study_1

Installation

conda create -n SCMPR python=3.10.18
conda activate SCMPR
pip install -r requirements.txt

Run SCMAPR

Remember to add your API key to utils/config.json before running.
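
A quick pre-flight check can catch an unfilled key before a long batch run. The snippet below is only a sketch: it assumes utils/config.json holds a JSON object at the top level, and it makes no assumption about which key names the repository expects. The helper name `empty_config_keys` is hypothetical.

```python
import json
from pathlib import Path
from typing import List


def empty_config_keys(path: str = "utils/config.json") -> List[str]:
    """Return top-level keys whose values are null or blank strings."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return [
        key
        for key, value in config.items()
        if value is None or (isinstance(value, str) and not value.strip())
    ]
```

If the returned list is non-empty, fill in those entries before launching any stage.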

Our code supports running the entire pipeline end to end, as well as executing each stage step by step.

Scenario Routing

VBench
python -m refinement.classifier \
  --output_dir results \
  --input_txt data/vbench_full_info.txt \
  --output_name category_vbench946.jsonl \
  --include_non_difficult
EvalCrafter
python -m refinement.classifier \
  --output_dir results \
  --input_txt data/evalcrafter700.txt \
  --output_name category_evalcrafter700.jsonl \
  --include_non_difficult
CompBench
python -m refinement.classifier \
  --output_dir results \
  --input_txt data/compbench1400.txt \
  --output_name category_compbench1400.jsonl \
  --include_non_difficult
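
Each routing run writes a .jsonl file under results/, which can be inspected before moving on to policy generation. The helper below is a generic sketch: the function names are hypothetical, and because the record schema is not documented here, the field to count by is left as a parameter (print the keys of the first record to find the real field name).

```python
import json
from collections import Counter
from typing import Dict, Iterable, List


def read_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def count_by(records: List[Dict], field: str) -> Counter:
    """Count records by the value of `field`; absent fields are grouped as '<missing>'."""
    return Counter(str(record.get(field, "<missing>")) for record in records)


def summarize(path: str, field: str) -> Counter:
    """Load a JSONL file, print the keys of its first record, and count by `field`."""
    with open(path, encoding="utf-8") as handle:
        records = read_jsonl(handle)
    if records:
        print("keys in first record:", sorted(records[0].keys()))
    return count_by(records, field)
```

For instance, `summarize("results/category_vbench946.jsonl", "category")` would count prompts per scenario, assuming the tag is stored under a field called `category`, which is a guess.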

Policy Generation

VBench
python -m refinement.policy \
  --input_jsonl results/category_vbench946.jsonl \
  --output_jsonl results/policy_vbench946.jsonl \
  --log_every 20
EvalCrafter
python -m refinement.policy \
  --input_jsonl results/category_evalcrafter700.jsonl \
  --output_jsonl results/policy_evalcrafter700.jsonl \
  --log_every 20
CompBench
python -m refinement.policy \
  --input_jsonl results/category_compbench1400.jsonl \
  --output_jsonl results/policy_compbench1400.jsonl \
  --log_every 20
T2V-Complexity
python -m refinement.policy \
  --input_jsonl benchmark/prompts.jsonl \
  --output_jsonl results/policy_t2vcomplexity1000.jsonl \
  --log_every 20

Prompt Refinement

VBench
python -m refinement.refiner \
  --input_jsonl results/policy_vbench946.jsonl \
  --output_jsonl results/refined_vbench946.jsonl \
  --log_every 20
EvalCrafter
python -m refinement.refiner \
  --input_jsonl results/policy_evalcrafter700.jsonl \
  --output_jsonl results/refined_evalcrafter700.jsonl \
  --log_every 20
CompBench
python -m refinement.refiner \
  --input_jsonl results/policy_compbench1400.jsonl \
  --output_jsonl results/refined_compbench1400.jsonl \
  --log_every 20
T2V-Complexity
python -m refinement.refiner \
  --input_jsonl results/policy_t2vcomplexity1000.jsonl \
  --output_jsonl results/refined_t2vcomplexity1000.jsonl \
  --log_every 20

Verification and Revision

VBench
python3 run_batch_flow.py \
    --input data/vbench_full_info.txt \
    --output_txt results/verified_vbench946.txt \
    --output_jsonl results/verified_vbench946.jsonl \
    --category_jsonl results/category_vbench946.jsonl \
    --policy_jsonl results/policy_vbench946.jsonl \
    --refined_jsonl results/refined_vbench946.jsonl \
    --resume_from verify
EvalCrafter
python3 run_batch_flow.py \
    --input data/evalcrafter700.txt \
    --output_txt results/verified_evalcrafter700.txt \
    --output_jsonl results/verified_evalcrafter700.jsonl \
    --category_jsonl results/category_evalcrafter700.jsonl \
    --policy_jsonl results/policy_evalcrafter700.jsonl \
    --refined_jsonl results/refined_evalcrafter700.jsonl \
    --resume_from verify
CompBench
python3 run_batch_flow.py \
    --input data/compbench1400.txt \
    --output_txt results/verified_compbench1400.txt \
    --output_jsonl results/verified_compbench1400.jsonl \
    --category_jsonl results/category_compbench1400.jsonl \
    --policy_jsonl results/policy_compbench1400.jsonl \
    --refined_jsonl results/refined_compbench1400.jsonl \
    --resume_from verify
T2V-Complexity
python3 run_batch_flow.py \
    --input data/t2v_complexity1000.txt \
    --output_txt results/verified_t2vcomplexity1000.txt \
    --output_jsonl results/verified_t2vcomplexity1000.jsonl \
    --category_jsonl benchmark/prompts.jsonl \
    --policy_jsonl results/policy_t2vcomplexity1000.jsonl \
    --refined_jsonl results/refined_t2vcomplexity1000.jsonl \
    --resume_from verify
The --resume_from option takes the stage to resume from: classifier, policy, refiner, or verify.

Run the whole framework

VBench
python3 run_batch_flow.py \
    --input data/vbench_full_info.txt \
    --output_txt results/verified_vbench946.txt \
    --output_jsonl results/verified_vbench946.jsonl \
    --category_jsonl results/category_vbench946.jsonl \
    --policy_jsonl results/policy_vbench946.jsonl \
    --refined_jsonl results/refined_vbench946.jsonl \
    --resume_from None
T2V-Complexity
python3 run_batch_flow.py \
    --input data/t2v_complexity1000.txt \
    --output_txt results/verified_t2vcomplexity1000.txt \
    --output_jsonl results/verified_t2vcomplexity1000.jsonl \
    --category_jsonl benchmark/prompts.jsonl \
    --policy_jsonl results/policy_t2vcomplexity1000.jsonl \
    --refined_jsonl results/refined_t2vcomplexity1000.jsonl \
    --resume_from None
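
Because the end-to-end commands differ only in their file names, they can be generated from a small table. The sketch below only assembles argument lists from the flags shown above and does not launch anything. The helper name `batch_flow_command`, the `RUNS` table, and the `tag` convention are illustrative assumptions, not part of the repository.

```python
import shlex
from typing import List, Optional


def batch_flow_command(
    input_txt: str,
    tag: str,
    category_jsonl: Optional[str] = None,
    resume_from: str = "None",
    results_dir: str = "results",
) -> List[str]:
    """Build the argv list for run_batch_flow.py using the flags shown above."""
    category = category_jsonl or f"{results_dir}/category_{tag}.jsonl"
    return [
        "python3", "run_batch_flow.py",
        "--input", input_txt,
        "--output_txt", f"{results_dir}/verified_{tag}.txt",
        "--output_jsonl", f"{results_dir}/verified_{tag}.jsonl",
        "--category_jsonl", category,
        "--policy_jsonl", f"{results_dir}/policy_{tag}.jsonl",
        "--refined_jsonl", f"{results_dir}/refined_{tag}.jsonl",
        "--resume_from", resume_from,
    ]


# (input file, tag used in output names, optional category file override)
RUNS = [
    ("data/vbench_full_info.txt", "vbench946", None),
    ("data/t2v_complexity1000.txt", "t2vcomplexity1000", "benchmark/prompts.jsonl"),
]


def render_all() -> List[str]:
    """Return each command as a shell-quoted string for review or logging."""
    return [shlex.join(batch_flow_command(i, t, c)) for i, t, c in RUNS]
```

Each list returned by `batch_flow_command` can be passed to `subprocess.run(cmd, check=True)` from the repository root. Printing `render_all()` first gives a chance to review the exact commands.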

