Auto-Rubric as Reward

The official implementation for Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

Overview | Quick Start | Auto-Rubric Docs | Training | Acknowledgements

What This Repo Does

Auto-Rubric provides a compact implementation of Auto-Rubric as Reward for visual generation. It turns a small set of labeled visual preference examples into explicit, inspectable rubric text, then uses a frozen VLM judge conditioned on those rubrics to produce pairwise rewards for RPO.

labeled visual pairs
  -> auto-generate rubrics
  -> verify and revise criteria
  -> structure/reuse rubric text
  -> VLM judge returns ranks
  -> RPO receives pairwise rewards

This release focuses on:

Area	Included
Auto-Rubric	Generation, verification, revision, categorization, grading, reward conversion.
Text-to-image	FLUX.1-dev LoRA RPO with pairwise ARR rewards.
Image editing	Qwen-Image-Edit LoRA RPO with source-image-aware pairwise ARR rewards.
VLM judging	OpenAI-compatible local or hosted vision endpoints.

Large checkpoints, processed embeddings, and training outputs are intentionally not committed.

Key Features

Explicit reward criteria: The "reward model" is readable rubric text rather than a hidden scalar model.
Verifiable generation loop: Candidate rubrics are checked against labeled examples and revised when they fail.
Pairwise visual rewards: Rank 1 receives 1.0; rank 2 receives -0.1 for RPO.
T2I and edit support: Prompt-only FLUX and source-image-aware Qwen-Image-Edit paths are both wired.
Reusable rubric files: Generate rubrics once, inspect them, and reuse the same file for deterministic training launches.
OpenAI-compatible VLMs: Use local Qwen3-VL through vLLM or hosted GPT/Gemini-compatible endpoints.

Repository Map

Path	Purpose
`judger.py`	CLI and Python entry point for rubric generation, evaluation, and reward tensors.
`rubric_pipeline/`	Auto-Rubric prompts, VLM graders, model client, categorization, and utilities.
`fastvideo/train_rpo_flux.py`	FLUX RPO training with ARR rewards.
`fastvideo/train_rpo_qwen_edit.py`	Qwen-Image-Edit RPO training with ARR rewards.
`scripts/preprocess/`	Embedding preprocessing for FLUX and Qwen-Image-Edit.
`scripts/finetune/`	8-GPU launcher examples with paper-aligned RPO defaults.
`docs/auto_rubric/`	Detailed Auto-Rubric guide: VLM choice, rubric design, reuse, workflows, debugging.

Quick Start

Create the environment:

cd /path/to/AutoRubric-as-Reward
conda create -n autorubric-as-reward python=3.10 -y
conda activate autorubric-as-reward
bash env_setup.sh

If you already installed a different CUDA/PyTorch stack, install the repo dependencies directly:

pip install -r requirements.txt
pip install -e .

Create the expected data folder:

mkdir -p data rubric_pipeline/rubrics

Download base models as needed:

Model	Local path	Link
FLUX.1-dev	`data/flux`	https://huggingface.co/black-forest-labs/FLUX.1-dev
Qwen-Image-Edit	`data/qwenimage_edit`	https://huggingface.co/Qwen/Qwen-Image-Edit
Qwen3-VL judge	local or HF cache	https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct

Start A VLM Judge

Auto-Rubric talks to an OpenAI-compatible vision API.

Local Qwen3-VL:

MODEL_PATH=Qwen/Qwen3-VL-8B-Instruct TP_SIZE=1 PORT=8000 \
  bash rubric_pipeline/vllm_serve.sh

export OPENAI_API_KEY=EMPTY

Hosted endpoint examples:

model_name: "gpt-5"
base_url: "https://api.openai.com/v1"
api_key: "${OPENAI_API_KEY}"

model_name: "gemini-3.1-pro"
base_url: "https://generativelanguage.googleapis.com/v1beta/openai/"
api_key: "${GEMINI_API_KEY}"

More guidance: VLM Selection.

Generate And Test Rubrics

Text-to-image:

python judger.py \
  --config_path rubric_pipeline/config/qwen3vl_8B_instruct_t2i.yaml \
  --seed_dataset examples/seed_t2i_pairwise.json \
  --test_dataset examples/test_t2i_pairwise.json \
  --base_url http://localhost:8000/v1 \
  --concurrency_limit 4

Image editing:

python judger.py \
  --config_path rubric_pipeline/config/qwen3vl_8B_instruct_edit.yaml \
  --seed_dataset examples/seed_edit_pairwise.json \
  --test_dataset examples/test_edit_pairwise.json \
  --base_url http://localhost:8000/v1 \
  --concurrency_limit 4

For long runs, save the generated rubric text to rubric_pipeline/rubrics/*.txt and load it through rubrics_file. See Rubric Reuse.

Training

FLUX:

bash scripts/preprocess/preprocess_flux_rl_embeddings.sh
bash scripts/finetune/finetune_flux_rpo_8gpus.sh

Qwen-Image-Edit:

bash scripts/preprocess/preprocess_qwen_image_edit_rl_embeddings.sh
bash scripts/finetune/finetune_qwen_image_edit_rpo_8gpus.sh

The launchers use:

Task	LR	Steps	Clip	KL	LoRA
FLUX T2I	`5e-5`	`8`	`0.2`	`0.01`	rank 16
Qwen-Image-Edit	`1e-5`	`10`	`0.2`	`0.02`	rank 32

Pairwise RPO expects --use_arr, --num_generations 2, and --use_group.

Auto-Rubric Documentation

Guide	Covers
Overview	Method summary and paper-to-code map.
VLM Selection	Local vs hosted judges, JSON reliability, latency, cost.
Rubric Design	Seed pair selection, task descriptions, good/bad rubric patterns.
Rubric Reuse	Saved rubric files, versioning, validation before training.
Workflows	End-to-end commands for generation, testing, saving, and training.
Troubleshooting	Common failures and fixes.
Data Formats	JSON/JSONL layouts accepted by the vision utilities.
Training Guide	Preprocess and RPO launch details.

Acknowledgements

This code builds on and is inspired by:

Citation

@misc{tian2026autorubricrewardimplicitpreferences,
      title={Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria}, 
      author={Juanxi Tian and Fengyuan Liu and Jiaming Han and Yilei Jiang and Yongliang Wu and Yesheng Liu and Haodong Li and Furong Xu and Wanhua Li},
      year={2026},
      eprint={2605.08354},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.08354}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
__pycache__		__pycache__
assets		assets
docs		docs
examples		examples
fastvideo		fastvideo
rubric_pipeline		rubric_pipeline
scripts		scripts
LICENSE		LICENSE
README.md		README.md
env_setup.sh		env_setup.sh
index.html		index.html
judger.py		judger.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Auto-Rubric as Reward

What This Repo Does

Key Features

Repository Map

Quick Start

Start A VLM Judge

Generate And Test Rubrics

Training

Auto-Rubric Documentation

Acknowledgements

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Auto-Rubric as Reward

What This Repo Does

Key Features

Repository Map

Quick Start

Start A VLM Judge

Generate And Test Rubrics

Training

Auto-Rubric Documentation

Acknowledgements

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages