This repository contains the implementation of the following paper and its related follow-up works in progress. We evaluate video generative models!
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang∗, Yinan He∗, Jiashuo Yu∗, Fan Zhang∗, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin+, Yu Qiao+, Ziwei Liu+
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
- Updates
- Overview
- Evaluation Results
- Video Generation Models Info
- Installation
- Usage
- Prompt Suite
- Sampled Videos
- Evaluation Method Suite
- Citation and Acknowledgement
- [09/2024] VBench-Long Leaderboard available: Our VBench-Long leaderboard now has 10 long video generation models. The VBench leaderboard now has 40 text-to-video (both long and short) models. All video generative models are encouraged to participate!
- [09/2024] PyPI Updates: The PyPI package is updated to version 0.1.4: bug fixes and multi-GPU inference.
- [08/2024] Longer and More Descriptive Prompts: Available Here! We follow CogVideoX's prompt optimization technique to enhance VBench prompts using GPT-4o, making them longer and more descriptive without altering their original meaning.
- [08/2024] VBench Leaderboard update: Our leaderboard has 28 T2V models and 12 I2V models so far. All video generative models are encouraged to participate!
- [06/2024] 🔥VBench-Long🔥 is ready to use for evaluating longer Sora-like videos!
- [06/2024] Model Info Documentation: Information on video generative models in our VBench Leaderboard is documented HERE.
- [05/2024] PyPI Update: The PyPI package vbench is updated to version 0.1.2. This includes changes in the preprocessing for high-resolution images/videos for imaging_quality, support for evaluating customized videos, and minor bug fixes.
- [04/2024] We release all the videos we sampled and used for VBench evaluation. See details here.
- [03/2024] 🔥VBench-Trustworthiness🔥 We now support evaluating the trustworthiness (e.g., culture, fairness, bias, safety) of video generative models.
- [03/2024] 🔥VBench-I2V🔥 We now support evaluating Image-to-Video (I2V) models. We also provide Image Suite.
- [03/2024] We support evaluating customized videos! See here for instructions.
- [01/2024] PyPI package is released! Simply pip install vbench.
- [12/2023] 🔥VBench🔥 Evaluation code released for 16 Text-to-Video (T2V) evaluation dimensions.
['subject_consistency', 'background_consistency', 'temporal_flickering', 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality', 'object_class', 'multiple_objects', 'human_action', 'color', 'spatial_relationship', 'scene', 'temporal_style', 'appearance_style', 'overall_consistency']
- [11/2023] Prompt Suites released. (See prompt lists here)
We propose VBench, a comprehensive benchmark suite for video generative models. We design a comprehensive and hierarchical Evaluation Dimension Suite to decompose "video generation quality" into multiple well-defined dimensions to facilitate fine-grained and objective evaluation. For each dimension and each content category, we carefully design a Prompt Suite as test cases, and sample Generated Videos from a set of video generation models. For each evaluation dimension, we specifically design an Evaluation Method Suite, which uses a carefully crafted method or designated pipeline for automatic objective evaluation. We also conduct Human Preference Annotation for the generated videos for each dimension, and show that VBench evaluation results are well aligned with human perceptions. VBench can provide valuable insights from multiple perspectives.
See our leaderboard for the most updated ranking and numerical results (with models like Gen-3, Kling, Pika).
We visualize VBench evaluation results of various publicly available video generation models, as well as Gen-2 and Pika, across 16 VBench dimensions. We normalize the results per dimension for clearer comparisons.
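As a rough illustration of what per-dimension normalization means, the sketch below rescales each dimension to the [0, 1] range across models. It assumes plain min-max scaling over the models being compared; the function name `normalize_per_dimension` and the sample numbers are hypothetical and are not taken from the VBench code or leaderboard.

```python
from typing import Dict


def normalize_per_dimension(
    scores: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Min-max normalize raw scores within each dimension across models.

    `scores` maps model name -> {dimension -> raw score}.
    This is an assumed scheme for illustration, not VBench's exact procedure.
    """
    dimensions = {dim for per_model in scores.values() for dim in per_model}
    normalized: Dict[str, Dict[str, float]] = {model: {} for model in scores}
    for dim in dimensions:
        values = [per_model[dim] for per_model in scores.values() if dim in per_model]
        lowest, highest = min(values), max(values)
        span = highest - lowest
        for model, per_model in scores.items():
            if dim in per_model:
                # Guard against division by zero when all models tie on a dimension.
                normalized[model][dim] = (per_model[dim] - lowest) / span if span else 0.0
    return normalized


# Illustrative numbers only, not real VBench results.
example = {
    "model_a": {"motion_smoothness": 0.90, "dynamic_degree": 0.20},
    "model_b": {"motion_smoothness": 0.98, "dynamic_degree": 0.60},
}
print(normalize_per_dimension(example))
```

Normalizing within each dimension keeps a dimension with a narrow raw range (where all models score close together) from looking flat next to one with a wide range.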
See numeric values at our Leaderboard 🥇🥈🥉
How to join VBench Leaderboard? See the 3 options below:
Sampling Party | Evaluation Party | Comments |
---|---|---|
VBench Team | VBench Team | We periodically allocate resources to sample newly released models and perform evaluations. You can request us to perform sampling and evaluation, but the progress depends on our available resources. |
Your Team | VBench Team | For non-open-source models interested in joining our leaderboard, submit your video samples to us for evaluation. If you prefer to provide the evaluation results directly, see the row below. |
Your Team | Your Team | If you have already used VBench for full evaluation in your report/paper, submit your eval_results.zip files to the VBench Leaderboard using the Submit here! form. The evaluation results will be automatically updated to the leaderboard. Also, share your model information for our records for any columns here. |
See model info for video generation models we used for evaluation.
pip install vbench
To evaluate some video generation ability aspects, you need to install detectron2 via:
pip install detectron2@git+https://github.com/facebookresearch/detectron2.git
If there is an error during detectron2 installation, see here.
Download VBench_full_info.json to your running directory to read the benchmark prompt suites.
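To get a feel for what the prompt file contains, the sketch below groups prompts by dimension. It assumes that VBench_full_info.json is a JSON list whose entries carry a `prompt_en` string and a `dimension` list; verify this layout against your downloaded copy. The helper names and the sample prompts are hypothetical.

```python
import json
from collections import defaultdict


def prompts_by_dimension(entries):
    """Group prompts by the dimensions they are used for.

    Assumes each entry looks like
    {"prompt_en": "...", "dimension": ["...", ...]} (an assumed layout).
    """
    grouped = defaultdict(list)
    for entry in entries:
        for dimension in entry.get("dimension", []):
            grouped[dimension].append(entry["prompt_en"])
    return dict(grouped)


def load_prompts(path="VBench_full_info.json"):
    """Read the prompt file from disk and group its prompts by dimension."""
    with open(path, encoding="utf-8") as handle:
        return prompts_by_dimension(json.load(handle))


# In-memory sample in the assumed layout (made-up prompts, not from the suite).
sample = json.loads(
    '[{"prompt_en": "a person walking", "dimension": ["subject_consistency", "human_action"]},'
    ' {"prompt_en": "a red car", "dimension": ["color"]}]'
)
print(prompts_by_dimension(sample))
```

With the real file in your running directory, `load_prompts()` would return the same kind of mapping for the full prompt suite.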
git clone https://github.com/Vchitect/VBench.git
pip install -r VBench/requirements.txt
pip install VBench
If there is an error during detectron2 installation, see here.
Use VBench to evaluate videos and video generative models.
- A Side Note: VBench is designed for evaluating different models on a standard benchmark. Therefore, by default, we enforce evaluation on the standard VBench prompt lists to ensure fair comparisons among different video generation models. That's also why we give warnings when a required video is not found. This is done by defining the set of prompts in VBench_full_info.json. However, we understand that many users would like to use VBench to evaluate their own videos, or videos generated from prompts that do not belong to the VBench Prompt Suite, so we also added the function of Evaluating Your Own Videos. Simply set mode=custom_input, and you can evaluate your own videos.
We support evaluating any video. Simply provide the path to the video file, or the path to the folder that contains your videos. There is no requirement on the videos' names.
- Note: We support customized videos / prompts for the following dimensions:
'subject_consistency', 'background_consistency', 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality'
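Since only a subset of dimensions accepts customized videos, it can help to check a requested list before launching an evaluation. The sketch below does that with the six dimensions listed above; the constant and function names are hypothetical helpers, not part of the vbench package.

```python
# Dimensions that the note above lists as supporting customized videos / prompts.
CUSTOM_INPUT_DIMENSIONS = (
    "subject_consistency",
    "background_consistency",
    "motion_smoothness",
    "dynamic_degree",
    "aesthetic_quality",
    "imaging_quality",
)


def split_custom_dimensions(requested):
    """Split requested dimensions into (supported, unsupported) for custom input."""
    supported = [dim for dim in requested if dim in CUSTOM_INPUT_DIMENSIONS]
    unsupported = [dim for dim in requested if dim not in CUSTOM_INPUT_DIMENSIONS]
    return supported, unsupported


supported, unsupported = split_custom_dimensions(
    ["imaging_quality", "human_action", "dynamic_degree"]
)
print("can run with custom input:", supported)
print("requires the standard prompt suite:", unsupported)
```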
To evaluate videos with customized input prompts, run our script with --mode=custom_input:
python evaluate.py \
--dimension $DIMENSION \
--videos_path /path/to/folder_or_video/ \
--mode=custom_input
Alternatively, you can use our command:
vbench evaluate \
--dimension $DIMENSION \
--videos_path /path/to/folder_or_video/ \
--mode=custom_input
To evaluate using multiple GPUs, you can use the following commands:
torchrun --nproc_per_node=${GPUS} --standalone evaluate.py ...args...
or
vbench evaluate --ngpus=${GPUS} ...args...
vbench evaluate --videos_path $VIDEO_PATH --dimension $DIMENSION
For example:
vbench evaluate --videos_path "sampled_videos/lavie/human_action" --dimension "human_action"
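When several dimensions need to be evaluated, the command above can be driven from a small script. The sketch below builds one `vbench evaluate` invocation per dimension using only the `--videos_path` and `--dimension` flags shown above. It assumes videos are stored in one subfolder per dimension (as in the `sampled_videos/lavie/human_action` example); the helper names are hypothetical, and the default is a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_command(videos_path, dimension):
    """Return the argument list for a single `vbench evaluate` call."""
    return ["vbench", "evaluate", "--videos_path", videos_path, "--dimension", dimension]


def evaluate_dimensions(videos_root, dimensions, dry_run=True):
    """Build (and optionally run) one command per dimension.

    Assumes the layout <videos_root>/<dimension>/ for the sampled videos.
    """
    commands = []
    for dimension in dimensions:
        command = build_command(f"{videos_root}/{dimension}", dimension)
        commands.append(command)
        if dry_run:
            # Only show what would be executed.
            print(shlex.join(command))
        else:
            # Requires the vbench CLI to be installed; raises if a run fails.
            subprocess.run(command, check=True)
    return commands


evaluate_dimensions("sampled_videos/lavie", ["human_action", "scene"])
```

Passing `dry_run=False` would actually launch the evaluations one after another.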
from vbench import VBench
my_VBench = VBench(device, <path/to/VBench_full_info.json>, <path/to/save/dir>)
my_VBench.evaluate(
    videos_path = <video_path>,
    name = <name>,
    dimension_list = [<dimension>, <dimension>, ...],
)
For example:
import torch
from vbench import VBench

# Assumes a CUDA-capable GPU is available.
device = torch.device("cuda")
my_VBench = VBench(device, "vbench/VBench_full_info.json", "evaluation_results")
my_VBench.evaluate(
    videos_path = "sampled_videos/lavie/human_action",
    name = "lavie_human_action",
    dimension_list = ["human_action"],
)
vbench evaluate \
--videos_path $VIDEO_PATH \
--dimension $DIMENSION \
--mode=vbench_category \
--category=$CATEGORY
or
python evaluate.py \
--dimension $DIMENSION \
--videos_path /path/to/folder_or_video/ \
--mode=vbench_category
We have provided scripts to download VideoCrafter-1.0 samples, and the corresponding evaluation scripts.
# download sampled videos
sh scripts/download_videocrafter1.sh
# evaluate VideoCrafter-1.0
sh scripts/evaluate_videocrafter1.sh
We have provided scripts for calculating the Final Score, Quality Score, and Semantic Score in the Leaderboard. You can run them locally to obtain the final scores or as a final check before submitting to the Leaderboard.
# Pack the evaluation results into a zip file.
cd evaluation_results
zip -r ../evaluation_results.zip .
# [Optional] get the final score of your submission file.
python scripts/cal_final_score.py --zip_file {path_to_evaluation_results.zip} --model_name {your_model_name}
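The authoritative computation is in scripts/cal_final_score.py. Purely to illustrate how per-dimension results could be rolled up into a Quality Score, a Semantic Score, and a Final Score, here is a minimal sketch of a weighted combination. The function names, the weights (4 and 1), and all numbers are assumptions for illustration and should not be read as the leaderboard's actual values.

```python
def weighted_mean(scores, weights=None):
    """Weighted mean of {dimension -> score}; missing weights default to 1.0."""
    weights = weights or {}
    total_weight = sum(weights.get(dim, 1.0) for dim in scores)
    weighted_sum = sum(score * weights.get(dim, 1.0) for dim, score in scores.items())
    return weighted_sum / total_weight


def combine_scores(quality_score, semantic_score, quality_weight, semantic_weight):
    """Combine the two group scores into one final score (assumed scheme)."""
    total = quality_weight + semantic_weight
    return (quality_score * quality_weight + semantic_score * semantic_weight) / total


# Made-up per-dimension scores, already on a common scale.
quality = weighted_mean({"motion_smoothness": 0.8, "imaging_quality": 0.6})
semantic = weighted_mean({"object_class": 0.5})
# Placeholder weights; check the repository script for the real ones.
final = combine_scores(quality, semantic, quality_weight=4.0, semantic_weight=1.0)
print(quality, semantic, final)
```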
You can submit the JSON file to HuggingFace.
[Optional] Please download the pre-trained weights according to the guidance in the model_path.txt file for each model in the pretrained folder to ~/.cache/vbench.
We provide prompt lists at prompts/.
Check out details of prompt suites, and instructions for how to sample videos for evaluation.
To facilitate future research and to ensure full transparency, we release all the videos we sampled and used for VBench evaluation. You can download them on Google Drive.
See detailed explanations of the sampled videos here.
We also provide detailed settings for the models under evaluation here.
To perform evaluation on one dimension, run this:
python evaluate.py --videos_path $VIDEOS_PATH --dimension $DIMENSION
- The complete list of dimensions:
['subject_consistency', 'background_consistency', 'temporal_flickering', 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality', 'object_class', 'multiple_objects', 'human_action', 'color', 'spatial_relationship', 'scene', 'temporal_style', 'appearance_style', 'overall_consistency']
Alternatively, you can evaluate multiple models and multiple dimensions using this script:
bash evaluate.sh
- The default sampled video paths:
vbench_videos/{model}/{dimension}/{prompt}-{index}.mp4/gif
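The path pattern above can be generated and parsed programmatically when preparing sampled videos. The sketch below is one way to do so; `build_video_path` and `parse_video_path` are hypothetical helpers, and the default root folder and extension simply mirror the pattern shown above.

```python
from pathlib import PurePosixPath


def build_video_path(model, dimension, prompt, index, ext="mp4", root="vbench_videos"):
    """Build a path of the form {root}/{model}/{dimension}/{prompt}-{index}.{ext}."""
    return str(PurePosixPath(root) / model / dimension / f"{prompt}-{index}.{ext}")


def parse_video_path(path):
    """Recover model, dimension, prompt, index, and extension from such a path."""
    parsed = PurePosixPath(path)
    # Split on the last hyphen so prompts that contain hyphens stay intact.
    prompt, _, index = parsed.stem.rpartition("-")
    return {
        "model": parsed.parts[-3],
        "dimension": parsed.parts[-2],
        "prompt": prompt,
        "index": int(index),
        "ext": parsed.suffix.lstrip("."),
    }


path = build_video_path("lavie", "human_action", "a person is running", 0)
print(path)
print(parse_video_path(path))
```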
Before evaluating the temporal flickering dimension, it is necessary to filter out the static videos first.
To filter static videos in the temporal flickering dimension, run this:
# This only filters out static videos whose prompt matches a prompt in the temporal_flickering dimension.
python static_filter.py --videos_path $VIDEOS_PATH
You can adjust the filtering scope by:
# 1. Change the filtering scope to consider all files inside videos_path for filtering.
python static_filter.py --videos_path $VIDEOS_PATH --filter_scope all
# 2. Specify the path to a JSON file ($filename) to consider only videos whose prompts match those listed in $filename.
python static_filter.py --videos_path $VIDEOS_PATH --filter_scope $filename
If you find our repo useful for your research, please consider citing our paper:
@InProceedings{huang2023vbench,
title={{VBench}: Comprehensive Benchmark Suite for Video Generative Models},
author={Huang, Ziqi and He, Yinan and Yu, Jiashuo and Zhang, Fan and Si, Chenyang and Jiang, Yuming and Zhang, Yuanhan and Wu, Tianxing and Jin, Qingyang and Chanpaisit, Nattapol and Wang, Yaohui and Chen, Xinyuan and Wang, Limin and Lin, Dahua and Qiao, Yu and Liu, Ziwei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
Order is based on the time joining the project:
Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Nattapol Chanpaisit, Xiaojie Xu, Qianli Ma, Ziyue Dong.
This project wouldn't be possible without the following open-sourced repositories: AMT, UMT, RAM, CLIP, RAFT, GRiT, IQA-PyTorch, ViCLIP, and LAION Aesthetic Predictor.
We are putting together Awesome-Evaluation-of-Visual-Generation, which collects works for evaluating visual generation.