
The quality of the videos generated by the Hugging Face model is significantly worse. #160

JamesTensor opened this issue Jan 20, 2025 · 2 comments


@JamesTensor

Thank you for the amazing world-class work!
I used two scripts, inference_hunyuan.sh and inference_hunyuan_hf.sh, and found that the generated videos are very different: the one produced by inference_hunyuan_hf.sh is of poor quality. I tested both models with the same prompt and generated the two corresponding videos. Can you help me figure out why?
The inference_hunyuan.sh script and the generated video are as follows:

[attached video] In.the.fierce.battle.with.his.energy.depleting.Ultraman.scared.the.monster.away.by.jamming.a.toile.mp4

num_gpus=1
export MODEL_BASE=data/hunyuan
torchrun --nnodes=1 --nproc_per_node=$num_gpus --master_port 29503 \
    fastvideo/sample/sample_t2v_hunyuan.py \
    --height 720 \
    --width 1280 \
    --num_frames 125 \
    --num_inference_steps 50 \
    --guidance_scale 1 \
    --embedded_cfg_scale 6 \
    --flow_shift 17 \
    --flow-reverse \
    --prompt /data1/llmfast/data/aoteman/aoteman_prompt.txt \
    --seed 1024 \
    --output_path outputs_video/hunyuan/vae_sp/ \
    --model_path $MODEL_BASE \
    --dit-weight ${MODEL_BASE}/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
    --vae-sp

The inference_hunyuan_hf.sh script and the generated video are as follows:

[attached video] In.the.fierce.battle.with.his.energy.depleting.Ultraman.scared.the.monster.away.by.jamming.a.toilet.plunger.into.its.mouth.mp4

num_gpus=1
torchrun --nnodes=1 --nproc_per_node=$num_gpus --master_port 29503 \
    /data1/llmfast/fastvideo/sample/sample_t2v_hunyuan_hf.py \
    --model_path data/FastHunyuan-diffusers/ \
    --prompt_path "/data1/llmfast/data/aoteman/aoteman_prompt.txt" \
    --num_frames 125 \
    --height 720 \
    --width 1280 \
    --num_inference_steps 50 \
    --output_path /data1/llmfast/outputs_video/hunyuan_hf/ \
    --seed 1024

@BrianChen1129 (Collaborator) commented Jan 20, 2025

Could you check the --flow_shift value? You should use --flow_shift 7 for the original Hunyuan and --flow_shift 17 for FastHunyuan.
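
For context, in a diffusers-style pipeline the shift value is usually carried by the flow-matching scheduler. A minimal sketch of the idea, assuming the standard diffusers FlowMatchEulerDiscreteScheduler API (the exact wiring inside fastvideo's sample_t2v_hunyuan_hf.py may differ):

from diffusers import FlowMatchEulerDiscreteScheduler

# FastHunyuan is distilled to sample with a large timestep shift;
# the original Hunyuan model expects shift=7 instead.
scheduler = FlowMatchEulerDiscreteScheduler(shift=17.0)
# pipe.scheduler = scheduler  # attach to the pipeline before sampling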

@JamesTensor (Author)

--flow_shift 17

I have modified the inference_hunyuan_hf.sh script by adding the --flow_shift 17 parameter, but the generated video is still as bad as before. Below are the prompt and the script. Could you please run FastHunyuan with inference_hunyuan_hf.sh and see whether you can reproduce my problem? Thank you very much for your kind help!

Prompt:
In the fierce battle, with his energy depleting, Ultraman scared the monster away by jamming a toilet plunger into its mouth.

inference_hunyuan_hf.sh:
num_gpus=1
torchrun --nnodes=1 --nproc_per_node=$num_gpus --master_port 29503 \
    /data1/llmfast/fastvideo/sample/sample_t2v_hunyuan_hf.py \
    --model_path data/FastHunyuan-diffusers/ \
    --prompt_path "/data1/llmfast/data/aoteman/aoteman_prompt.txt" \
    --num_frames 125 \
    --height 720 \
    --width 1280 \
    --num_inference_steps 50 \
    --output_path /data1/llmfast/outputs_video/hunyuan_hf/ \
    --seed 1024 \
    --flow_shift 17
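
One thing worth ruling out is a mismatch between the --flow_shift flag and the scheduler config baked into the diffusers checkpoint. A small debugging sketch, assuming the checkpoint ships a standard diffusers scheduler subfolder (the path is the one from the script above):

from diffusers import FlowMatchEulerDiscreteScheduler

# Load only the scheduler config shipped with the checkpoint and check
# which shift it defaults to, independent of the --flow_shift CLI flag.
sched = FlowMatchEulerDiscreteScheduler.from_pretrained(
    "data/FastHunyuan-diffusers/", subfolder="scheduler")
print(sched.config)  # the 'shift' entry should read 17.0 for FastHunyuan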
