
The quality of the videos generated by the Hugging Face model is significantly worse. #160

JamesTensor opened this issue Jan 20, 2025 · 2 comments


@JamesTensor

Thank you for the amazing world-class work!
I used two scripts, inference_hunyuan.sh and inference_hunyuan_hf.sh, and found that the generated videos are very different: the one produced by inference_hunyuan_hf.sh is of poor quality. I tested both models with the same prompt and generated the two corresponding videos. Can you help me figure out why?
The inference_hunyuan.sh script and the generated video are as follows:

[attached video] In.the.fierce.battle.with.his.energy.depleting.Ultraman.scared.the.monster.away.by.jamming.a.toile.mp4

num_gpus=1
export MODEL_BASE=data/hunyuan
torchrun --nnodes=1 --nproc_per_node=$num_gpus --master_port 29503 \
    fastvideo/sample/sample_t2v_hunyuan.py \
    --height 720 \
    --width 1280 \
    --num_frames 125 \
    --num_inference_steps 50 \
    --guidance_scale 1 \
    --embedded_cfg_scale 6 \
    --flow_shift 17 \
    --flow-reverse \
    --prompt /data1/llmfast/data/aoteman/aoteman_prompt.txt \
    --seed 1024 \
    --output_path outputs_video/hunyuan/vae_sp/ \
    --model_path $MODEL_BASE \
    --dit-weight ${MODEL_BASE}/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt \
    --vae-sp

The inference_hunyuan_hf.sh script and the generated video are as follows:

[attached video] In.the.fierce.battle.with.his.energy.depleting.Ultraman.scared.the.monster.away.by.jamming.a.toilet.plunger.into.its.mouth.mp4

num_gpus=1
torchrun --nnodes=1 --nproc_per_node=$num_gpus --master_port 29503 \
    /data1/llmfast/fastvideo/sample/sample_t2v_hunyuan_hf.py \
    --model_path data/FastHunyuan-diffusers/ \
    --prompt_path "/data1/llmfast/data/aoteman/aoteman_prompt.txt" \
    --num_frames 125 \
    --height 720 \
    --width 1280 \
    --num_inference_steps 50 \
    --output_path /data1/llmfast/outputs_video/hunyuan_hf/ \
    --seed 1024

@BrianChen1129 (Collaborator) commented Jan 20, 2025

Could you check the --flow_shift value? You should use --flow_shift 7 for the original Hunyuan and --flow_shift 17 for FastHunyuan.
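
For context, in a diffusers-style pipeline the shift value is usually carried by the flow-matching scheduler. A minimal sketch of the idea, assuming the standard diffusers FlowMatchEulerDiscreteScheduler API (the exact wiring inside fastvideo's sample_t2v_hunyuan_hf.py may differ):

from diffusers import FlowMatchEulerDiscreteScheduler

# FastHunyuan is distilled to sample with a large timestep shift;
# the original Hunyuan model expects shift=7 instead.
scheduler = FlowMatchEulerDiscreteScheduler(shift=17.0)
# pipe.scheduler = scheduler  # attach to the pipeline before sampling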

@JamesTensor (Author)

--flow_shift 17

I have modified the inference_hunyuan_hf.sh script by adding the --flow_shift 17 parameter, but the generated video is still as bad as before. Below are the prompt and the script. Could you please run FastHunyuan with inference_hunyuan_hf.sh and see whether you can reproduce my problem? Thank you very much for your kind help!

Prompt:
In the fierce battle, with his energy depleting, Ultraman scared the monster away by jamming a toilet plunger into its mouth.

inference_hunyuan_hf.sh:
num_gpus=1
torchrun --nnodes=1 --nproc_per_node=$num_gpus --master_port 29503 \
    /data1/llmfast/fastvideo/sample/sample_t2v_hunyuan_hf.py \
    --model_path data/FastHunyuan-diffusers/ \
    --prompt_path "/data1/llmfast/data/aoteman/aoteman_prompt.txt" \
    --num_frames 125 \
    --height 720 \
    --width 1280 \
    --num_inference_steps 50 \
    --output_path /data1/llmfast/outputs_video/hunyuan_hf/ \
    --seed 1024 \
    --flow_shift 17
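
One thing worth ruling out is a mismatch between the --flow_shift flag and the scheduler config baked into the diffusers checkpoint. A small debugging sketch, assuming the checkpoint ships a standard diffusers scheduler subfolder (the path is the one from the script above):

from diffusers import FlowMatchEulerDiscreteScheduler

# Load only the scheduler config shipped with the checkpoint and check
# which shift it defaults to, independent of the --flow_shift CLI flag.
sched = FlowMatchEulerDiscreteScheduler.from_pretrained(
    "data/FastHunyuan-diffusers/", subfolder="scheduler")
print(sched.config)  # the 'shift' entry should read 17.0 for FastHunyuan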
