Rethinking Diffusion for Text-Driven Human Motion Generation
Zichong Meng
Yiming Xie
Xiaogang Peng
Zeyu Han
Huaizu Jiang
Northeastern University
arXiv 2024
- Release the clean code for implementation.
- Release the evaluation code and the pretrained models.
- Release the simple and minimalist version of the code for implementation.
- Release the updated version of AE weights and scripts.
- Will release the updated version of AE weights, with more support and scripts, soon after cleaning the code.
conda env create -f environment.yml
conda activate MARDM
We tested our code with Python 3.10.13, PyTorch 2.2.0, and CUDA 12.1.
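As an optional sanity check (not part of the original scripts), you can verify the installed PyTorch build and CUDA availability with:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"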
rm -rf checkpoints
mkdir checkpoints
cd checkpoints
mkdir t2m
mkdir kit
cd t2m
echo -e "Downloading evaluation models for HumanML3D dataset"
gdown --fuzzy https://drive.google.com/file/d/1ejiz4NvyuoTj3BIdfNrTFFZBZ-zq4oKD/view?usp=sharing
echo -e "Unzipping humanml3d evaluators"
unzip evaluators_humanml3d.zip
echo -e "Cleaning humanml3d evaluators zip"
rm evaluators_humanml3d.zip
cd ../kit/
echo -e "Downloading pretrained models for KIT-ML dataset"
gdown --fuzzy https://drive.google.com/file/d/1kobWYZdWRyfTfBj5YR_XYopg9YZLdfYh/view?usp=sharing
echo -e "Unzipping kit evaluators"
unzip evaluators_kit.zip
echo -e "Cleaning kit evaluators zip"
rm evaluators_kit.zip
cd ../../
rm -rf glove
echo -e "Downloading glove (in use only by the evaluators)"
gdown --fuzzy https://drive.google.com/file/d/1cmXKUT31pqd7_XpJAiWEo1K81TMYHA5n/view?usp=sharing
unzip glove.zip
echo -e "Cleaning GloVe zip\n"
rm glove.zip
echo -e "Downloading done!"
cd checkpoints/t2m
echo -e "Downloading pretrained models for HumanML3D dataset"
gdown --fuzzy https://drive.google.com/file/d/1TBybFByAd-kD4AuFgMyR3ZBt4VV43Sif/view?usp=sharing
gdown --fuzzy https://drive.google.com/file/d/1csjlxi0uOhfPPEwiThsR0gaj7_VDmgb6/view?usp=sharing
gdown --fuzzy https://drive.google.com/file/d/1nWoEcN4rEFKi4Xyf_ObKinDmSQNPKXgU/view?usp=sharing
gdown --fuzzy https://drive.google.com/file/d/1nfX_j8VzMmynqKv8x68pXrsL3c0qWLXA/view?usp=sharing
echo -e "Unzipping"
unzip MARDM_SiT_XL.zip
unzip MARDM_DDPM_XL.zip
unzip length_estimator.zip
unzip AE_humanml3d.zip
echo -e "Cleaning zips"
rm MARDM_SiT_XL.zip
rm MARDM_DDPM_XL.zip
rm length_estimator.zip
rm AE_humanml3d.zip
cd ../../
You do not need to download the datasets if you only want to generate motions from textual instructions.
If you want to reproduce and evaluate our method, you can obtain both HumanML3D and KIT-ML following the instructions in the HumanML3D repository. By default, the data path is set to ./datasets.
For the dataset Mean and Std, you are welcome to use the eval_mean.npy and eval_std.npy provided in utils, or you can calculate them from your obtained dataset using:
python utils/cal_mean_std.py
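The script above is the supported way to do this. Purely for illustration, a minimal sketch of the underlying computation is shown below; the ./datasets/HumanML3D/new_joint_vecs path, the output file names, and the assumption that each sample is a .npy array of shape (num_frames, feature_dim) are our own and not necessarily what utils/cal_mean_std.py uses:

import glob
import numpy as np

# Illustrative input location: per-sample motion feature arrays of shape (num_frames, feature_dim).
feature_files = glob.glob("./datasets/HumanML3D/new_joint_vecs/*.npy")

# Concatenate all frames from all clips, then take per-dimension statistics.
all_frames = np.concatenate([np.load(f) for f in feature_files], axis=0)
mean = all_frames.mean(axis=0)
std = all_frames.std(axis=0)

# Illustrative output names, mirroring the provided eval_mean.npy / eval_std.npy.
np.save("eval_mean.npy", mean)
np.save("eval_std.npy", std)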
python sample.py --name MARDM_SiT_XL --text_prompt "A person is running on a treadmill."
In a txt file, each line should follow the format <text description>#<motion length>.
You can put NA as the motion length to let the model determine the motion length
(if any line in the file uses NA, all other lines should use NA as well).
python sample.py --name MARDM_SiT_XL --text_path ./text_prompt.txt
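For reference, a hypothetical ./text_prompt.txt could look like the following (prompts and lengths are only examples; replace the lengths with NA on every line to let the model decide them):

A person is running on a treadmill.#120
A person walks forward, turns around, and sits down.#196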
python train_AE.py --name AE --dataset_name t2m --batch_size 256 --epoch 50 --lr_decay 0.05
# MARDM SiT-based (best results)
python train_MARDM.py --name MARDM_SiT_XL --model "MARDM-SiT-XL" --dataset_name t2m --batch_size 64 --ae_name AE
# MARDM DDPM-based
python train_MARDM.py --name MARDM_DDPM_XL --model "MARDM-DDPM-XL" --dataset_name t2m --batch_size 64 --ae_name AE
python train_AE.py --name AE --dataset_name kit --batch_size 512 --epoch 50 --lr_decay 0.1
# MARDM SiT-based (best results)
python train_MARDM.py --name MARDM_SiT_XL --model "MARDM-SiT-XL" --dataset_name kit --batch_size 16 --ae_name AE --milestones 20000
# MARDM DDPM-based
python train_MARDM.py --name MARDM_DDPM_XL --model "MARDM-DDPM-XL" --dataset_name kit --batch_size 16 --ae_name AE --milestones 20000
python evaluation_AE.py --name AE --dataset_name t2m
# MARDM SiT-based (best results)
python evaluation_MARDM.py --name MARDM_SiT_XL --model "MARDM-SiT-XL" --dataset_name t2m --cfg 4.5
# MARDM DDPM-based
python evaluation_MARDM.py --name MARDM_DDPM_XL --model "MARDM-DDPM-XL" --dataset_name t2m --cfg 4.5
python evaluation_AE.py --name AE --dataset_name kit
# MARDM SiT-based (best results)
python evaluation_MARDM.py --name MARDM_SiT_XL --model "MARDM-SiT-XL" --dataset_name kit --cfg 2.5
# MARDM DDPM-based
python evaluation_MARDM.py --name MARDM_DDPM_XL --model "MARDM-DDPM-XL" --dataset_name kit --cfg 2.5
python edit.py --name MARDM_SiT_XL -msec 0.3,0.6 --text_prompt "A man dancing around." --source_motion 000612.npy
This code stands on the shoulders of giants; we would like to thank the following contributors whose code ours is based on:
Our original raw implementation is heavily based on T2M, T2M-GPT, MMM, and MoMask. The diffusion part is primarily based on DDPM, DiT, SiT, MAR, HOI-Diff, InterGen, MDM, and MLD.
For the open-sourced version, we decided to restructure (and partially rewrite) the code into a simple and minimalist PyTorch implementation that gets rid of PyTorch Lightning's implicit hooks, out-of-scope variable usage, and implicit argparse calls. We hope this minimalist implementation leads to better code comprehension and encourages contributions from the motion generation community. Thank you.
If you find this repository useful for your work, please consider citing it as follows:
@article{meng2024rethinking,
title={Rethinking Diffusion for Text-Driven Human Motion Generation},
author={Meng, Zichong and Xie, Yiming and Peng, Xiaogang and Han, Zeyu and Jiang, Huaizu},
journal={arXiv preprint arXiv:2411.16575},
year={2024}
}