3D Human Motion Generation aims to generate natural and plausible motions from conditions such as text descriptions, action labels, music, etc.
This repository mainly tracks mainstream Text-to-Motion works, and also collects related papers and datasets.
Last updated: 2024/07/24 (Partial ECCV'24 added)
- HumanML3D | [project] | [paper]
- KIT-ML | [project]
- PoseScript | [project] | [paper]
- Motion-X | [project] | [paper]
- CombatMotion | [project]
- Frechet Inception Distance (FID) $\downarrow$ - FID is the principal metric for evaluating the distance between the feature distributions of generated and real motions. The feature extractor is from [T2M].
- MultiModality (MModality) $\uparrow$ - MModality measures generation diversity conditioned on the same text. Specifically, it is the average variation for a single text prompt, computed from the Euclidean distances of 10 pairs of generated motions.
- Diversity $\rightarrow$ (closer to real motion is better) - Diversity measures the variability and richness of the generated motion sequences, computed by averaging the Euclidean distances of 300 randomly sampled pairs of motions.
- R-Precision $\uparrow$ - R-Precision measures the consistency between the text description and the generated motion: the probability that the ground-truth text appears in the top-k candidates after ranking by text-motion feature distance.
- Multi-Modal Distance (MM Dist) $\downarrow$ - MM Dist is the average Euclidean distance between the feature of each generated motion and the feature of its corresponding text description in the test set. A minimal computation sketch for these metrics follows this list.
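For orientation, the snippet below is a minimal sketch of how these metrics are typically computed from pre-extracted features, assuming a T2M-style evaluator that maps motions and texts into a shared feature space. The function names, array shapes, and pool sizes are illustrative assumptions, not the exact evaluation code of the papers listed here; refer to each paper's official repository for the authoritative implementation.

```python
# Minimal sketch of T2M-style evaluation metrics (illustrative only).
# Assumes motion_feats / text_feats are (N, D) arrays from a pre-trained
# feature extractor (e.g., the one from T2M); names and shapes are assumptions.
import numpy as np
from scipy import linalg


def fid(gen_feats: np.ndarray, gt_feats: np.ndarray) -> float:
    """Frechet distance between generated and real feature distributions."""
    mu_g, mu_r = gen_feats.mean(0), gt_feats.mean(0)
    cov_g = np.cov(gen_feats, rowvar=False)
    cov_r = np.cov(gt_feats, rowvar=False)
    covmean = linalg.sqrtm(cov_g @ cov_r)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_g - mu_r
    return float(diff @ diff + np.trace(cov_g + cov_r - 2.0 * covmean))


def diversity(motion_feats: np.ndarray, num_pairs: int = 300) -> float:
    """Average Euclidean distance between randomly sampled pairs of motions (needs N >= num_pairs)."""
    idx_a = np.random.choice(len(motion_feats), num_pairs, replace=False)
    idx_b = np.random.choice(len(motion_feats), num_pairs, replace=False)
    return float(np.linalg.norm(motion_feats[idx_a] - motion_feats[idx_b], axis=1).mean())


def multimodality(feats_per_text: np.ndarray, num_pairs: int = 10) -> float:
    """Average distance between pairs of motions generated for the same text.
    feats_per_text has shape (num_texts, samples_per_text, D)."""
    _, s, _ = feats_per_text.shape
    a = np.random.choice(s, num_pairs)
    b = np.random.choice(s, num_pairs)
    return float(np.linalg.norm(feats_per_text[:, a] - feats_per_text[:, b], axis=-1).mean())


def r_precision_and_mm_dist(text_feats: np.ndarray, motion_feats: np.ndarray,
                            top_k: int = 3, pool_size: int = 32):
    """R-Precision (Top-1/2/3) and MM Dist, using pools of one matched and
    (pool_size - 1) mismatched text-motion pairs."""
    n = (len(text_feats) // pool_size) * pool_size
    hits, dists = np.zeros(top_k), []
    for start in range(0, n, pool_size):
        t = text_feats[start:start + pool_size]
        m = motion_feats[start:start + pool_size]
        # Pairwise Euclidean distances between every text and every motion in the pool.
        d = np.linalg.norm(t[:, None, :] - m[None, :, :], axis=-1)
        ranks = np.argsort(d, axis=1)
        for i in range(pool_size):
            rank = int(np.where(ranks[i] == i)[0][0])  # rank of the matched motion
            for k in range(top_k):
                hits[k] += rank <= k
            dists.append(d[i, i])  # distance of the matched text-motion pair
    return hits / n, float(np.mean(dists))
```

In this sketch, R-Precision Top-k is the fraction of pools in which the ground-truth motion ranks within the k closest motions to its text, and MM Dist is the mean distance of the matched pairs; both follow the pool-of-32 protocol commonly used in the papers below.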
Notably, the prefixes 'o-' and 'u-' in the code links indicate official and unofficial implementations, respectively.
- Please note that [Seq2Seq], [Language2Pose], [Text2Gesture], [Hier] and [TEMOS] don't report results in terms of the above metrics.
- The results of [Seq2Seq], [Language2Pose], [Text2Gesture] and [Hier] come from [TM2T].
- The results of [TEMOS] come from [MMM].
- †: denotes that a different evaluator is used.
| ID | Year | Venue | Model (or Authors) | R Precision Top-1 ↑ | R Precision Top-2 ↑ | R Precision Top-3 ↑ | FID ↓ | MM Dist ↓ | MultiModality ↑ | Diversity → | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| - | - | - | Real Motion | | | | | | - | | - |
| - | - | - | Real Motion † | | | | | | - | | - |
| 1 | 2018 | NeurIPS | Seq2Seq | | | | | | - | | [u-pytorch] |
| 2 | 2019 | 3DV | Language2Pose | | | | | | - | | [o-pytorch] |
| 3 | 2021 | IEEE VR | Text2Gesture | | | | | | - | | [o-pytorch] |
| 4 | 2021 | ICCV | Hier | | | | | | - | | [o-pytorch] |
| 5 | 2022 | ECCV | TEMOS | | | | | | | | [o-pytorch] |
| 6 | 2022 | ECCV | TM2T | | | | | | | | [o-pytorch] |
| 7 | 2022 | CVPR | T2M | | | | | | | | [o-pytorch] |
| 8 | 2023 | ICLR | MDM | | | | | | | | [o-pytorch] |
| 9 | 2022 (2024) | Arxiv (TPAMI) | MotionDiffuse | | | | | | | | [o-pytorch] |
| 10 | 2023 | CVPR | MLD | | | | | | | | [o-pytorch] |
| 11 | 2023 | CVPR | T2M-GPT | | | | | | | | [o-pytorch] |
| 12 | 2023 | ICCV | Fg-T2M | | | | | | | | - |
| 13 | 2023 | ICCV | M2DM | | | | | | | | - |
| 14 | 2023 | ICCV | AttT2M | | | | | | | | [o-pytorch] |
| 15 | 2023 | NeurIPS | MotionGPT | | | | | | | | [o-pytorch] |
| 16 | 2023 | ICCV | ReMoDiffuse † | | | | | | | | [o-pytorch] |
| 17 | 2024 | CVPR | MMM | | | | | | | | [o-pytorch] |
| 18 | 2024 | CVPR | MoMask | | | | | | - | | [o-pytorch] |
| 19 | 2024 | ECCV | MotionLCM | | | | | | | | [o-pytorch] |
| 20 | 2024 | ECCV | Motion Mamba | | | | | | | | [o-pytorch] |
| 21 | 2024 | ECCV | BAMM | | | | | | | | - |
| ID | Year | Venue | Model (or Authors) | R Precision Top-1 ↑ | R Precision Top-2 ↑ | R Precision Top-3 ↑ | FID ↓ | MM Dist ↓ | MultiModality ↑ | Diversity → | Code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| - | - | - | Real Motion (GT) | | | | | | - | | - |
| - | - | - | Real Motion † | | | | | | - | | - |
| 1 | 2018 | NeurIPS | Seq2Seq | | | | | | - | | [u-pytorch] |
| 2 | 2019 | 3DV | Language2Pose | | | | | | - | | [o-pytorch] |
| 3 | 2021 | IEEE VR | Text2Gesture | | | | | | - | | [o-pytorch] |
| 4 | 2021 | ICCV | Hier | | | | | | - | | [o-pytorch] |
| 5 | 2022 | ECCV | TEMOS | | | | | | | | [o-pytorch] |
| 6 | 2022 | ECCV | TM2T | | | | | | | | [o-pytorch] |
| 7 | 2022 | CVPR | T2M | | | | | | | | [o-pytorch] |
| 8 | 2023 | ICLR | MDM | | | | | | | | [o-pytorch] |
| 9 | 2022 (2024) | Arxiv (TPAMI) | MotionDiffuse | | | | | | | | [o-pytorch] |
| 10 | 2023 | CVPR | MLD | | | | | | | | [o-pytorch] |
| 11 | 2023 | CVPR | T2M-GPT | | | | | | | | [o-pytorch] |
| 12 | 2023 | ICCV | Fg-T2M | | | | | | | | - |
| 13 | 2023 | ICCV | M2DM | | | | | | | | - |
| 14 | 2023 | ICCV | AttT2M | | | | | | | | [o-pytorch] |
| 15 | 2023 | NeurIPS | MotionGPT | | | | | | | | [o-pytorch] |
| 16 | 2023 | ICCV | ReMoDiffuse † | | | | | | | | [o-pytorch] |
| 17 | 2024 | CVPR | MMM | | | | | | | | [o-pytorch] |
| 18 | 2024 | CVPR | MoMask | | | | | | - | | [o-pytorch] |
| 19 | 2024 | ECCV | Motion Mamba | | | | | | | | [o-pytorch] |
| 20 | 2024 | ECCV | BAMM | | | | | | | | - |
- [Seq2Seq] | NeurIPS'18 | Generating Animated Videos of Human Activities from Natural Language Descriptions | [pdf] | [u-pytorch] |
- [Language2Pose] | 3DV'19 | Language2Pose: Natural Language Grounded Pose Forecasting | [pdf] | [o-pytorch] |
- [Text2Gesture] | IEEE VR'21 | Text2Gestures: A Transformer-Based Network for Generating Emotive Body Gestures for Virtual Agents | [pdf] | [o-pytorch] |
- [Hier] | ICCV'21 | Synthesis of Compositional Animations from Textual Descriptions | [pdf] | [o-pytorch] |
- [TEMOS] | ECCV'22 | TEMOS: Generating Diverse Human Motions from Textual Descriptions | [pdf] | [o-pytorch] |
- [TM2T] | ECCV'22 | TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts | [pdf] | [o-pytorch] |
- [T2M] | CVPR'22 | Generating Diverse and Natural 3D Human Motions from Text | [pdf] | [o-pytorch] |
- [MDM] | ICLR'23 | MDM: Human Motion Diffusion Model | [pdf] | [o-pytorch] |
- [MotionDiffuse] | Arxiv'22 (TPAMI'24) | MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model | [pdf] | [o-pytorch] |
- [MLD] | CVPR'23 | Executing your Commands via Motion Diffusion in Latent Space | [pdf] | [o-pytorch] |
- [T2M-GPT] | CVPR'23 | T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations | [pdf] | [o-pytorch] |
- [Fg-T2M] | ICCV'23 | Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model | [pdf] | - |
- [M2DM] | ICCV'23 | Priority-Centric Human Motion Generation in Discrete Latent Space | [pdf] | - |
- [AttT2M] | ICCV'23 | AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism | [pdf] | [o-pytorch] |
- [MotionGPT] | NeurIPS'23 | MotionGPT: Human Motion as a Foreign Language | [pdf] | [o-pytorch] |
- [ReMoDiffuse †] | ICCV'23 | ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model | [pdf] | [o-pytorch] |
- [MMM] | CVPR'24 | MMM: Generative Masked Motion Model | [pdf] | [o-pytorch] |
- [MoMask] | CVPR'24 | MoMask: Generative Masked Modeling of 3D Human Motions | [pdf] | [o-pytorch] |
- [MotionLCM] | ECCV'24 | MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model | [pdf] | [o-pytorch] |
- [Motion Mamba] | ECCV'24 | Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM | [pdf] | [o-pytorch] |
- [BAMM] | ECCV'24 | BAMM: Bidirectional Autoregressive Motion Model | [pdf] | - |
- [GMD] | ICCV'23 | Guided Motion Diffusion for Controllable Human Motion Synthesis | [pdf] | [o-pytorch] |
- [PhysDiff] | ICCV'23 | PhysDiff: Physics-Guided Human Motion Diffusion Model | [pdf] | - |
- [PriorMDM] | ICLR'24 | Human Motion Diffusion as a Generative Prior | [pdf] | [o-pytorch] |
- [OmniControl] | ICLR'24 | OmniControl: Control Any Joint at Any Time for Human Motion Generation | [pdf] | [o-pytorch] |
- [MotionLCM] | ECCV'24 | MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model | [pdf] | [o-pytorch] |
If you have any suggestions or find missing papers, please feel free to contact me.
The format of this awesome list follows this project; thanks for such a pretty template!