This repository contains the official implementation for our paper: StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset.
Main framework of our method. (a) We use human-object offsets to encode the spatial relation between the human and the object. For a human-object pair, offsets are calculated and flattened into an offset vector x. From all offset vectors calculated on the training set, the latent spatial relation space is constructed using principal component analysis. To get a vectorized representation of the human-object spatial relation, the offset vector is projected into this latent spatial relation space by linear projection. Conversely, given a sample γ from this latent spatial relation space, we can reproject it to recover the offset vector x̂. The human-object instance can be reconstructed from x̂ by iterative optimization. (b) With the pre-constructed latent spatial relation space, we use a stacked normalizing flow to infer the posterior distribution of the human-object spatial relation for an input image. (c) In the post-optimization stage, we further fine-tune the reconstruction results using a 2D-3D reprojection loss and an offset loss.
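The projection and reprojection in step (a) can be illustrated with a minimal NumPy sketch. It is not the code in this repository: the anchor counts, the random toy data, and the helper names `compute_offset_vector`, `fit_pca`, `encode`, and `decode` are assumptions made for illustration, and the iterative optimization that recovers the human-object instance from x̂ is not shown.

```python
import numpy as np


def compute_offset_vector(human_anchors, object_anchors):
    """Flatten all pairwise human-to-object anchor offsets into one vector."""
    # human_anchors: (m, 3), object_anchors: (n, 3) -> offsets: (m, n, 3)
    offsets = object_anchors[None, :, :] - human_anchors[:, None, :]
    return offsets.reshape(-1)


def fit_pca(X, n_components):
    """Fit PCA on the rows of X; return the mean and the top principal axes."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(x, mean, components):
    """Project an offset vector x into the latent space (gamma)."""
    return components @ (x - mean)


def decode(gamma, mean, components):
    """Reproject a latent sample gamma back to an offset vector (x_hat)."""
    return mean + components.T @ gamma


# Toy "training set": 200 pairs with 8 human anchors and 4 object anchors each
# (hypothetical sizes, random points instead of real mesh anchors).
rng = np.random.default_rng(0)
train = np.stack([
    compute_offset_vector(rng.normal(size=(8, 3)), rng.normal(size=(4, 3)))
    for _ in range(200)
])

mean, components = fit_pca(train, n_components=16)
x = train[0]
gamma = encode(x, mean, components)      # latent code, shape (16,)
x_hat = decode(gamma, mean, components)  # recovered offsets, shape (96,)
```

Because the principal axes are orthonormal, encoding is a single matrix product and decoding is its transpose, so the reconstruction error shrinks as more components are kept.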
Model | Visible Ratio | Post-Optimization | SMPL (HOI aligned) | Object (HOI aligned) | SMPL (w/o HOI aligned) | Object (w/o HOI aligned) |
---|---|---|---|---|---|---|
PHOSA | >0.3 | √ | 12.17±11.13 | 26.62±21.87 | - | - |
CHORE | >0.3 | √ | 5.58±2.11 | 10.66±7.71 | - | - |
StackFLOW | all | × | 4.72±1.99 | 11.85±11.02 | 7.63±5.88 | 15.71±14.35 |
StackFLOW | all | √ | 4.50±1.91 | 9.12±8.82 | 9.41±14.88 | 11.15±18.12 |
StackFLOW | >0.3 | × | 4.71±1.99 | 11.45±10.43 | 7.66±5.98 | 15.19±13.57 |
StackFLOW | >0.3 | √ | 4.51±1.92 | 8.77±8.33 | 9.26±15.03 | 10.51±17.76 |
StackFLOW-AUG | all | × | 4.62±1.93 | 12.16±11.73 | 7.37±3.95 | 16.15±14.85 |
StackFLOW-AUG | all | √ | 4.43±1.85 | 8.71±8.58 | 8.79±10.74 | 10.90±15.17 |
StackFLOW-AUG | >0.3 | × | 4.60±1.93 | 11.63±10.90 | 7.40±4.00 | 15.49±13.83 |
StackFLOW-AUG | >0.3 | √ | 4.43±1.86 | 8.30±7.91 | 8.62±10.78 | 10.19±14.44 |
Model | Post-Optimization | SMPL (Align w=1) | Object (Align w=1) | SMPL (Align w=10) | Object (Align w=10) |
---|---|---|---|---|---|
StackFLOW | × | 4.89±2.27 | 11.38±9.40 | 5.37±2.53 | 12.11±10.33 |
StackFLOW | √ | 4.71±2.09 | 9.44±8.75 | - | - |
StackFLOW | √ (+ sequence smooth) | 5.46±4.16 | 11.58±15.35 | 6.01±4.25 | 12.21±15.80 |
Full-Sequence Evaluation
Model | Post-Optimization | SMPL (Align w=1) | Object (Align w=1) | SMPL (Align w=10) | Object (Align w=10) |
---|---|---|---|---|---|
CHORE | √ | 5.55 | 10.02 | 18.33 | 20.32 |
VisTracker | √ | 5.25 | 8.04 | 7.81 | 8.49 |
StackFLOW | × | 4.42±1.96 | 10.87±10.43 | 5.23±2.42 | 11.64±10.98 |
StackFLOW | √ (w/o offset loss) | 4.49±2.01 | 9.14±8.53 | 5.01±2.66 | 9.36±8.67 |
StackFLOW | √ (+ sequence smooth) | 4.39±2.46 | 8.57±8.96 | 4.98±3.07 | 8.94±9.29 |
Follow these instructions to set up the environment for this project. Make sure you have downloaded the checkpoint and put it in the directory PROJECT_ROOT/outputs/stackflow/behave_aug/. Then run:
Demo Occlusion:
python ./demo_occlusion.py --cfg_file ./stackflow/configs/behave_aug.yaml --img_path ./data/demo/occlusion/3_3_basketball_none_026_0.color.jpg
Results will be written to the directory PROJECT_ROOT/outputs/demo/occlusion/.
Demo Optimization with Multi-Object:
python ./demo_multi_object.py --cfg_file ./stackflow/configs/behave_aug.yaml --post_optimization
Results will be written to the directory PROJECT_ROOT/outputs/demo/multi_objects/.
Demo Optimization Full Sequence:
Make sure you have downloaded the checkpoint and prepared the BEHAVE-Extended dataset following these instructions.
python ./demo_sequence.py --cfg_file ./stackflow/configs/behave_extend.yaml --dataset_root_dir $BEHAVE_ROOT_DIR
We ran this script on a single A100 GPU with 80 GB of memory. Results will be written to the directory PROJECT_ROOT/outputs/demo/multi_objects/sequences.
Before training, make sure you have prepared the BEHAVE or InterCap dataset following these instructions.
- For the BEHAVE dataset, you should go through steps #1, #3, #4, and #6. If you want to train with augmented data, you should also go through steps #2 and #5.
- For the InterCap dataset and BEHAVE-Extended dataset, you should go through all steps.
Don't forget to redirect the path _C.dataset.bg_dir (in the file PROJECT_DIR/stackflow/configs/__init__.py, line 25) to VOC_DIR in your custom setting. Then run:
python ./stackflow/train.py --cfg_file ./stackflow/configs/behave.yaml --dataset_root_dir $BEHAVE_ROOT_DIR
Our model can be trained within 2 days on a single GPU. The logs and checkpoints will be saved to the path PROJECT_DIR/outputs/stackflow.
Evaluate on the BEHAVE dataset:
Download the full masks for the BEHAVE dataset from here and unzip them to the path BEHAVE_ROOT_DIR. Make sure the checkpoint path in PROJECT_DIR/stackflow/configs/behave.yaml exists. If you want to evaluate models with post-optimization, you need to follow these instructions to prepare the keypoints of the person and the 2D-3D correspondence maps of the object.
python ./stackflow/evaluate_frames.py --cfg_file ./stackflow/configs/behave.yaml --dataset_root_dir $BEHAVE_ROOT_DIR
Evaluate on the BEHAVE-Extended dataset:
python ./stackflow/evaluate_sequences.py --cfg_file ./stackflow/configs/behave_extend.yaml --dataset_root_dir $BEHAVE_ROOT_DIR
The reconstruction results and evaluation metrics will be saved to the directory PROJECT_DIR/outputs/stackflow.
This work borrows some code from ProHMR and CDPN. Thanks to the authors for these fantastic works.
If you have any questions, please feel free to open an issue or contact me.
@inproceedings{ijcai2023p100,
title = {StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset},
author = {Huo, Chaofan and Shi, Ye and Ma, Yuexin and Xu, Lan and Yu, Jingyi and Wang, Jingya},
booktitle = {Proceedings of the Thirty-Second International Joint Conference on
Artificial Intelligence, {IJCAI-23}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
editor = {Edith Elkind},
pages = {902--910},
year = {2023},
month = {8},
note = {Main Track},
doi = {10.24963/ijcai.2023/100},
url = {https://doi.org/10.24963/ijcai.2023/100},
}