
# Toward High Quality Facial Representation Learning (ACM MM 2023)

Yue Wang, Jinlong Peng, Jiangning Zhang, Ran Yi, Liang Liu, Yabiao Wang, Chengjie Wang


## Abstract

Face analysis tasks have a wide range of applications, but the universal facial representation has only been explored in a few works. In this paper, we explore high-performance pre-training methods to boost face analysis tasks such as face alignment and face parsing. We propose a self-supervised pre-training framework, called Mask Contrastive Face (MCF), with mask image modeling and a contrastive strategy specially adjusted for face domain tasks. To improve the facial representation quality, we use the feature map of a pre-trained visual backbone as a supervision item and use a partially pre-trained decoder for mask image modeling. To handle face identity during the pre-training stage, we further use random masks to build contrastive learning pairs. We conduct the pre-training on the LAION-FACE-cropped dataset, a variant of LAION-FACE 20M, which contains more than 20 million face images from Internet websites. For efficient pre-training, we evaluate our framework on a small part of LAION-FACE-cropped and verify its superiority under different pre-training settings. Our model pre-trained on the full dataset outperforms state-of-the-art methods on multiple downstream tasks, achieving 0.932 NME$_{diag}$ for AFLW-19 face alignment and a 93.96 F1 score for LaPa face parsing.
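
The objective thus pairs two terms: a masked-image-modeling loss supervised by features from a frozen pre-trained backbone, and a contrastive (InfoNCE) loss between two randomly masked views of the same face. The following is a minimal sketch of such a combined loss; the function name, tensor shapes, queue-based negatives, and the weighting `lam` are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def mcf_loss(pred_feats, target_feats, q, k, queue, tau=0.2, lam=1.0):
    """Sketch of a combined MCF-style objective (hypothetical weighting `lam`).

    pred_feats:   decoder predictions for masked patches, (N, L, C)
    target_feats: frozen pre-trained backbone features for the same patches
    q, k:         global embeddings of two randomly masked views, (N, D)
    queue:        embeddings serving as contrastive negatives, (K, D)
    """
    # Masked image modeling: regress features of a frozen visual backbone
    # instead of raw pixels.
    mim = F.mse_loss(pred_feats, target_feats)

    # Contrastive term: two random masks of one image form a positive pair;
    # the queue supplies negatives (standard InfoNCE).
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    pos = (q * k).sum(dim=-1, keepdim=True)        # (N, 1)
    neg = q @ F.normalize(queue, dim=-1).t()       # (N, K)
    logits = torch.cat([pos, neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    contrast = F.cross_entropy(logits, labels)

    return mim + lam * contrast
```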

## Todo (Latest update: 2023/10/30)

- [ ] Release the testing code

## Pre-trained Checkpoint

| Model | URL |
| --- | --- |
| ViT-B/16 (16 epochs) | drive |
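
To use the checkpoint as a plain ViT backbone, something like the sketch below should work. The file name is hypothetical, and the nested `model` key follows the MAE checkpoint convention this codebase builds on; adjust if the released file differs:

```python
import torch
import timm

# Hypothetical file name; use the checkpoint downloaded from the link above.
ckpt = torch.load("mcf_vit_base_patch16.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # MAE-style checkpoints nest weights under "model"

# Plain ViT-B/16 backbone without a classification head; decoder or
# momentum-branch keys, if present, are simply skipped.
model = timm.create_model("vit_base_patch16_224", num_classes=0)
msg = model.load_state_dict(state_dict, strict=False)
print("missing:", msg.missing_keys, "unexpected:", msg.unexpected_keys)
```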

## Install package

```
pytorch==1.8.2
timm==0.3.2
```
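
Note: timm 0.3.2 may require a small compatibility patch to run with PyTorch 1.8+, as documented in the original MAE repository that codebases like this one typically build on.
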
## Training

```bash
# folder dataset
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch --nproc_per_node=8 main_pretrain.py \
    --batch_size 64 \
    --accum_iter 8 \
    --model mae_vit_base_patch16 \
    --input_size 224 \
    --mask_ratio 0.75 \
    --epochs 16 \
    --warmup_epochs 1 \
    --blr 1.5e-4 \
    --weight_decay 0.05 \
    --moco_m 0.99 \
    --cpl_ckpt CKPT \
    --data_path DATA_PATH \
    --log_dir LOG_DIR \
    --output_dir OUTPUT_DIR
```
```bash
# lmdb dataset
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch --nproc_per_node=8 main_pretrain.py \
    --batch_size 64 \
    --accum_iter 8 \
    --model mae_vit_base_patch16 \
    --input_size 224 \
    --mask_ratio 0.75 \
    --epochs 16 \
    --warmup_epochs 1 \
    --blr 1.5e-4 \
    --weight_decay 0.05 \
    --moco_m 0.99 \
    --use_lmdb_dataset \
    --lmdb_txt LMDB_DATASET \
    --cpl_ckpt CKPT \
    --data_path DATA_PATH \
    --log_dir LOG_DIR \
    --output_dir OUTPUT_DIR
```
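
With these flags the effective batch size is 64 (per GPU) × 8 (GPUs) × 8 (`--accum_iter`) = 4096. Assuming the MAE learning-rate scaling rule that this setup follows, the actual learning rate is `blr` × 4096 / 256 = 1.5e-4 × 16 = 2.4e-3.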

## Citation

If you find this code helpful for your research, please cite:

```bibtex
@inproceedings{wang2023toward,
  title={Toward High Quality Facial Representation Learning},
  author={Wang, Yue and Peng, Jinlong and Zhang, Jiangning and Yi, Ran and Liu, Liang and Wang, Yabiao and Wang, Chengjie},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}
```