A Collection of Papers and Code for CVPR2024 AIGC
A curated collection of this year's CVPR AIGC-related papers and code, organized as follows.
Please feel free to star, fork or PR if helpful~
- Awesome-ECCV2024-AIGC
- Awesome-AIGC-Research-Groups
- Awesome-Low-Level-Vision-Research-Groups
- Awesome-CVPR2024-CVPR2021-CVPR2020-Low-Level-Vision
- Awesome-ECCV2020-Low-Level-Vision
CVPR 2024 official website: https://cvpr.thecvf.com/Conferences/2024
CVPR 2024 accepted paper list: https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers
CVPR 2024 open access proceedings: https://openaccess.thecvf.com/CVPR2024
Conference dates: June 17-21, 2024
Acceptance notification: February 27, 2024
Contents
- 1. Image Generation / Image Synthesis
- 2. Image Editing
- 3. Video Generation / Video Synthesis
- 4. Video Editing
- 5. 3D Generation / 3D Synthesis
- 6. 3D Editing
- 7. Multi-Modal Large Language Model
- 8. Others
1. Image Generation / Image Synthesis
Adversarial Score Distillation: When score distillation meets GAN
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Wei_Adversarial_Score_Distillation_When_score_distillation_meets_GAN_CVPR_2024_paper.html
- Code: https://github.com/2y7c3/ASD
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
- Paper: https://arxiv.org/abs/2405.05252
- Code:
CapHuman: Capture Your Moments in Parallel Universes
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Liang_CapHuman_Capture_Your_Moments_in_Parallel_Universes_CVPR_2024_paper.html
- Code: https://github.com/VamosC/CapHuman
CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
- Paper: https://arxiv.org/abs/2404.00521
- Code:
- Paper: https://arxiv.org/abs/2311.15773
- Code:
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Tang_CoDi-2_In-Context_Interleaved_and_Interactive_Any-to-Any_Generation_CVPR_2024_paper.html
- Code: https://github.com/microsoft/i-Code/tree/main/CoDi-2
- Paper: https://arxiv.org/abs/2404.01143v1
- Code:
- Paper: https://arxiv.org/abs/2312.03045
- Code:
- Paper: https://arxiv.org/abs/2405.04356v1
- Code:
- Paper: https://arxiv.org/abs/2311.18257
- Code:
Domain Gap Embeddings for Generative Dataset Augmentation
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Wang_Domain_Gap_Embeddings_for_Generative_Dataset_Augmentation_CVPR_2024_paper.html
- Code: https://github.com/humansensinglab/DoGE
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
ElasticDiffusion: Training-free Arbitrary Size Image Generation
- Paper: https://arxiv.org/abs/2311.18822
- Code: https://github.com/MoayedHajiAli/ElasticDiffusion-official
- Paper: https://arxiv.org/abs/2404.03913v1
- Code:
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
- Paper: https://arxiv.org/abs/2403.06775
- Code:
FreeU: Free Lunch in Diffusion U-Net
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Si_FreeU_Free_Lunch_in_Diffusion_U-Net_CVPR_2024_paper.html
- Code: https://github.com/ChenyangSi/FreeU
Towards Generalizable Tumor Synthesis
- Paper: https://www.cs.jhu.edu/~alanlab/Pubs24/chen2024towards.pdf
- Code: https://github.com/MrGiovanni/DiffTumor
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Fu_Generate_Like_Experts_Multi-Stage_Font_Generation_by_Incorporating_Font_Transfer_CVPR_2024_paper.html
- Code: https://github.com/fubinfb/MSD-Font
High-fidelity Person-centric Subject-to-Image Synthesis
- Paper: https://arxiv.org/abs/2311.10329
- Code: https://github.com/CodeGoat24/Face-diffuser
- Paper: https://arxiv.org/abs/2304.03411
- Code:
- Paper: https://arxiv.org/abs/2401.01952
- Code:
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
- Paper: https://arxiv.org/abs/2308.15692
- Code:
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
Learned Representation-Guided Diffusion Models for Large-Image Generation
- Paper: https://arxiv.org/abs/2312.07330
- Code: https://github.com/cvlab-stonybrook/Large-Image-Diffusion
Learning Continuous 3D Words for Text-to-Image Generation
- Paper: https://arxiv.org/abs/2402.08654
- Code: https://github.com/ttchengab/continuous_3d_words_code/
- Paper: https://arxiv.org/abs/2311.15841
- Code:
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
- Paper: https://arxiv.org/abs/2308.10997
- Code:
- Paper: https://arxiv.org/abs/2403.04290
- Code:
- Paper: https://arxiv.org/abs/2404.02883
- Code:
- Paper: https://arxiv.org/abs/2405.12978
- Code:
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
- Paper: https://arxiv.org/abs/2404.15081
- Code:
- Paper: https://arxiv.org/abs/2406.01954
- Code:
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
- Paper: https://arxiv.org/abs/2401.09603
- Code: https://github.com/google-research/google-research/tree/master/cmmd
- Paper: https://arxiv.org/abs/2312.10240
- Code:
Shadow Generation for Composite Image Using Diffusion Model
- Paper: https://arxiv.org/abs/2308.09972
- Code: https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2
- Paper: https://arxiv.org/abs/2402.17563
- Code:
- Paper: https://arxiv.org/abs/2403.18978
- Code:
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
- Paper: https://arxiv.org/abs/2403.05239
- Code:
Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Xia_Towards_More_Accurate_Diffusion_Model_Acceleration_with_A_Timestep_Tuner_CVPR_2024_paper.html
- Code: https://github.com/THU-LYJ-Lab/time-tuner
- Paper: https://arxiv.org/abs/2311.09257
- Code:
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Starodubcev_Your_Student_is_Better_Than_Expected_Adaptive_Teacher-Student_Collaboration_for_CVPR_2024_paper.html
- Code: https://github.com/yandex-research/adaptive-diffusion
2. Image Editing
3D-Aware Face Editing via Warping-Guided Latent Direction Learning
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Cheng_3D-Aware_Face_Editing_via_Warping-Guided_Latent_Direction_Learning_CVPR_2024_paper.html
- Code: https://github.com/cyh-sj/FaceEdit3D
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Yin_Benchmarking_Segmentation_Models_with_Mask-Preserved_Attribute_Editing_CVPR_2024_paper.html
- Code: https://github.com/PRIS-CV/Pascal-EA
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
- Paper: https://arxiv.org/abs/2405.04377
- Code:
Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Sun_Content-Style_Decoupling_for_Unsupervised_Makeup_Transfer_without_Generating_Pseudo_Ground_CVPR_2024_paper.html
- Code: https://github.com/Snowfallingplum/CSD-MT
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
- Paper: https://arxiv.org/abs/2311.18608
- Code: https://github.com/HyelinNAM/ContrastiveDenoisingScore
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Sun_DiffAM_Diffusion-based_Adversarial_Makeup_Transfer_for_Facial_Privacy_Protection_CVPR_2024_paper.html
- Code: https://github.com/HansSunY/DiffAM
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Mou_DiffEditor_Boosting_Accuracy_and_Flexibility_on_Diffusion-based_Image_Editing_CVPR_2024_paper.html
- Code: https://github.com/MC-E/DragonDiffusion
Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
- Paper: https://arxiv.org/abs/2312.10113
- Code: https://github.com/guoqincode/Focus-on-Your-Instruction
HIVE: Harnessing Human Feedback for Instructional Visual Editing
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Zhang_HIVE_Harnessing_Human_Feedback_for_Instructional_Visual_Editing_CVPR_2024_paper.html
- Code: https://github.com/salesforce/HIVE
- Paper: https://arxiv.org/abs/2403.09632
- Code: https://github.com/guoqincode/Focus-on-Your-Instruction
In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
- Paper: https://arxiv.org/abs/2312.04965
- Code: https://github.com/Twizwei/in-n-out
Inversion-Free Image Editing with Natural Language
- Paper: https://arxiv.org/abs/2312.04965
- Code: https://github.com/sled-group/InfEdit
LEDITS++: Limitless Image Editing using Text-to-Image Models
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Brack_LEDITS_Limitless_Image_Editing_using_Text-to-Image_Models_CVPR_2024_paper.html
- Code: https://github.com/ml-research/ledits_pp
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
- Paper: https://arxiv.org/abs/2303.17546
- Code: https://github.com/YangChangHee/CVPR2024_Person-In-Place_RELEASE
- Paper: https://arxiv.org/abs/2405.19775
- Code:
- Paper: https://arxiv.org/abs/2403.00483
- Code:
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Jiang_SCEdit_Efficient_and_Controllable_Image_Diffusion_Generation_via_Skip_Connection_CVPR_2024_paper.html
- Code: https://github.com/ali-vilab/SCEdit
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
- Paper: https://arxiv.org/abs/2402.18848
- Code:
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Bobkov_The_Devil_is_in_the_Details_StyleFeatureEditor_for_Detail-Rich_StyleGAN_CVPR_2024_paper.html
- Code: https://github.com/FusionBrainLab/StyleFeatureEditor
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Liu_Towards_Understanding_Cross_and_Self-Attention_in_Stable_Diffusion_for_Text-Guided_CVPR_2024_paper.html
- Code: https://github.com/alibaba/EasyNLP/tree/master/diffusion/FreePromptEditing
Z*: Zero-shot Style Transfer via Attention Reweighting
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Deng_Z_Zero-shot_Style_Transfer_via_Attention_Reweighting_CVPR_2024_paper.html
- Code: https://github.com/HolmesShuan/Zero-shot-Style-Transfer-via-Attention-Rearrangement
3. Video Generation / Video Synthesis
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
- Paper:
- Code:
- Paper: https://arxiv.org/abs/2403.01901
- Code:
Grid Diffusion Models for Text-to-Video Generation
- Paper: https://arxiv.org/abs/2404.00234
- Code: https://github.com/taegyeong-lee/Grid-Diffusion-Models-for-Text-to-Video-Generation
LAMP: Learn A Motion Pattern for Few-Shot Video Generation
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Wu_LAMP_Learn_A_Motion_Pattern_for_Few-Shot_Video_Generation_CVPR_2024_paper.html
- Code: https://github.com/RQ-Wu/LAMP
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives
4. Video Editing
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-driven Video Editing
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Zhang_CAMEL_CAusal_Motion_Enhancement_Tailored_for_Lifting_Text-driven_Video_Editing_CVPR_2024_paper.html
- Code: https://github.com/zhangguiwei610/CAMEL
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Liu_DynVideo-E_Harnessing_Dynamic_NeRF_for_Large-Scale_Motion-_and_View-Change_Human-Centric_CVPR_2024_paper.html
- Code: https://github.com/qiuyu96/CoDeF
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization
5. 3D Generation / 3D Synthesis
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
- Paper: https://arxiv.org/abs/2405.09546
- Code: https://github.com/behavior-vision-suite/behavior-vision-suite.github.io
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
- Paper: https://arxiv.org/abs/2312.05208
- Code:
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
Diffusion Time-step Curriculum for One Image to 3D Generation
- Paper: https://paperswithcode.com/paper/diffusion-time-step-curriculum-for-one-image
- Code: https://github.com/yxymessi/DTC123
- Paper: https://arxiv.org/abs/2312.03050
- Code:
- Paper: https://arxiv.org/abs/2310.01406
- Code:
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
- Paper: https://arxiv.org/abs/2312.12274
- Code: https://github.com/Peter-Kocsis/IntrinsicImageDiffusion
MotionEditor: Editing Video Motion via Content-Aware Diffusion
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Tu_MotionEditor_Editing_Video_Motion_via_Content-Aware_Diffusion_CVPR_2024_paper.html
- Code: https://github.com/Francis-Rings/MotionEditor
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
- Paper: https://arxiv.org/abs/2402.05746
- Code: https://github.com/yifanlu0227/ChatSim
- Paper: https://arxiv.org/abs/2405.16925
- Code:
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
SemCity: Semantic Scene Generation with Triplane Diffusion
- Paper: https://arxiv.org/abs/2403.07773
- Code: https://github.com/zoomin-lee/SemCity
Single Mesh Diffusion Models with Field Latents
- Paper: https://arxiv.org/abs/2312.09250
- Code: https://github.com/google-research/google-research/tree/master/mesh_diffusion
TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
- Paper: https://cvlab.cse.msu.edu/pdfs/Ren_Kim_Liu_Liu_TIGER_supp.pdf
- Code: https://github.com/Zhiyuan-R/Tiger-Diffusion
6. 3D Editing
Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Song_Arbitrary_Motion_Style_Transfer_with_Multi-condition_Motion_Latent_Diffusion_Model_CVPR_2024_paper.html
- Code: https://github.com/XingliangJin/MCM-LDM
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/He_Customize_your_NeRF_Adaptive_Source_Driven_3D_Scene_Editing_via_CVPR_2024_paper.html
- Code: https://github.com/hrz2000/CustomNeRF
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Bao_GeneAvatar_Generic_Expression-Aware_Volumetric_Head_Avatar_Editing_from_a_Single_CVPR_2024_paper.html
- Code: https://github.com/zju3dv/GeneAvatar
Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Mou_Instruct_4D-to-4D_Editing_4D_Scenes_as_Pseudo-3D_Scenes_Using_2D_CVPR_2024_paper.html
- Code: https://github.com/Friedrich-M/Instruct-4D-to-4D
LAENeRF: Local Appearance Editing for Neural Radiance Fields
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Radl_LAENeRF_Local_Appearance_Editing_for_Neural_Radiance_Fields_CVPR_2024_paper.html
- Code: https://github.com/r4dl/LAENeRF
SHAP-EDITOR: Instruction-Guided Latent 3D Editing in Seconds
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Chen_SHAP-EDITOR_Instruction-Guided_Latent_3D_Editing_in_Seconds_CVPR_2024_paper.html
- Code: https://github.com/silent-chen/Shap-Editor
Text-Guided 3D Face Synthesis - From Generation to Editing
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Wu_Text-Guided_3D_Face_Synthesis_-_From_Generation_to_Editing_CVPR_2024_paper.html
- Code: https://github.com/JiejiangWu/FaceG2E
7. Multi-Modal Large Language Model
BioCLIP: A Vision Foundation Model for the Tree of Life
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Stevens_BioCLIP_A_Vision_Foundation_Model_for_the_Tree_of_Life_CVPR_2024_paper.html
- Code: https://github.com/Imageomics/bioclip
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
- Paper: https://arxiv.org/abs/2405.20305
- Code:
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Describing Differences in Image Sets with Natural Language
- Paper: https://arxiv.org/abs/2312.02974
- Code: https://github.com/Understanding-Visual-Datasets/VisDiff
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
- Paper: https://arxiv.org/abs/2404.11207
- Code: https://github.com/zycheiheihei/transferable-visual-prompting
FairCLIP: Harnessing Fairness in Vision-Language Learning
- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
- Paper: https://arxiv.org/abs/2404.16123
- Code:
FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
- Paper: https://arxiv.org/abs/2404.16123
- Code:
Improved Baselines with Visual Instruction Tuning
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Liu_Improved_Baselines_with_Visual_Instruction_Tuning_CVPR_2024_paper.html
- Code: https://github.com/haotian-liu/LLaVA
- Paper: https://arxiv.org/abs/2404.00909
- Code:
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
- Paper: https://arxiv.org/abs/2403.07839
- Code:
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
- Paper: https://arxiv.org/abs/2404.09011
- Code:
- Paper: https://arxiv.org/abs/2404.01156
- Code:
- Paper: https://arxiv.org/abs/2403.12532
- Code:
8. Others
AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
Diff-BGM: A Diffusion Model for Video Background Music Generation
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Li_Diff-BGM_A_Diffusion_Model_for_Video_Background_Music_Generation_CVPR_2024_paper.html
- Code: https://github.com/sizhelee/Diff-BGM
InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Liang_InfLoRA_Interference-Free_Low-Rank_Adaptation_for_Continual_Learning_CVPR_2024_paper.html
- Code: https://github.com/liangyanshuo/InfLoRA
Shadows Don't Lie and Lines Can't Bend! Generative Models Don't Know Projective Geometry... For Now
- Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Sarkar_Shadows_Dont_Lie_and_Lines_Cant_Bend_Generative_Models_dont_CVPR_2024_paper.html
- Code: https://github.com/hanlinm2/projective-geometry
Continuously updated~