Learning Visual Generative Priors without Text

Official PyTorch implementation of Lumos.

Shuailei Ma*1, Kecheng Zheng*2, Ying Wei✉️1, Wei Wu2, Fan Lu2, Yifei Zhang3, Chen-Wei Xie4, Biao Gong2, Jiapeng Zhu5, Yujun Shen✉️2
1Northeastern University, China; 2Ant Group; 3SJTU; 4Alibaba Group; 5HKUST
*Equal contribution; ✉️Corresponding author

📝 Content

  • 📣 Update Log
  • 🪄✨ Abstract
  • ⚙️ Setup
  • 📖 Citation
  • License
  • Acknowledgement

📣 Update Log

  • [2024.11.21] 🎉 Here comes Lumos: we release the code and Gradio demos of Lumos-I2I and Lumos-T2I.

🪄✨ Abstract

TL;DR: Lumos is a pure vision-based generative framework that confirms the feasibility and scalability of learning visual generative priors without text. It can be efficiently adapted to visual generative tasks such as text-to-image, image-to-3D, and image-to-video generation.

Full abstract: Although text-to-image (T2I) models have recently thrived as visual generative priors, their reliance on high-quality text-image pairs makes scaling up expensive. We argue that grasping the cross-modality alignment is not a necessity for a sound visual generative prior, whose focus should be on texture modeling. Such a philosophy inspires us to study image-to-image (I2I) generation, where models can learn from in-the-wild images in a self-supervised manner. We first develop a pure vision-based training framework, Lumos, and confirm the feasibility and the scalability of learning I2I models. We then find that, as an upstream task of T2I, our I2I model serves as a more foundational visual prior and achieves on-par or better performance than existing T2I models using only 1/10 text-image pairs for fine-tuning. We further demonstrate the superiority of I2I priors over T2I priors on some text-irrelevant visual generative tasks, like image-to-3D and image-to-video.

Visualization of various downstream tasks of Lumos.

⚙️ Setup

Follow the steps below to set up the environment and install the required dependencies.

  1. Clone the repository.

    git clone https://github.com/xiaomabufei/lumos.git
    cd lumos
    
  2. Download the model checkpoints.

    mkdir ./checkpoints && cd ./checkpoints
    git lfs install
    git clone https://huggingface.co/Xiaomabufei/lumos
    
  3. Create and activate a conda environment.

    conda create -n lumos python=3.9 -y
    conda activate lumos
    
  4. Install PyTorch with GPU support (CUDA 11.8).

    pip install torch==2.2.1+cu118 torchvision==0.17.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
    
  5. Install the xformers build matching the torch and CUDA versions above.

    pip install -U xformers==0.0.25
    
  6. Install the remaining dependencies.

    pip install -r requirements.txt
    
  7. Run the Lumos image interpolation (I2I) demo.

    python gradio_demos/lumos_I2I.py
    
  8. Run the Lumos text-to-image (T2I) generation demo.

    python gradio_demos/lumos_T2I.py
    

If you are a user in mainland China, you may run export HF_ENDPOINT=https://hf-mirror.com to use the Hugging Face mirror, which facilitates downloading the checkpoints needed to run our system.
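
As a quick sanity check before launching the demos, you can verify that the GPU build of torch and xformers import correctly. The snippet below is a minimal sketch and not part of this repository; the expected versions simply mirror the pinned installs above.

    # sanity_check.py -- hypothetical helper script, not shipped with this repo
    import torch
    import xformers

    print("torch:", torch.__version__)              # expect 2.2.1+cu118
    print("CUDA available:", torch.cuda.is_available())
    print("xformers:", xformers.__version__)        # expect 0.0.25

Both demos launch a local Gradio interface; by default, Gradio serves at http://127.0.0.1:7860.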

📖 Citation

Don't forget to cite this work if it proves useful in your research!

    @article{Lumos2024,
        title={Learning Visual Generative Priors without Text},
        author={Ma, Shuailei and Zheng, Kecheng and Wei, Ying and Wu, Wei and Lu, Fan and Zhang, Yifei and Xie, Chen-Wei and Gong, Biao and Zhu, Jiapeng and Shen, Yujun},
        year={2024},
        eprint={arxiv},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
    }

License

This repository is released under the MIT license as found in the LICENSE file.

Acknowledgement

Our implementation is based on DiT, PixArt-α, and DINO. Thanks for their remarkable contributions and released code!
