1Northeastern University, China 2Ant Group 3SJTU 4Alibaba Group 5HKUST
*equal contribution ✉️corresponding author
- [2024.11.21] 🎉 Here comes Lumos, we release the code and gradio demos of Lumos-I2I and Lumos-T2I.
TL;DR: Lumos is a pure vision-based generative framework, which confirms the feasibility and the scalability of learning visual generative priors. It can be efficiently adapted to visual generative tasks such as text-to-image, image-to-3D, and image-to-video generation.
Although text-to-image (T2I) models have recently thrived as visual generative priors, their reliance on high-quality text-image pairs makes scaling up expensive. We argue that grasping the cross-modality alignment is not a necessity for a sound visual generative prior, whose focus should be on texture modeling. Such a philosophy inspires us to study image-to-image (I2I) generation, where models can learn from in-the-wild images in a self-supervised manner. We first develop a pure vision-based training framework, Lumos, and confirm the feasibility and the scalability of learning I2I models. We then find that, as an upstream task of T2I, our I2I model serves as a more foundational visual prior and achieves on-par or better performance than existing T2I models using only 1/10 of the text-image pairs for fine-tuning. We further demonstrate the superiority of I2I priors over T2I priors on text-irrelevant visual generative tasks, like image-to-3D and image-to-video.

Follow the guide below to set up the environment.
- Python >= 3.9 (Anaconda or Miniconda recommended)
- PyTorch >= 2.2.1 with CUDA 11.8
- A virtual environment is recommended
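Before installing, you can quickly sanity-check that your toolchain meets these requirements. This is a minimal sketch; the `nvidia-smi` check is informational and degrades gracefully on machines without an NVIDIA driver:

```shell
# Check that the interpreter is at least Python 3.9
python3 -c "import sys; assert sys.version_info >= (3, 9), sys.version"

# Report the NVIDIA driver if present (informational on CPU-only machines)
command -v nvidia-smi >/dev/null && nvidia-smi || echo "nvidia-smi not found"
```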
Install the required dependencies with the following commands.
- Clone the repository:

  ```shell
  git clone https://github.com/xiaomabufei/lumos.git
  cd lumos
  ```

- Download the model checkpoints:

  ```shell
  mkdir ./checkpoints && cd ./checkpoints
  git lfs install
  git clone https://huggingface.co/Xiaomabufei/lumos
  ```

- Create the conda environment:

  ```shell
  conda create -n lumos python=3.9 -y
  conda activate lumos
  ```

- Install PyTorch with GPU support:

  ```shell
  pip install torch==2.2.1+cu118 torchvision==0.17.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
  ```

- Install the xformers build matching this torch/CUDA combination:

  ```shell
  pip install -U xformers==0.0.25
  ```

- Install the remaining dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Run the Lumos image interpolation (I2I) demo:

  ```shell
  python gradio_demos/lumos_I2I.py
  ```

- Run the Lumos text-to-image (T2I) demo:

  ```shell
  python gradio_demos/lumos_T2I.py
  ```
If you are in mainland China, you can set

```shell
export HF_ENDPOINT=https://hf-mirror.com
```

so that the checkpoints needed to run our system are downloaded through the Hugging Face mirror.
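To make the mirror setting persist across shell sessions, one option is the following sketch; it assumes a bash shell, and appending to `~/.bashrc` is an assumption about your setup:

```shell
# Persist the Hugging Face mirror endpoint for future bash sessions
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc

# Apply it to the current session and confirm
export HF_ENDPOINT=https://hf-mirror.com
echo "$HF_ENDPOINT"   # prints https://hf-mirror.com
```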
Please cite this work if it proves useful in your research!
@article{Lumos2024,
  title={Learning Visual Generative Priors without Text},
  author={Ma, Shuailei and Zheng, Kecheng and Wei, Ying and Wu, Wei and Lu, Fan and Zhang, Yifei and Xie, Chen-Wei and Gong, Biao and Zhu, Jiapeng and Shen, Yujun},
  year={2024},
  eprint={arxiv},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
This repository is released under the MIT license as found in the LICENSE file.
Our implementation is based on DiT, PixArt-α, and DINO. Thanks for their remarkable contributions and released code!