1Northeastern University, China 2Ant Group 3SJTU 4Alibaba Group 5HKUST
*equal contribution ✉️corresponding author
- [2024.11.21] 🎉 Here comes Lumos, we release the code and gradio demos of Lumos-I2I and Lumos-T2I.
TL;DR: Lumos is a pure vision-based generative framework, which confirms the feasibility and the scalability of learning visual generative priors. It can be efficiently adapted to visual generative tasks such as text-to-image, image-to-3D, and image-to-video generation.
Although text-to-image (T2I) models have recently thrived as visual generative priors, their reliance on high-quality text-image pairs makes scaling up expensive. We argue that grasping the cross-modality alignment is not a necessity for a sound visual generative prior, whose focus should be on texture modeling. Such a philosophy inspires us to study image-to-image (I2I) generation, where models can learn from in-the-wild images in a self-supervised manner. We first develop a pure vision-based training framework, Lumos, and confirm the feasibility and the scalability of learning I2I models. We then find that, as an upstream task of T2I, our I2I model serves as a more foundational visual prior and achieves on-par or better performance than existing T2I models using only 1/10 of the text-image pairs for fine-tuning. We further demonstrate the superiority of I2I priors over T2I priors on text-irrelevant visual generative tasks, like image-to-3D and image-to-video.

Follow the guide below to set up the environment.
- Python >= 3.9 (Anaconda or Miniconda recommended)
- PyTorch >= 2.2.1 with CUDA 11.8
- A virtual environment is recommended
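Before installing, you can quickly sanity-check that your toolchain meets these requirements. This is a minimal sketch; the `nvidia-smi` check is informational and degrades gracefully on machines without an NVIDIA driver:

```shell
# Check that the interpreter is at least Python 3.9
python3 -c "import sys; assert sys.version_info >= (3, 9), sys.version"

# Report the NVIDIA driver if present (informational on CPU-only machines)
command -v nvidia-smi >/dev/null && nvidia-smi || echo "nvidia-smi not found"
```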
Install the required dependencies with the following commands.
- Clone the repository:

  ```shell
  git clone https://github.com/xiaomabufei/lumos.git
  cd lumos
  ```

- Download the model checkpoints:

  ```shell
  mkdir ./checkpoints && cd ./checkpoints
  git lfs install
  git clone https://huggingface.co/Xiaomabufei/lumos
  ```

- Create the conda environment:

  ```shell
  conda create -n lumos python=3.9 -y
  conda activate lumos
  ```

- Install PyTorch with GPU support:

  ```shell
  pip install torch==2.2.1+cu118 torchvision==0.17.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
  ```

- Install the xformers build matching this torch/CUDA combination:

  ```shell
  pip install -U xformers==0.0.25
  ```

- Install the remaining dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Run the Lumos image interpolation (I2I) demo:

  ```shell
  python gradio_demos/lumos_I2I.py
  ```

- Run the Lumos text-to-image (T2I) demo:

  ```shell
  python gradio_demos/lumos_T2I.py
  ```
If you are in mainland China, you can set

```shell
export HF_ENDPOINT=https://hf-mirror.com
```

so that the checkpoints needed to run our system are downloaded through the Hugging Face mirror.
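To make the mirror setting persist across shell sessions, one option is the following sketch; it assumes a bash shell, and appending to `~/.bashrc` is an assumption about your setup:

```shell
# Persist the Hugging Face mirror endpoint for future bash sessions
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc

# Apply it to the current session and confirm
export HF_ENDPOINT=https://hf-mirror.com
echo "$HF_ENDPOINT"   # prints https://hf-mirror.com
```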
Please cite this work if it proves useful in your research!
@article{Lumos2024,
  title={Learning Visual Generative Priors without Text},
  author={Ma, Shuailei and Zheng, Kecheng and Wei, Ying and Wu, Wei and Lu, Fan and Zhang, Yifei and Xie, Chen-Wei and Gong, Biao and Zhu, Jiapeng and Shen, Yujun},
  year={2024},
  eprint={arxiv},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
This repository is released under the MIT license as found in the LICENSE file.
Our implementation is based on DiT, PixArt-α, and DINO. Thanks for their remarkable contributions and released code!