This is the official implementation of the paper "Uncovering the Text Embedding in Text-to-Image Diffusion Models".
Our code is based on stable-diffusion and requires one GPU with 48 GB of memory. First clone the repository and build the environment:
git clone https://github.com/wuqiuche/DiffusionDisentanglement
cd DiffusionDisentanglement
conda env create -f environment.yaml
conda activate ldm
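Since the project needs a GPU with 48 GB of memory, it may help to check the available memory before going further. The following is a minimal sketch, not part of the repository: `gpu_mem_ok` is a hypothetical helper, and the 46000 MiB default threshold is an assumption (cards may report somewhat less than their nominal size). It queries only the first GPU, and only if `nvidia-smi` is installed.

```shell
# Hypothetical helper: succeed if a memory total (in MiB) meets a minimum.
# The 46000 MiB default is an assumption, not a value from the repository.
gpu_mem_ok() {
    total_mib="$1"
    min_mib="${2:-46000}"
    [ "$total_mib" -ge "$min_mib" ]
}

# Query the first GPU only when nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    total=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
    if gpu_mem_ok "$total"; then
        echo "GPU memory looks sufficient: ${total} MiB"
    else
        echo "GPU memory may be too small: ${total} MiB"
    fi
fi
```

A different threshold can be passed as the second argument, for example `gpu_mem_ok "$total" 40000`.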
You will also need to download the pretrained stable-diffusion model:
mkdir -p models/ldm/stable-diffusion-v1
wget -O models/ldm/stable-diffusion-v1/model.ckpt https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
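An interrupted or failed download can leave an empty file at the output path, so a quick check before running anything else may save time. The sketch below is an illustration only: `check_ckpt` is a hypothetical helper, not part of the repository, and it verifies only that the file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper: succeed only if the given file exists and is non-empty.
check_ckpt() {
    ckpt_path="$1"
    if [ -s "$ckpt_path" ]; then
        echo "Found checkpoint: $ckpt_path"
    else
        echo "Missing or empty checkpoint: $ckpt_path" >&2
        return 1
    fi
}
```

For the path used above, this would be called as `check_ckpt models/ldm/stable-diffusion-v1/model.ckpt`.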
Once the checkpoint is downloaded, run the provided scripts:
/bin/bash edit_all.sh
/bin/bash edit_swap.sh
/bin/bash score.sh
Applications include Object Replace, Action Edit, Fader Control, Style Transfer, and Semantic Directions.
This code is adapted from https://github.com/CompVis/stable-diffusion and https://github.com/UCSB-NLP-Chang/DiffusionDisentanglement.