Stable Diffusion nano is a simplified implementation of latent diffusion models, inspired by the short course How Diffusion Models Work by deeplearning.ai. The repository aims to make the concepts and structure of diffusion models, particularly Figure 3 of the paper High-Resolution Image Synthesis with Latent Diffusion Models, accessible to beginners.
The repository implements models proposed in two key papers:
- Denoising Diffusion Probabilistic Models
- High-Resolution Image Synthesis with Latent Diffusion Models
The official implementations of the above papers are often too complex for beginners. Stable Diffusion nano simplifies these concepts and presents them in an intuitive, beginner-friendly manner using Jupyter notebooks. Our goal is to provide a hands-on learning experience by focusing on essential components while avoiding unnecessary complexity.
For those interested in deeper theoretical insights, refer to A Gentle Introduction to Diffusion Model: Part 1 - DDPM.
This repository pays particular attention to the multi-head attention (MHA) mechanism between latent feature maps and condition embeddings, illustrating it with diagrams. We believe this will be especially helpful for readers who find MHA difficult to understand.
A simplified U-Net structure was used, emphasizing the integration of multi-head attention across feature maps. The goal is to demonstrate how multi-head attention can be effectively applied to each feature map within the network.
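As a concrete illustration, cross-attention between a latent feature map and condition embeddings can be sketched with PyTorch's `nn.MultiheadAttention`. The shapes and variable names below are illustrative, not the repository's actual code:

```python
import torch
import torch.nn as nn

# Illustrative shapes: a small latent feature map and a short condition sequence.
batch, channels, height, width = 4, 64, 8, 8
cond_len, cond_dim = 5, 64

feat = torch.randn(batch, channels, height, width)   # latent feature map
cond = torch.randn(batch, cond_len, cond_dim)        # condition embeddings

# Flatten the spatial grid so each spatial location becomes a query token.
q = feat.flatten(2).transpose(1, 2)                  # (batch, H*W, channels)

attn = nn.MultiheadAttention(embed_dim=channels, num_heads=4, batch_first=True)
out, _ = attn(query=q, key=cond, value=cond)         # cross-attention: queries from the
                                                     # image, keys/values from the condition

# Restore the spatial layout for the next U-Net stage.
out = out.transpose(1, 2).view(batch, channels, height, width)
print(out.shape)  # torch.Size([4, 64, 8, 8])
```

The key point is that the feature map supplies the queries while the condition supplies the keys and values, so each spatial location attends to the conditioning signal.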
This repository includes the following notebooks:
- DDPM notebook (`01.ddpm.ipynb`)
  - Description: Implements the basic Denoising Diffusion Probabilistic Model (DDPM).
  - Goal: Understand the fundamental process of adding and removing noise to generate images from random noise.
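The noise-adding half of this process has a closed form, which the following minimal sketch illustrates (the schedule values are illustrative, not the notebook's exact hyperparameters):

```python
import torch

# A linear beta schedule, as in the original DDPM paper (values illustrative).
T = 500
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)             # cumulative product: alpha-bar_t

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) in one step, without iterating."""
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1 - ab).sqrt() * noise

x0 = torch.rand(8, 3, 16, 16)                        # a batch of clean 16x16 sprites
t = torch.randint(0, T, (8,))                        # a random timestep per image
noise = torch.randn_like(x0)
xt = q_sample(x0, t, noise)                          # noisy images the model learns to denoise
print(xt.shape)
```

Training then amounts to asking a network to predict `noise` from `xt` and `t`; sampling runs the process in reverse.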
- VAE Latent 2D notebook
  - Description: Implements an image encoder-decoder (VAE) that converts pixel-space images into latent-space representations, a crucial step for latent diffusion models.
  - Goal: Learn how to compress and reconstruct images using a Variational Autoencoder (VAE).
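A minimal convolutional VAE along these lines might look as follows. The layer sizes are illustrative, not the repository's actual architecture:

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Sketch: 3x16x16 pixels -> small spatial latent -> reconstruction."""
    def __init__(self, latent_ch=4):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # 16 -> 8
            nn.Conv2d(32, 2 * latent_ch, 3, stride=2, padding=1),  # 8 -> 4; channels hold mu|logvar
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),       # 8 -> 16
        )

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar

vae = TinyVAE()
x = torch.rand(2, 3, 16, 16)
recon, mu, logvar = vae(x)
print(recon.shape, mu.shape)  # reconstruction in pixel space, 4x4 latent mean
```

The diffusion model then operates on `z` rather than on pixels, which is what makes it a latent diffusion model.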
- LDM Nano notebook (`03.ldm_nano.ipynb`)
  - Description: Simplifies the structure in Figure 3 of the LDM paper to create a basic latent diffusion model.
  - Goal: Implement a complete but simplified latent diffusion model while maintaining the essential architecture and principles.
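Putting the pieces together, latent-space sampling roughly follows the shape below. Here `denoiser` is a hypothetical stand-in for a trained conditional U-Net (the real model would attend to condition embeddings via cross-attention), and the resulting latents would be passed through the VAE decoder to obtain a pixel-space image:

```python
import torch

# Illustrative DDPM-style schedule (not the notebook's exact hyperparameters).
T = 500
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_latents(denoiser, cond, shape=(1, 4, 4, 4)):
    z = torch.randn(shape)                           # start from pure noise in latent space
    for t in reversed(range(T)):
        eps = denoiser(z, torch.tensor([t]), cond)   # predicted noise at step t
        # One reverse-diffusion step (DDPM posterior mean).
        z = (z - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    return z

# Dummy stand-in so the loop runs end to end; a trained U-Net goes here.
denoiser = lambda z, t, cond: torch.zeros_like(z)
z = sample_latents(denoiser, cond=None)
print(z.shape)  # latents to feed through the VAE decoder
```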
We use a custom dataset of 16x16 image sprites prepared from sprites by FrootsnVeggies and kyrise. This dataset was also used in the course How Diffusion Models Work. The small resolution ensures fast training and inference, making it well suited for educational purposes.
- Python 3.8 or higher
- Jupyter Notebook
- PyTorch
- torchvision
- numpy
- matplotlib
- plotly
- Clone the repository:
```bash
git clone https://github.com/yourusername/stable-diffusion-nano.git
cd stable-diffusion-nano
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
Open any notebook in Jupyter and run the cells sequentially. Start with `01.ddpm.ipynb` for the basics and progress to `03.ldm_nano.ipynb` for the complete model. You can also run the notebooks on Google Colab for free: simply upload the desired notebook to Colab and ensure the necessary dependencies are installed.
Chapter | Colab |
---|---|
DDPM Notebook | |
VAE Latent 2D Notebook | |
LDM Nano Notebook | |
Contributions are welcome! If you find any issues or want to add new features, feel free to open an issue or submit a pull request.
This project is licensed under the Non-Commercial Use Only License.
- Non-Commercial Use Only: This software is provided for personal, educational, and non-commercial purposes only.
- Commercial Use Prohibited: Commercial use of this software is strictly prohibited without prior written consent from the copyright holder.
- For inquiries about commercial licensing, please contact [email protected].
- How Diffusion Models Work by deeplearning.ai for the inspiration.
- Authors of Denoising Diffusion Probabilistic Models and High-Resolution Image Synthesis with Latent Diffusion Models for their groundbreaking work.
- FrootsnVeggies and kyrise for the dataset.