Diffusion Model

This project trains a diffusion model to generate new images that resemble those of a given dataset. The diffusion process is inspired by [1] and [2], and the code is based on the implementation of this notebook.

The model is a simplified U-Net with several convolutional blocks and residual connections. Rather than running every image through a full diffusion process during training, each image is assigned a random time step, which is passed to the model together with the image. A positional encoding layer embeds the time step into a vector of sines and cosines (as in the Transformer architecture [3]).

During training, each batch of images is turned into a noisy version, where the amount of noise is determined by the assigned diffusion time step. The model then learns to predict the noise that was added to the input batch. The loss function compares the predicted noise with the true noise, and its gradients are used to minimize this difference. This strategy is computationally more efficient than passing each image through a full diffusion process involving all time steps.

During inference, however, the model is given a random noise tensor that runs through the whole diffusion process in reverse: starting at the last time step, the noise tensor is passed to the model as an image, and the predicted noise is subtracted from it. This is repeated iteratively until the first time step is reached, which yields a generated, noise-free image. The amount of noise added or removed at each time step is predefined by the vector of betas $\beta_t$, which increases linearly with the time step.
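The training and sampling steps described above can be sketched in NumPy as follows. This is a minimal illustration in the style of DDPM [1], not the code of this repository: the function names, the number of time steps (200), the beta range (1e-4 to 0.02), the three image channels, and the zero-predicting `dummy_model` that stands in for the U-Net are all assumptions. The extra noise term in the reverse step follows [1]; whether this project adds it is not stated here.

```python
import numpy as np

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly increasing noise variances (start/end values are assumed)."""
    return np.linspace(beta_start, beta_end, num_steps)

def time_embedding(t, dim):
    """Embed integer time steps t of shape (batch,) into sines and cosines (batch, dim)."""
    half = dim // 2  # dim is assumed to be even
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = np.asarray(t, dtype=np.float64)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=1)

def add_noise(x0, t, alpha_bar, rng):
    """Jump directly to step t: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    signal = np.sqrt(alpha_bar[t])[:, None, None, None]
    sigma = np.sqrt(1.0 - alpha_bar[t])[:, None, None, None]
    return signal * x0 + sigma * eps, eps

def sample(model, shape, betas, rng):
    """Reverse process: start from pure noise and denoise from the last step to the first."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = model(x, np.full(shape[0], t))
        # Remove the predicted noise, scaled as in the DDPM posterior mean.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        # Fresh noise is added at every step except the final one.
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Placeholder for the trained U-Net: always predicts zero noise.
def dummy_model(x, t):
    return np.zeros_like(x)

rng = np.random.default_rng(0)
betas = linear_betas(200)
alpha_bar = np.cumprod(1.0 - betas)

# One training step: random time step per image, noisy batch, MSE on the noise.
x0 = rng.uniform(-1.0, 1.0, size=(16, 64, 64, 3))
t = rng.integers(0, len(betas), size=16)
x_t, eps = add_noise(x0, t, alpha_bar, rng)
loss = np.mean((dummy_model(x_t, t) - eps) ** 2)

# Inference on a tiny shape to keep the demo fast.
generated = sample(dummy_model, (2, 8, 8, 3), betas, rng)
```

Because `add_noise` jumps to an arbitrary time step in closed form, each training example needs only one forward pass of the model, whereas `sample` has to call the model once per time step.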

As an example dataset, an image dataset for crater detection on the surfaces of Mars and the Moon is used [4]. The training set consists of $98$ images, which are resized to $64 \times 64$ pixels and normalized to the value range $[-1,1]$.
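The preprocessing can be sketched as below. This is an illustration under assumptions rather than the project's actual pipeline: the helper names are hypothetical, nearest-neighbour resizing is chosen only because it needs no extra library (the interpolation actually used is not stated), and a random array stands in for a real crater image.

```python
import numpy as np

def resize_nearest(img, size=64):
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def to_model_range(img_uint8):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def to_uint8(x):
    """Inverse mapping from [-1, 1] back to [0, 255], e.g. for displaying samples."""
    return np.clip(np.rint((x + 1.0) * 127.5), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(120, 200, 3), dtype=np.uint8)  # stand-in image
small = resize_nearest(raw)
x = to_model_range(small)
```

Scaling to $[-1,1]$ puts the images in a range centred on zero, like the Gaussian noise that is mixed into them during the diffusion process.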

[Figure: Overview_new]

The images above were generated after $500$ epochs of training with a batch size of $16$. Rough structures are recognizable, but they still lack sharp contours. A more complex model or a larger number of epochs may improve the results.

References

[1] J. Ho et al. (2020), "Denoising Diffusion Probabilistic Models", 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Available: https://arxiv.org/abs/2006.11239

[2] P. Dhariwal and A. Nichol (2021), "Diffusion Models Beat GANs on Image Synthesis", Advances in Neural Information Processing Systems 34 (NeurIPS 2021), Available: https://arxiv.org/abs/2105.05233

[3] A. Vaswani et al. (2017), "Attention Is All You Need", Advances in Neural Information Processing Systems 30 (NeurIPS 2017), Available: https://arxiv.org/abs/1706.03762

[4] https://www.kaggle.com/datasets/lincolnzh/martianlunar-crater-detection-dataset