
# CS229 Final Project

This repository contains a PyTorch implementation of Soundiffusion, a Stanford CS229 course project. It builds on two state-of-the-art audio-to-image models.

## Usage

First, clone the Sound2Scene repository and download its pretrained audio encoder:

```bash
cd <reponame>
git clone https://github.com/postech-ami/Sound2Scene.git
```

## Train

You can choose which components to train; here we train the `unet` and the `embedder`:

```bash
sh train.sh
```
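Under the hood, selecting trainable components typically means freezing everything else. The sketch below illustrates one way this could look; the module names and `set_trainable` helper are hypothetical stand-ins, not the project's actual code:

```python
import torch.nn as nn

# Hypothetical stand-ins for the project's real components (the actual
# audio encoder, embedder, and U-Net are defined in the repo).
model = nn.ModuleDict({
    "audio_encoder": nn.Linear(8, 8),
    "embedder": nn.Linear(8, 8),
    "unet": nn.Linear(8, 8),
})

def set_trainable(model: nn.ModuleDict, components: list[str]) -> None:
    """Freeze every parameter, then unfreeze only the chosen components."""
    for p in model.parameters():
        p.requires_grad = False
    for name in components:
        for p in model[name].parameters():
            p.requires_grad = True

# Matching the train.sh setting: train the unet and the embedder only.
set_trainable(model, ["unet", "embedder"])
```

Only the unfrozen parameters then need to be handed to the optimizer.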

## Inference

To run inference, load the pretrained audio encoder, embedder, and UNet checkpoints:

```bash
sh inf.sh
```
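Restoring the three checkpoints usually amounts to a `torch.load` plus `load_state_dict` per component, followed by switching to eval mode. The sketch below shows the pattern with hypothetical modules and checkpoint paths (here written to a temp directory so the example is self-contained); the real checkpoints come from training and from Sound2Scene:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-ins for the real components and checkpoint files.
modules = {name: nn.Linear(4, 4) for name in ("audio_encoder", "embedder", "unet")}

ckpt_dir = tempfile.mkdtemp()
for name, m in modules.items():
    # Simulate a pretrained checkpoint on disk.
    torch.save(m.state_dict(), os.path.join(ckpt_dir, f"{name}.pt"))

# At inference time, restore each component and switch it to eval mode.
for name, m in modules.items():
    state = torch.load(os.path.join(ckpt_dir, f"{name}.pt"), map_location="cpu")
    m.load_state_dict(state)
    m.eval()
```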