
# CS229 Final Project

This repository contains a PyTorch implementation of Soundiffusion, a Stanford CS229 course project. It builds on two state-of-the-art audio-to-image models.

## Usage

First, clone the Sound2Scene repository and download its pretrained audio encoder:

```bash
cd <reponame>
git clone https://github.com/postech-ami/Sound2Scene.git
```

## Train

You can choose which components to train; here we train the `unet` and the `embedder`:

```bash
sh train.sh
```
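Under the hood, selecting trainable components typically means freezing everything else. The sketch below illustrates one way this could look; the module names and `set_trainable` helper are hypothetical stand-ins, not the project's actual code:

```python
import torch.nn as nn

# Hypothetical stand-ins for the project's real components (the actual
# audio encoder, embedder, and U-Net are defined in the repo).
model = nn.ModuleDict({
    "audio_encoder": nn.Linear(8, 8),
    "embedder": nn.Linear(8, 8),
    "unet": nn.Linear(8, 8),
})

def set_trainable(model: nn.ModuleDict, components: list[str]) -> None:
    """Freeze every parameter, then unfreeze only the chosen components."""
    for p in model.parameters():
        p.requires_grad = False
    for name in components:
        for p in model[name].parameters():
            p.requires_grad = True

# Matching the train.sh setting: train the unet and the embedder only.
set_trainable(model, ["unet", "embedder"])
```

Only the unfrozen parameters then need to be handed to the optimizer.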

## Inference

To run inference, load the pretrained audio encoder, embedder, and UNet checkpoints:

```bash
sh inf.sh
```
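Restoring the three checkpoints usually amounts to a `torch.load` plus `load_state_dict` per component, followed by switching to eval mode. The sketch below shows the pattern with hypothetical modules and checkpoint paths (here written to a temp directory so the example is self-contained); the real checkpoints come from training and from Sound2Scene:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-ins for the real components and checkpoint files.
modules = {name: nn.Linear(4, 4) for name in ("audio_encoder", "embedder", "unet")}

ckpt_dir = tempfile.mkdtemp()
for name, m in modules.items():
    # Simulate a pretrained checkpoint on disk.
    torch.save(m.state_dict(), os.path.join(ckpt_dir, f"{name}.pt"))

# At inference time, restore each component and switch it to eval mode.
for name, m in modules.items():
    state = torch.load(os.path.join(ckpt_dir, f"{name}.pt"), map_location="cpu")
    m.load_state_dict(state)
    m.eval()
```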