Stable Audio Tools provides training and inference tools for generative audio models from Stability AI. This repository is a fork with additional modifications that enhance functionality, such as:
- Dynamic Model Loading: Enables dynamic swapping between the base model and any future community finetune releases.
- Random Prompt Button: A one-click Random Prompt button tied directly to the loaded model's metadata.
- BPM & Bar Selector: BPM and bar settings tied to the model's timing conditioning, which auto-fill any prompt with the required BPM/bar info. You can also lock or unlock the BPM if you want the Random Prompt button to randomize it as well.
- Key Signature Locking: The key signature is now tied to the UI and can be locked or unlocked for use with the Random Prompt button.
- Automatic Sample to MIDI Converter: The fork automatically converts every generated sample to .mid format, giving users an endless source of MIDI.
- Automatic Sample Trimming: The fork automatically trims every generated sample to the exact desired length, for easier importing into DAWs.
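The BPM & bar settings imply a simple timing calculation. As an illustration only (these helper names are hypothetical, not the fork's actual code), the target sample length for trimming can be derived from BPM and bar count like this:

```python
def bar_length_seconds(bpm: float, bars: int, beats_per_bar: int = 4) -> float:
    """Duration of `bars` bars at `bpm`, assuming `beats_per_bar` beats per bar (4/4 by default)."""
    return bars * beats_per_bar * 60.0 / bpm

def trim_frames(bpm: float, bars: int, sample_rate: int = 44100) -> int:
    """Number of audio frames to keep when trimming a generation to an exact bar length."""
    return round(bar_length_seconds(bpm, bars) * sample_rate)

# e.g. 4 bars of 4/4 at 120 BPM -> 8.0 seconds -> 352800 frames at 44.1 kHz
```

Trimming to a frame count computed this way is what lets a generated loop drop cleanly onto a DAW grid.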
First, clone the repository to your local machine:

```bash
git clone https://github.com/RoyalCities/RC-stable-audio-tools.git
cd RC-stable-audio-tools
```
It's recommended to use a virtual environment to manage dependencies:

- Windows:

  ```bash
  python -m venv venv
  venv\Scripts\activate
  ```

- macOS and Linux:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```
Install Stable Audio Tools and the necessary packages from `setup.py`:

```bash
pip install stable-audio-tools
pip install .
```
To ensure Gradio uses your GPU/CUDA rather than defaulting to the CPU, uninstall `torch`, `torchvision`, and `torchaudio`, then reinstall them with the correct CUDA version:

```bash
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
A sample `config.json` is included in the root directory. Customize it to specify directories for custom models and outputs (generated .wav and .mid files will be stored here):

```json
{
  "model_directory": "models",
  "output_directory": "generations"
}
```
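As a sketch of how an app might consume this file (the helper below is hypothetical, not the fork's actual loader, and assumes only the two keys shown above), one could read the config and make sure both directories exist:

```python
import json
import os

def load_directories(config_path: str = "config.json") -> tuple[str, str]:
    """Read the model/output directories from config.json and ensure they exist."""
    with open(config_path) as f:
        cfg = json.load(f)
    # Fall back to the sample defaults if a key is missing
    model_dir = cfg.get("model_directory", "models")
    output_dir = cfg.get("output_directory", "generations")
    os.makedirs(model_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)
    return model_dir, output_dir
```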
Start the Gradio interface using a batch file or directly from the command line:
```bat
@echo off
cd /d path-to-your-venv/Scripts
call activate
cd /d path-to-your-stable-audio-tools
python run_gradio.py --model-config models/path-to-config/example_config.json --ckpt-path models/path-to-config/example.ckpt
pause
```
You can launch the web UI by simply calling:
```bash
python run_gradio.py
```
This will start the Gradio UI. If you're running it for the first time, it will launch a model downloader interface where you can initialize the app by downloading your first model. After downloading, you will need to restart the app to get the full UI.
When you run the app AFTER downloading a model, the full UI will launch.
You can also launch the app with custom flags:
```bash
python run_gradio.py --model-config models/path-to-config/example_config.json --ckpt-path models/path-to-config/example.ckpt
```
Input prompts in the Gradio interface to generate audio and MIDI files, which will be saved as specified in `config.json`.
The interface has been expanded with Bar/BPM settings (which modify both the user prompt and the sample-length conditioning), MIDI display and conversion, and Dynamic Model Loading.
Models must be stored inside their own subfolder along with their accompanying config files. For example, a single finetune could have multiple checkpoints: all related checkpoints can go inside the same "model1" subfolder, but it's important that the associated config file is included in the same folder as the checkpoint itself.
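To illustrate that folder layout (a hypothetical scan, not the fork's actual model-discovery code), each subfolder pairs its checkpoints with the config file sitting beside them:

```python
import os

def scan_models(model_root: str = "models") -> dict[str, tuple[list[str], str]]:
    """Map each model subfolder to (its .ckpt files, the .json config beside them)."""
    models = {}
    for name in sorted(os.listdir(model_root)):
        folder = os.path.join(model_root, name)
        if not os.path.isdir(folder):
            continue
        ckpts = sorted(f for f in os.listdir(folder) if f.endswith(".ckpt"))
        configs = sorted(f for f in os.listdir(folder) if f.endswith(".json"))
        # A subfolder only counts as a usable model if both pieces are present
        if ckpts and configs:
            models[name] = (ckpts, configs[0])
    return models
```

A subfolder missing its config file would be skipped here, which mirrors why keeping the config next to its checkpoints matters.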
To switch models, simply pick the model you want to load from the dropdown and click "Load Model".
When you launch with `python run_gradio.py`, it will:

- First check whether the `models` folder has any model downloaded.
- If there is a model, launch the full UI with that model loaded.
- If the `models` folder is empty, launch an HFFS (HuggingFace downloader) UI, where you can either select from the preset models or enter any HuggingFace repo id to download. (After downloading a model, you will need to restart the app to launch the full UI.)
- To customize the preset models that appear in the downloader dropdown, edit the `config.json` file to add more entries to the `hffs[0].options` array.
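For example, the downloader presets might be extended like this. Note that the exact shape of each option (plain HuggingFace repo id strings) is an assumption here; check the shipped `config.json` for the real schema:

```json
{
  "model_directory": "models",
  "output_directory": "generations",
  "hffs": [
    {
      "options": [
        "stabilityai/stable-audio-open-1.0",
        "your-username/your-finetune-repo"
      ]
    }
  ]
}
```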
For detailed instructions on training and inference commands, flags, and additional options, refer to the main GitHub documentation: Stable Audio Tools Detailed Usage
I did my best to make the code OS agnostic, but I've only been able to test it on Windows with NVIDIA hardware, so hopefully it works on other setups too. The project now fully supports macOS and Apple Silicon (M1 and above). Special thanks to @cocktailpeanut for their help!
If there are any other features or tooling you may want, let me know here or by contacting me on Twitter. I'm just a hobbyist, but if it can be done, I'll see what I can do.
Have fun!