
🎵 RC Stable Audio Tools

Stable Audio Tools provides training and inference tools for generative audio models from Stability AI. This repository is a fork with additional modifications that enhance functionality, such as:

  • Dynamic Model Loading: Enables dynamic model swaps of the base model and any future community finetune releases.

Model Loader Gif

  • Random Prompt Button: A one-click Random Prompt button tied directly to the loaded model's metadata.

Random Prompt Button Gif

  • BPM & Bar Selector: BPM & Bar settings tied to the model's timing conditioning, which will auto-fill any prompt with the needed BPM/Bar info. You can also lock or unlock the BPM if you wish to randomize this as well with the Random Prompt button.

BPM and Bar Example Gif

  • Key Signature Locking: The key signature is now tied to the UI and can be locked or unlocked for use with the Random Prompt button.

Key Signature Image

  • Automatic Sample to MIDI Converter: The fork automatically converts all generated samples to .mid format, giving users an infinite source of MIDI.

Midi Converter Example Gif

  • Automatic Sample Trimming: The fork automatically trims all generated samples to the exact length desired for easier importing into DAWs (see the sketch after this list).

Sample Trimming Example Gif
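
To illustrate the timing math behind the trimming feature, here is a minimal sketch, not the fork's actual implementation; the soundfile dependency, the function name, and the 4/4 time signature are assumptions:

import soundfile as sf

def trim_to_bars(in_path, out_path, bpm, bars, beats_per_bar=4):
    """Trim a sample to an exact bar count: seconds = bars * beats_per_bar * 60 / bpm."""
    audio, sample_rate = sf.read(in_path)
    seconds = bars * beats_per_bar * 60.0 / bpm
    sf.write(out_path, audio[:int(seconds * sample_rate)], sample_rate)

# 8 bars at 140 BPM -> 8 * 4 * 60 / 140 = ~13.7 seconds
trim_to_bars("generations/sample.wav", "generations/sample_trimmed.wav", bpm=140, bars=8)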

🚀 Installation

📥 Clone the Repository

First, clone the repository to your local machine:

git clone https://github.com/RoyalCities/RC-stable-audio-tools.git
cd RC-stable-audio-tools

🔧 Setup the Environment

🌐 Create a Virtual Environment

It's recommended to use a virtual environment to manage dependencies:

  • Windows:

    python -m venv venv
    venv\Scripts\activate
  • macOS and Linux:

    python3 -m venv venv
    source venv/bin/activate

📦 Install the Required Packages

Install Stable Audio Tools and the necessary packages from setup.py:

pip install stable-audio-tools
pip install .

🪟 Additional Step for Windows Users

To ensure the Gradio app uses the GPU (CUDA) and doesn't default to the CPU, uninstall and reinstall torch, torchvision, and torchaudio with the correct CUDA version:

pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
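
You can then verify that PyTorch sees your GPU. This is a quick sanity check using standard PyTorch calls, not part of the fork itself:

import torch

# Should print True and your GPU name if the CUDA build installed correctly.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))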

⚙️ Configuration

A sample config.json is included in the root directory. Customize it to specify directories for custom models and outputs (.wav and .mid files will be stored here):

{
    "model_directory": "models",
    "output_directory": "generations"
}
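
As a minimal sketch of how these settings can be read in Python (the key names come from the sample config above; the loading code is illustrative, not the fork's exact implementation):

import json
from pathlib import Path

with open("config.json") as f:
    config = json.load(f)

model_dir = Path(config["model_directory"])    # checkpoints and their configs live here
output_dir = Path(config["output_directory"])  # generated .wav and .mid files land here
output_dir.mkdir(parents=True, exist_ok=True)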

🖥️ Usage

🎚️ Running the Gradio Interface

Start the Gradio interface using a batch file or directly from the command line:

Batch file example

@echo off
cd /d path-to-your-venv/Scripts
call activate
cd /d path-to-your-stable-audio-tools
python run_gradio.py --model-config models/path-to-config/example_config.json --ckpt-path models/path-to-config/example.ckpt
pause

Basic command line example

You can launch the web UI by simply calling:

python run_gradio.py

This will start the Gradio UI. If you're running it for the first time, it will launch a model downloader interface, where you can initialize the app by downloading your first model. After downloading, you will need to restart the app to get the full UI.

When you run the app AFTER downloading a model, the full UI will launch.

Custom command line example

You can also launch the app with custom flags:

python run_gradio.py --model-config models/path-to-config/example_config.json --ckpt-path models/path-to-config/example.ckpt

🎶 Generating Audio and MIDI

Input prompts in the Gradio interface to generate audio and MIDI files, which will be saved as specified in config.json.

The interface has been expanded with Bar/BPM settings (which modify both the user prompt and the sample-length conditioning), MIDI display and conversion, and Dynamic Model Loading.

Models must be stored inside their own subfolders along with their accompanying config files. For example, a single finetune could have multiple checkpoints: all related checkpoints can go inside the same "model1" subfolder, but it's important that the associated config file is included in the same folder as the checkpoints themselves (see the example layout below).

To switch models, simply pick the model you want to load from the dropdown and click "Load Model".
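
For illustration, a models folder might be laid out like this (all file names here are hypothetical):

models/
├── model1/
│   ├── model1_config.json
│   ├── finetune_epoch10.ckpt
│   └── finetune_epoch20.ckpt
└── model2/
    ├── model2_config.json
    └── model2.ckpt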

🤗 Downloading models from HuggingFace

HuggingFace Downloader Gif

When you launch with python run_gradio.py, it will:

  1. First check if the models folder has any model downloaded.
  2. If there is a model, it will launch the full UI with that model loaded.
  3. If the models folder is empty, it will launch an HFFS (HuggingFace downloader) UI, where you can either select from the preset models, or enter any HuggingFace repo id to download. (After downloading a model, you will need to restart the app to launch the full UI).
  4. To customize the preset models that appear in the downloader dropdown, edit the config.json file and add more entries to the hffs[0].options array (see the example below).
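
For example, config.json might look like the following. The exact schema is inferred from the hffs[0].options path above, and the repo ids shown are placeholders:

{
    "model_directory": "models",
    "output_directory": "generations",
    "hffs": [
        {
            "options": [
                "some-user/some-stable-audio-finetune",
                "another-user/another-finetune"
            ]
        }
    ]
}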

🛠️ Advanced Usage

For detailed instructions on training and inference commands, flags, and additional options, refer to the main GitHub documentation: Stable Audio Tools Detailed Usage


I did my best to make the code OS-agnostic, but I've only been able to test it on Windows with NVIDIA hardware, so hopefully it works on other operating systems too. The project now fully supports macOS and Apple Silicon (M1 and above). Special thanks to @cocktailpeanut for their help!

If there are any other features or tooling you may want, let me know here or by contacting me on Twitter. I'm just a hobbyist, but if it can be done I'll see what I can do.

Have fun!
