
Run a free Whisper API for Speech-to-Text with GPU backend


pgpt-ai/free-whisper-api-gpu

 
 


Free Whisper API with GPU backend

This repository demonstrates how to create a free Whisper API with a GPU backend, so you can get transcripts more quickly. Here's a comparison of inference times on the different hardware options in Colab (source):

While not a robust/permanent solution, it can save you money on small projects. Here's the cost to transcribe 1000 hours of audio with Whisper on an A100 in GCP for different model and batch sizes (source):

If you're looking for a no-setup solution, you can also transcribe files in just a few minutes with Python using AssemblyAI. Grab a free API key to start transcribing, understanding, and prompting your audio files.

Get a key

To understand how the API works, check out our companion article.

Initial setup

  1. Create an Ngrok account if you do not already have one and verify your email
  2. Make sure Python 3.9 or 3.10 is installed on your system

Start the API

To run with GPU

Go to the companion Colab:

Open In Colab

Follow the instructions in the notebook to start the API.

To run locally

You can also run the API locally. While this method doesn't offload the inference to Colab, you may want to do this while using the tiny model for testing, or with the large model if you have a GPU.

Local setup

  1. Open a terminal and set your ngrok authtoken with ngrok authtoken YOUR-AUTHTOKEN-HERE. You can find your authtoken on your dashboard
  2. Install ffmpeg if it is not already installed on your system
  3. (Optional) Create a virtual environment for your project with python3 -m venv venv and then activate it with . venv/bin/activate on Linux/macOS or .\venv\Scripts\activate.bat on Windows. You may have to use python instead of python3
  4. Install the required packages with pip install -r requirements.txt

Start the API

  1. Run python3 api.py in order to start the Flask API at http://127.0.0.1:8008
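Once the server is listening, a quick way to confirm it responds is to post a job to it from a second terminal. The sketch below uses only the Python standard library, so it needs no extra installs. It assumes the /transcribe endpoint, the "file" and "model" request fields, and the "transcript" response field described in the "Consume the API" section. The helper names build_request and fetch_transcript and the 300-second timeout are illustrative choices, not part of this repository.

```python
import json
import urllib.request

LOCAL_BASE_URL = "http://127.0.0.1:8008"  # address the Flask API listens on


def build_request(base_url: str, audio_url: str, model: str = "tiny") -> urllib.request.Request:
    """Build a JSON POST request for the /transcribe endpoint."""
    body = json.dumps({"file": audio_url, "model": model}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/transcribe",  # tolerate a trailing slash
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_transcript(base_url: str, audio_url: str, model: str = "tiny",
                     timeout: float = 300.0) -> str:
    """Send the request and return the 'transcript' field of the JSON reply.

    The timeout is an assumed value; transcription can take a while,
    especially on CPU, so adjust it to your hardware and audio length.
    """
    request = build_request(base_url, audio_url, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["transcript"]
```

With the API running, calling fetch_transcript(LOCAL_BASE_URL, "https://storage.googleapis.com/aai-web-samples/Custom-Home-Builder.mp3") should return the transcript text for the sample file.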

Consume the API

Once your API is up and running, you can hit it using any tool that can make POST requests. For example, you can use cURL to make requests in the terminal. Be sure to replace the URL below with your Ngrok URL (or http://127.0.0.1:8008 if running locally), followed by the /transcribe endpoint:

curl -X POST "https://YOUR-URL.ngrok-free.app/transcribe" \
-H "Content-Type: application/json" \
-d '{"file": "https://storage.googleapis.com/aai-web-samples/Custom-Home-Builder.mp3", "model": "tiny"}'

You can also consume your API in Python with the requests library (you'll need to pip install requests if you haven't done so already):

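import requests

# Base URL of your API (use "http://127.0.0.1:8008" if running locally)
NGROK_URL = "https://YOUR-URL.ngrok-free.app"
# Build the endpoint with string formatting; os.path.join is meant for
# filesystem paths, not URLs
TRANSCRIBE_ENDPOINT = f"{NGROK_URL.rstrip('/')}/transcribe"

json_data = {
    "file": "https://storage.googleapis.com/aai-web-samples/Custom-Home-Builder.mp3",
    "model": "tiny",
}
response = requests.post(TRANSCRIBE_ENDPOINT, json=json_data)
response.raise_for_status()  # fail early on an HTTP error status

print(response.json()["transcript"])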

You can read or run basic-request.py to see how to make requests to the API for both remote and local files.

You can take a look at transcribe.py to see a more robust way of calling the API that abstracts away the calling details.
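To give a rough idea of what "abstracting away the calling details" can mean, here is a minimal sketch of a wrapper around the request shown earlier. It is not the contents of transcribe.py. The function name, the TranscriptionError class, and the 300-second default timeout are all assumptions made for illustration. It relies only on the /transcribe endpoint and the "file", "model", and "transcript" fields used in the examples above.

```python
import requests


class TranscriptionError(RuntimeError):
    """Raised when the API call fails or the reply has no transcript."""


def transcribe(base_url: str, file: str, model: str = "tiny",
               timeout: float = 300.0) -> str:
    """POST an audio URL to <base_url>/transcribe and return the transcript."""
    endpoint = base_url.rstrip("/") + "/transcribe"
    try:
        response = requests.post(
            endpoint,
            json={"file": file, "model": model},
            timeout=timeout,  # assumed value; avoids hanging forever
        )
        response.raise_for_status()  # turn 4xx/5xx statuses into exceptions
        payload = response.json()
    except (requests.RequestException, ValueError) as exc:
        # Network errors, HTTP error statuses, and non-JSON bodies end up here
        raise TranscriptionError(f"request to {endpoint} failed: {exc}") from exc

    if not isinstance(payload, dict) or "transcript" not in payload:
        raise TranscriptionError(f"no 'transcript' field in response: {payload!r}")
    return payload["transcript"]
```

A caller then only needs transcribe("https://YOUR-URL.ngrok-free.app", audio_url) and a single except TranscriptionError clause, instead of handling connection, status, and parsing failures separately.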
