auto-bench

auto-bench is a flexible tool for benchmarking LLMs on Hugging Face Inference Endpoints. It provides an automated way to deploy models, run load tests, and analyze performance across different hardware configurations.

Features

  • Automated deployment of models to Hugging Face Inference Endpoints
  • Configurable load testing scenarios using K6
  • Support for various GPU instances
  • Detailed performance metrics collection and analysis
  • Easy-to-use Python API for creating and running benchmarks (sketched below)
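
As a rough illustration of this workflow, the sketch below shows how a benchmark run might be wired together in Python. The names used here (autobench, EndpointConfig, Benchmark, run, summary) are hypothetical stand-ins rather than auto-bench's actual API; see the Getting Started Notebook for real usage.

# Hypothetical sketch only -- the names below are illustrative, not auto-bench's real API
from autobench import Benchmark, EndpointConfig  # hypothetical import

config = EndpointConfig(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",  # any Hugging Face model id
    instance_type="nvidia-a10g",                     # GPU instance to benchmark
)

bench = Benchmark(config)
results = bench.run()     # deploy the endpoint, run the K6 load test, collect metrics
print(results.summary())  # aggregate latency, throughput, and error-rate figures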

Metrics

auto-bench relies on Grafana K6 to run load tests and collect metrics. The following metrics are collected (a sketch after this list shows how they can be derived from raw request timings):

  • Inter-token latency: Time to generate each new output token for a user querying the system. This translates to the “speed” perceived by the end user.
  • Time to first token: Time the user has to wait before seeing the first token of their answer. Low waiting times are essential for real-time interactions.
  • End-to-end latency: The overall time the system takes to generate the full response to the user.
  • Throughput: The number of tokens per second the system can generate across all requests.
  • Successful requests: The number of requests the system was able to honor within the benchmark timeframe.
  • Error rate: The percentage of requests that ended in error because the system could not process them in time or failed to process them.
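
To make these definitions concrete, here is a minimal Python sketch of how each metric could be derived from raw per-request timings. This is an illustrative computation only, not auto-bench's internal implementation; the RequestRecord structure and summarize function are assumptions for the example.

from dataclasses import dataclass

@dataclass
class RequestRecord:
    start: float           # time the request was sent (seconds)
    first_token_at: float  # time the first output token arrived (seconds)
    end: float             # time the last output token arrived (seconds)
    num_tokens: int        # number of output tokens generated
    ok: bool               # True if the request completed successfully

def summarize(records: list[RequestRecord], window_s: float) -> dict:
    ok = [r for r in records if r.ok]
    assert ok, "need at least one successful request"
    total_tokens = sum(r.num_tokens for r in ok)
    return {
        # mean seconds per output token after the first one
        "inter_token_latency": sum(
            (r.end - r.first_token_at) / max(r.num_tokens - 1, 1) for r in ok
        ) / len(ok),
        # mean wait before the first token appears
        "time_to_first_token": sum(r.first_token_at - r.start for r in ok) / len(ok),
        # mean time to produce the full response
        "end_to_end_latency": sum(r.end - r.start for r in ok) / len(ok),
        # tokens per second across all requests in the benchmark window
        "throughput": total_tokens / window_s,
        "successful_requests": len(ok),
        # fraction of requests that failed or timed out
        "error_rate": 1 - len(ok) / len(records),
    }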

Setup

To get started with auto-bench, follow these steps:

  1. Clone the repository:
git clone https://github.com/andrewrreed/auto-bench.git
  2. Set up a virtual environment and activate it:
python -m venv .venv
source .venv/bin/activate
  3. Build the custom K6 binary with SSE support:
make build-k6
  4. Install the required Python packages:
poetry install

Getting Started

Check out the Getting Started Notebook to get familiar with basic usage.

Contact

For questions or suggestions, please open an issue on the GitHub repository.
