`auto-bench` is a flexible tool for benchmarking LLMs on Hugging Face Inference Endpoints. It provides an automated way to deploy models, run load tests, and analyze performance across different hardware configurations.
- Automated deployment of models to Hugging Face Inference Endpoints
- Configurable load testing scenarios using K6
- Support for various GPU instances
- Detailed performance metrics collection and analysis
- Easy-to-use Python API for creating and running benchmarks
`auto-bench` relies on Grafana K6 to run load tests and collect metrics. The following metrics are collected (the short sketch after this list shows how they can be derived from raw timings):
- Inter-token latency: Time to generate each new output token for a user querying the system. It translates to the “speed” perceived by the end user.
- Time to First Token: Time the user has to wait before seeing the first token of their answer. Lower waiting times are essential for real-time interactions.
- End-to-end latency: The overall time the system takes to generate the full response to the user.
- Throughput: The number of tokens per second the system can generate across all requests.
- Successful requests: The number of requests the system was able to honor within the benchmark timeframe.
- Error rate: The percentage of requests that ended in error because the system could not process them in time or failed to process them.
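To make these definitions concrete, here is a minimal Python sketch (not auto-bench code; the `RequestTrace` structure is a made-up stand-in for per-request timing data) showing how each metric can be derived from raw token timestamps:

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical per-request timing data (in seconds): when the request was
    sent and when each generated token arrived. Purely illustrative, not
    auto-bench's internal format."""
    sent_at: float
    token_times: list[float]
    ok: bool = True  # False if the request errored or timed out


def _avg(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def summarize(traces: list[RequestTrace], window_s: float) -> dict:
    """Aggregate the metrics described above over one benchmark window."""
    ok = [t for t in traces if t.ok and t.token_times]
    ttft = [t.token_times[0] - t.sent_at for t in ok]      # Time to First Token
    e2e = [t.token_times[-1] - t.sent_at for t in ok]      # End-to-end latency
    itl = [                                                 # Inter-token latency
        (t.token_times[-1] - t.token_times[0]) / (len(t.token_times) - 1)
        for t in ok
        if len(t.token_times) > 1
    ]
    total_tokens = sum(len(t.token_times) for t in ok)
    return {
        "time_to_first_token_s": _avg(ttft),
        "end_to_end_latency_s": _avg(e2e),
        "inter_token_latency_s": _avg(itl),
        "throughput_tok_per_s": total_tokens / window_s,    # across all requests
        "successful_requests": len(ok),
        "error_rate_pct": 100 * (1 - len(ok) / len(traces)) if traces else 0.0,
    }
```

For example, a request sent at t = 0.0 s whose three tokens arrive at 0.40 s, 0.45 s, and 0.50 s has a Time to First Token of 0.40 s, an end-to-end latency of 0.50 s, and an inter-token latency of 0.05 s.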
To get started with `auto-bench`, follow these steps:
- Clone the repository:
  `git clone https://github.com/andrewrreed/auto-bench.git`
- Set up a virtual environment and activate it:
  `python -m venv .venv`
  `source .venv/bin/activate`
- Build the custom K6 binary with SSE support:
  `make build-k6`
- Install the required Python packages:
  `poetry install`
Check out the Getting Started Notebook to get familiar with basic usage.
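To give a flavor of the kind of loop auto-bench automates (deploy a model, load test it with the custom K6 binary, collect the metrics), here is a rough sketch. All names in it (the config fields, the `load_test.js` scenario script, the `ENDPOINT_URL` variable, and the location of the built `k6` binary) are illustrative assumptions rather than the project's actual interface; the Getting Started Notebook shows the real API.

```python
import json
import os
import subprocess
from dataclasses import dataclass


@dataclass
class BenchConfig:
    """Illustrative benchmark settings; auto-bench's real configuration differs."""
    model_id: str       # e.g. a Hugging Face model repo id
    instance_type: str  # GPU instance to benchmark
    vus: int            # concurrent virtual users for the K6 scenario
    duration: str       # K6 test duration, e.g. "2m"


def run_load_test(endpoint_url: str, cfg: BenchConfig) -> dict:
    """Run the custom K6 binary (built by `make build-k6`) against a deployed
    endpoint and return K6's end-of-test summary."""
    subprocess.run(
        [
            "./k6", "run",                       # assumed path of the custom binary
            "--vus", str(cfg.vus),
            "--duration", cfg.duration,
            "--summary-export", "summary.json",  # K6 writes aggregate metrics here
            "load_test.js",                      # placeholder K6 scenario script
        ],
        env={**os.environ, "ENDPOINT_URL": endpoint_url},  # readable via __ENV in the script
        check=True,
    )
    with open("summary.json") as f:
        return json.load(f)
```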
For questions or suggestions, please open an issue on the GitHub repository.