Nathan refacto cli (#407)
Complete revamp of the CLI. See the documentation for more details: `lighteval --help`
NathanHB authored Dec 4, 2024
1 parent 8e977cb commit 6e2754e
Showing 28 changed files with 981 additions and 587 deletions.
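At a glance, the new invocation style documented in the diffs below replaces most long flags with positional arguments and kebab-case options. A minimal sketch (the exact combination of options shown here is illustrative, not copied verbatim from any single file):

```bash
# New CLI style: model args and tasks are positional, options use kebab-case
lighteval accelerate \
    "pretrained=gpt2" \
    "leaderboard|truthfulqa:mc|0|0" \
    --output-dir ./evals/ \
    --save-details
```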
2 changes: 2 additions & 0 deletions .github/workflows/tests.yaml
@@ -36,6 +36,8 @@ jobs:
- name: Test
env:
HF_TEST_TOKEN: ${{ secrets.HF_TEST_TOKEN }}
HF_HOME: "cache/models"
HF_DATASETS_CACHE: "cache/datasets"
run: | # PYTHONPATH="${PYTHONPATH}:src" HF_DATASETS_CACHE="cache/datasets" HF_HOME="cache/models"
python -m pytest --disable-pytest-warnings
- name: Write cache
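For reference, a local run that mirrors the CI cache layout above might look like this (a sketch; paths are relative to the repository root):

```bash
# Reuse the same cache directories as CI when running the test suite locally
HF_HOME="cache/models" HF_DATASETS_CACHE="cache/datasets" \
    python -m pytest --disable-pytest-warnings
```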
7 changes: 3 additions & 4 deletions docs/source/adding-a-custom-task.mdx
@@ -191,8 +191,7 @@ Once your file is created you can then run the evaluation with the following command

```bash
lighteval accelerate \
--model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
--tasks "community|{custom_task}|{fewshots}|{truncate_few_shot}" \
--custom_tasks {path_to_your_custom_task_file} \
--output_dir "./evals"
"pretrained=HuggingFaceH4/zephyr-7b-beta" \
"community|{custom_task}|{fewshots}|{truncate_few_shot}" \
--custom-tasks {path_to_your_custom_task_file}
```
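With the placeholders filled in, the invocation could look like the following (the task name and file path are illustrative, not taken from the repository):

```bash
lighteval accelerate \
    "pretrained=HuggingFaceH4/zephyr-7b-beta" \
    "community|my_custom_task|0|0" \
    --custom-tasks community_tasks/my_custom_task.py
```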
8 changes: 7 additions & 1 deletion docs/source/available-tasks.mdx
@@ -3,7 +3,13 @@
You can get a list of all the available tasks by running:

```bash
lighteval tasks --list
lighteval tasks list
```

You can also inspect a specific task by running:

```bash
lighteval tasks inspect <task_name>
```

## List of tasks
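Since the task list is long, it can be handy to combine the two commands with standard shell filtering (a sketch, assuming the list is printed as plain text and that task identifiers follow the `suite|task|fewshot|truncation` format used elsewhere in these docs):

```bash
# Find matching tasks, then inspect one of them
lighteval tasks list | grep truthfulqa
lighteval tasks inspect "leaderboard|truthfulqa:mc|0|0"
```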
23 changes: 19 additions & 4 deletions docs/source/evaluate-the-model-on-a-server-or-container.mdx
@@ -6,10 +6,9 @@ to the server. The command is the same as before, except you specify a path to
a yaml config file (detailed below):

```bash
lighteval accelerate \
--model_config_path="/path/to/config/file"\
--tasks <task parameters> \
--output_dir output_dir
lighteval endpoint {tgi,inference-endpoint} \
"/path/to/config/file"\
<task parameters>
```

There are two types of configuration files that can be provided for running on
Expand Down Expand Up @@ -65,3 +64,19 @@ model:
inference_server_auth: null
model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
```
### OpenAI API
Lighteval also supports evaluating models on the OpenAI API. To do so, you need to set your OpenAI API key as an environment variable.
```bash
export OPENAI_API_KEY={your_key}
```

And then run the following command:

```bash
lighteval endpoint openai \
{model-name} \
<task parameters>
```
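For example (the model name below is a placeholder; use whichever OpenAI model you want to evaluate):

```bash
export OPENAI_API_KEY={your_key}
lighteval endpoint openai \
    "gpt-4o" \
    "leaderboard|truthfulqa:mc|0|0"
```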
2 changes: 1 addition & 1 deletion docs/source/index.mdx
@@ -5,7 +5,7 @@ backends—whether it's
[transformers](https://github.com/huggingface/transformers),
[tgi](https://github.com/huggingface/text-generation-inference),
[vllm](https://github.com/vllm-project/vllm), or
[nanotron](https://github.com/huggingface/nanotron)with
[nanotron](https://github.com/huggingface/nanotron)-with
ease. Dive deep into your model’s performance by saving and exploring detailed,
sample-by-sample results to debug and see how your models stack-up.

2 changes: 0 additions & 2 deletions docs/source/package_reference/model_config.mdx
@@ -8,5 +8,3 @@
[[autodoc]] models.model_config.InferenceModelConfig
[[autodoc]] models.model_config.TGIModelConfig
[[autodoc]] models.model_config.VLLMModelConfig

[[autodoc]] models.model_config.create_model_config
39 changes: 23 additions & 16 deletions docs/source/quicktour.mdx
@@ -1,11 +1,24 @@
# Quicktour

We provide two main entry points to evaluate models:

> [!TIP]
> We recommend using the `--help` flag to get more information about the
> available options for each command.
> `lighteval --help`
Lighteval can be used with a few different commands.

- `lighteval accelerate` : evaluate models on CPU or one or more GPUs using [🤗
Accelerate](https://github.com/huggingface/accelerate)
- `lighteval nanotron`: evaluate models in distributed settings using [⚡️
Nanotron](https://github.com/huggingface/nanotron)
- `lighteval vllm`: evaluate models on one or more GPUs using [🚀
VLLM](https://github.com/vllm-project/vllm)
- `lighteval endpoint`
- `inference-endpoint`: evaluate models on one or more GPUs using [🔗
Inference Endpoint](https://huggingface.co/inference-endpoints/dedicated)
- `tgi`: evaluate models on one or more GPUs using [🔗 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index)
- `openai`: evaluate models on one or more GPUs using [🔗 OpenAI API](https://platform.openai.com/)

## Accelerate

@@ -15,10 +28,8 @@ To evaluate `GPT-2` on the Truthful QA benchmark, run:

```bash
lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
"pretrained=gpt2" \
"leaderboard|truthfulqa:mc|0|0"
```

Here, `--tasks` refers to either a comma-separated list of supported tasks from
@@ -51,10 +62,8 @@ You can then evaluate a model using data parallelism on 8 GPUs as follows:
```bash
accelerate launch --multi_gpu --num_processes=8 -m \
lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
"pretrained=gpt2" \
"leaderboard|truthfulqa:mc|0|0"
```
Here, `--override_batch_size` defines the batch size per device, so the effective
@@ -66,10 +75,8 @@ To evaluate a model using pipeline parallelism on 2 or more GPUs, run:
```bash
lighteval accelerate \
--model_args "pretrained=gpt2,model_parallel=True" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
"pretrained=gpt2,model_parallel=True" \
"leaderboard|truthfulqa:mc|0|0"
```
This will automatically use accelerate to distribute the model across the GPUs.
@@ -81,7 +88,7 @@ GPUs.
### Model Arguments
The `--model_args` argument takes a string representing a list of model
The `model-args` argument takes a string representing a list of model
arguments. The arguments allowed vary depending on the backend you use (vllm or
accelerate).
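As an illustration, several options can be packed into that single string; the argument names below follow the comment in `examples/model_configs/base_model.yaml` further down in this diff, and combining them like this is an assumption:

```bash
lighteval accelerate \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main,trust_remote_code=True,model_parallel=True" \
    "leaderboard|truthfulqa:mc|0|0"
```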
@@ -150,8 +157,8 @@ To evaluate a model trained with nanotron on a single gpu.
```bash
torchrun --standalone --nnodes=1 --nproc-per-node=1 \
src/lighteval/__main__.py nanotron \
--checkpoint_config_path ../nanotron/checkpoints/10/config.yaml \
--lighteval_config_path examples/nanotron/lighteval_config_override_template.yaml
--checkpoint-config-path ../nanotron/checkpoints/10/config.yaml \
--lighteval-config-path examples/nanotron/lighteval_config_override_template.yaml
```
The `nproc-per-node` argument should match the data, tensor and pipeline
18 changes: 10 additions & 8 deletions docs/source/saving-and-reading-results.mdx
@@ -3,30 +3,32 @@
## Saving results locally

Lighteval will automatically save results and evaluation details in the
directory set with the `--output_dir` argument. The results will be saved in
directory set with the `--output-dir` option. The results will be saved in
`{output_dir}/results/{model_name}/results_{timestamp}.json`. [Here is an
example of a result file](#example-of-a-result-file). The output path can be
any [fsspec](https://filesystem-spec.readthedocs.io/en/latest/index.html)
compliant path (local, s3, hf hub, gdrive, ftp, etc).

To save the details of the evaluation, you can use the `--save_details`
argument. The details will be saved in a parquet file
To save the details of the evaluation, you can use the `--save-details`
option. The details will be saved in a parquet file
`{output_dir}/details/{model_name}/{timestamp}/details_{task}_{timestamp}.parquet`.

## Pushing results to the HuggingFace hub

You can push the results and evaluation details to the HuggingFace hub. To do
so, you need to set the `--push_to_hub` as well as the `--results_org`
argument. The results will be saved in a dataset with the name at
so, you need to set the `--push-to-hub` as well as the `--results-org`
option. The results will be saved in a dataset with the name at
`{results_org}/{model_org}/{model_name}`. To push the details, you need to set
the `--save_details` argument.
the `--save-details` option.
The dataset created will be private by default; you can make it public by
setting the `--public_run` argument.
setting the `--public-run` option.
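Putting the saving and hub options together, a full invocation might look like this (a sketch; the model, task, and organization are placeholders, and combining all of these flags in one run is an assumption):

```bash
lighteval accelerate \
    "pretrained=gpt2" \
    "leaderboard|truthfulqa:mc|0|0" \
    --output-dir ./evals/ \
    --save-details \
    --push-to-hub \
    --results-org my-org \
    --public-run
```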


## Pushing results to Tensorboard

You can push the results to Tensorboard by setting `--push_to_tensorboard`.
You can push the results to Tensorboard by setting `--push-to-tensorboard`.
This will create a Tensorboard dashboard in a HF org set with the `--results-org`
option.


## How to load and investigate details
22 changes: 9 additions & 13 deletions docs/source/use-vllm-as-backend.mdx
@@ -4,10 +4,9 @@ Lighteval allows you to use `vllm` as a backend, allowing great speedups.
To use, simply change the `model_args` to reflect the arguments you want to pass to vllm.

```bash
lighteval accelerate \
--model_args="vllm,pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--output_dir="./evals/"
lighteval vllm \
"pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
"leaderboard|truthfulqa:mc|0|0"
```

`vllm` is able to distribute the model across multiple GPUs using data
@@ -17,19 +16,17 @@ You can choose the parallelism method by setting it in the `model_args`.
For example, if you have 4 GPUs you can split the model across them using `tensor_parallelism`:

```bash
export VLLM_WORKER_MULTIPROC_METHOD=spawn && lighteval accelerate \
--model_args="vllm,pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tensor_parallel_size=4" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--output_dir="./evals/"
export VLLM_WORKER_MULTIPROC_METHOD=spawn && lighteval vllm \
"pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tensor_parallel_size=4" \
"leaderboard|truthfulqa:mc|0|0"
```

Or, if your model fits on a single GPU, you can use `data_parallelism` to speed up the evaluation:

```bash
lighteval accelerate \
--model_args="vllm,pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,data_parallel_size=4" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--output_dir="./evals/"
lighteval vllm \
"pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,data_parallel_size=4" \
"leaderboard|truthfulqa:mc|0|0"
```

Available arguments for `vllm` can be found in the `VLLMModelConfig`:
@@ -50,4 +47,3 @@ Available arguments for `vllm` can be found in the `VLLMModelConfig`:
> [!WARNING]
> In the case of OOM issues, you might need to reduce the context size of the
> model as well as reduce the `gpu_memory_utilisation` parameter.
1 change: 0 additions & 1 deletion examples/model_configs/base_model.yaml
@@ -1,5 +1,4 @@
model:
type: "base" # can be base, tgi, or endpoint
base_params:
model_args: "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
dtype: "bfloat16"
1 change: 0 additions & 1 deletion examples/model_configs/endpoint_model.yaml
@@ -1,5 +1,4 @@
model:
type: "endpoint" # can be base, tgi, or endpoint
base_params:
endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
model: "meta-llama/Llama-2-7b-hf"
1 change: 0 additions & 1 deletion examples/model_configs/peft_model.yaml
@@ -1,5 +1,4 @@
model:
type: "base"
base_params:
model_args: "pretrained=predibase/customer_support,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ... For a PEFT model, the pretrained model should be the one trained with PEFT and the base model below will contain the original model on which the adapters will be applied.
dtype: "4bit" # Specifying the model to be loaded in 4 bit uses BitsAndBytesConfig. The other option is to use "8bit" quantization.
1 change: 0 additions & 1 deletion examples/model_configs/quantized_model.yaml
@@ -1,5 +1,4 @@
model:
type: "base"
base_params:
model_args: "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
dtype: "4bit" # Specifying the model to be loaded in 4 bit uses BitsAndBytesConfig. The other option is to use "8bit" quantization.
1 change: 0 additions & 1 deletion examples/model_configs/tgi_model.yaml
@@ -1,5 +1,4 @@
model:
type: "tgi" # can be base, tgi, or endpoint
instance:
inference_server_address: ""
inference_server_auth: null
4 changes: 1 addition & 3 deletions examples/nanotron/lighteval_config_override_template.yaml
@@ -4,9 +4,7 @@ generation: null
logging:
output_dir: "outputs"
save_details: false
push_results_to_hub: false
push_details_to_hub: false
push_results_to_tensorboard: false
push_to_hub: false
public_run: false
results_org: null
tensorboard_metric_prefix: "eval"
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -61,6 +61,7 @@ dependencies = [
"datasets>=2.14.0",
"numpy<2", # pinned to avoid incompatibilities
# Prettiness
"typer",
"termcolor==2.3.0",
"pytablewriter",
"colorama",
@@ -114,4 +115,4 @@ Issues = "https://github.com/huggingface/lighteval/issues"
# Changelog = "https://github.com/huggingface/lighteval/blob/master/CHANGELOG.md"

[project.scripts]
lighteval = "lighteval.__main__:cli_evaluate"
lighteval = "lighteval.__main__:app"
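With the console script now pointing at the Typer app, every subcommand is reachable from the single `lighteval` entry point, and per-command help follows the usual pattern (a sketch; `--help` is the authoritative source for each command's options):

```bash
lighteval --help
lighteval accelerate --help
lighteval endpoint --help
lighteval tasks --help
```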