
Introduce ramalama bench #620

Merged 1 commit into main on Jan 23, 2025
Conversation

@ericcurtin (Collaborator) commented Jan 23, 2025

Allows benchmarking of models, GPU stacks, etc.

@sourcery-ai bot (Contributor) commented Jan 23, 2025

Reviewer's Guide by Sourcery

This pull request introduces a new 'bench' subcommand to the CLI, enabling users to benchmark specified AI models. It also refactors the gpu_args method to handle different argument styles for runner and server modes, and adds a new build_exec_args_bench method.

Sequence diagram for the new benchmark command flow

```mermaid
sequenceDiagram
    actor User
    participant CLI
    participant Model
    participant Container

    User->>CLI: ramalama bench MODEL
    CLI->>Model: bench(args)
    Model->>Model: check_name_and_container()
    Model->>Model: get_model_path()
    Model->>Model: build_exec_args_bench()
    Model->>Model: gpu_args()
    Model->>Container: execute_model()
    Container-->>User: benchmark results
```
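The sequence above can be sketched in Python as follows. All method bodies here are placeholder stubs: the names mirror the diagram, not the actual ramalama implementation, and `bench-tool` plus the model path scheme are stand-ins invented for illustration.

```python
# Minimal sketch of the benchmark flow shown in the sequence diagram.
# Names mirror the diagram; the real ramalama Model class differs.
class Model:
    def __init__(self, name):
        self.name = name

    def check_name_and_container(self, args):
        # validate the model name and that a container engine is usable
        return True

    def get_model_path(self, args):
        # resolve the model name to a local file path (placeholder scheme)
        return f"/var/lib/ramalama/{self.name}"

    def build_exec_args_bench(self, args, model_path):
        # assemble the benchmark command line ("bench-tool" is a stand-in)
        return ["bench-tool", "-m", model_path]

    def execute_model(self, model_path, exec_args, args):
        # real code would exec this, possibly inside a container
        return " ".join(exec_args)

    def bench(self, args):
        self.check_name_and_container(args)
        model_path = self.get_model_path(args)
        exec_args = self.build_exec_args_bench(args, model_path)
        return self.execute_model(model_path, exec_args, args)

print(Model("smollm:135m").bench(args=None))
```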

Class diagram showing the updated Model class

```mermaid
classDiagram
    class Model {
        +bench(args)
        +gpu_args(force: bool, runner: bool)
        -build_exec_args_bench(args, model_path)
        +run(args)
        +execute_model(model_path, exec_args, args)
    }
    note for Model "Added bench method and
modified gpu_args to support
runner/server modes"
```

Flow diagram for benchmark command execution

```mermaid
graph TD
    A[Start Benchmark] --> B[Parse CLI Arguments]
    B --> C[Initialize Model]
    C --> D[Check Container & Name]
    D --> E[Get Model Path]
    E --> F[Build Benchmark Args]
    F --> G[Check GPU]
    G --> H[Add GPU Args if needed]
    H --> I[Execute Model]
    I --> J[Return Results]
```

File-Level Changes

Added a new 'bench' subcommand to the CLI (ramalama/cli.py)
  • Added a bench_cli function to handle the bench subcommand.
  • Added a bench_parser function to configure the bench subcommand.
  • Registered the bench subcommand with the subparsers.
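The CLI wiring described above could look roughly like this. The function names `bench_cli` and `bench_parser` come from the change list, but the signatures, help text, and argument layout are assumptions, not the actual ramalama/cli.py code:

```python
import argparse

# Hedged sketch of registering a "bench" subcommand with argparse.
def bench_cli(args):
    # the real handler would dispatch to Model.bench(args)
    print(f"benchmarking {args.MODEL}")

def bench_parser(subparsers):
    parser = subparsers.add_parser("bench", help="benchmark specified AI model")
    parser.add_argument("MODEL")  # positional model argument
    parser.set_defaults(func=bench_cli)

parser = argparse.ArgumentParser(prog="ramalama")
subparsers = parser.add_subparsers()
bench_parser(subparsers)

args = parser.parse_args(["bench", "smollm:135m"])
args.func(args)  # prints: benchmarking smollm:135m
```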
Refactored the gpu_args method to handle different argument styles for runner and server modes (ramalama/model.py)
  • Modified gpu_args to accept a 'runner' parameter instead of 'server'.
  • Changed gpu_args to use '--ngl' for runner mode and '-ngl' for other modes.
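A minimal sketch of this dispatch, assuming a boolean `runner` parameter and the 999 GPU-layer value seen in the benchmark output later in this thread; the real `gpu_args` signature and defaults may differ:

```python
# Sketch of the refactored gpu_args() dispatch: the run tool takes "--ngl"
# while the other modes take "-ngl". Signature is an assumption based on
# the change list, not the exact ramalama code.
def gpu_args(runner=False, n_gpu_layers=999):
    args = []
    if runner:
        args += ["--ngl"]  # double dash for runner mode
    else:
        args += ["-ngl"]   # single dash for server/bench modes
    args += [str(n_gpu_layers)]
    return args

print(gpu_args(runner=True))   # ['--ngl', '999']
print(gpu_args(runner=False))  # ['-ngl', '999']
```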
Added a new build_exec_args_bench method (ramalama/model.py)
  • Builds the execution arguments for the bench subcommand.

Added a bench method to execute the model in benchmark mode (ramalama/model.py)

Updated the gpu_args call sites (ramalama/model.py)
  • build_exec_args_run now passes runner=True.
  • handle_runtime no longer passes server=True.
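As a rough illustration, build_exec_args_bench presumably assembles a llama-bench command line (llama-bench is llama.cpp's bundled benchmark tool and accepts `-m <model>`). The GPU handling below is a hedged guess for illustration, not the actual ramalama code:

```python
# Hedged sketch of build_exec_args_bench(); the real method in
# ramalama/model.py may construct arguments differently.
def build_exec_args_bench(args, model_path, gpu=False):
    exec_args = ["llama-bench"]
    if gpu:
        # 999 offloads effectively all layers, matching the ngl column
        # in the benchmark output shown later in this PR
        exec_args += ["-ngl", "999"]
    exec_args += ["-m", model_path]
    return exec_args

print(build_exec_args_bench(None, "/path/to/model.gguf", gpu=True))
# → ['llama-bench', '-ngl', '999', '-m', '/path/to/model.gguf']
```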


@sourcery-ai bot left a comment

Hey @ericcurtin - I've reviewed your changes - here's some feedback:

Overall Comments:

  • The PR is marked as WIP - could you clarify what work is still pending?
  • The help text for the ARGS parameter in bench_parser appears to be copied from another command and doesn't describe benchmark-specific arguments
  • There's inconsistent handling of single vs double dashes in gpu_args() which could lead to confusion - consider standardizing on one format
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

@ericcurtin (Collaborator, Author) commented Jan 23, 2025

@slp you might find this useful when it gets merged:

```
$ ramalama --container bench smollm:135m
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama ?B Q4_0                  |  85.77 MiB |   134.52 M | Kompute    | 999 |         pp512 |       1407.02 ± 3.41 |
| llama ?B Q4_0                  |  85.77 MiB |   134.52 M | Kompute    | 999 |         tg128 |       133.25 ± 61.76 |

build: f8feb4b0 (4453)

$ ramalama --nocontainer bench smollm:135m
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama ?B Q4_0                  |  85.77 MiB |   134.52 M | Metal,BLAS |       6 |         pp512 |    11755.20 ± 190.70 |
| llama ?B Q4_0                  |  85.77 MiB |   134.52 M | Metal,BLAS |       6 |         tg128 |        301.65 ± 0.71 |

build: f8feb4b0 (4453)
```

@ericcurtin changed the title from "[WIP] ramalama bench" to "Introduce ramalama bench" on Jan 23, 2025
@ericcurtin (Collaborator, Author) commented:

#617

@ericcurtin ericcurtin force-pushed the ramalama-bench branch 3 times, most recently from 5516278 to a5e4931 Compare January 23, 2025 13:09
```python
if server:
    gpu_args += ["-ngl"]  # single dash
if runner:
    gpu_args += ["--ngl"]  # single dash
```
A Member commented:
Comments are backwards.

@ericcurtin (Author) replied:

It's deliberate: run uses "--ngl" and everything else uses "-ngl".

@ericcurtin (Author) replied:

Oh yes, the comments are backwards; fixing.

@rhatdan (Member) commented Jan 23, 2025

I take it bench is a command available for llama.cpp? We'll have to do something for vllm or just use llama.cpp bench command.

@ericcurtin (Collaborator, Author) commented Jan 23, 2025

> I take it bench is a command available for llama.cpp? We'll have to do something for vllm or just use llama.cpp bench command.

Yes, it's built into llama.cpp. Benchmarking functionality has been requested numerous times. I don't know how to do it with vllm yet, but llama.cpp has a simple tool built in. I agree we will need a vllm equivalent eventually.

Allows benchmarking of models, GPU stacks, etc.

Signed-off-by: Eric Curtin <[email protected]>
@rhatdan rhatdan merged commit 0befa5c into main Jan 23, 2025
11 checks passed
@ericcurtin ericcurtin deleted the ramalama-bench branch January 23, 2025 18:03