
Introduce ramalama bench #620

Merged 1 commit into main on Jan 23, 2025
Conversation

@ericcurtin (Collaborator) commented Jan 23, 2025

Allows benchmarking of models, GPU stacks, etc.

@sourcery-ai bot (Contributor) commented Jan 23, 2025

Reviewer's Guide by Sourcery

This pull request introduces a new 'bench' subcommand to the CLI, enabling users to benchmark specified AI models. It also refactors the gpu_args method to handle different argument styles for runner and server modes, and adds a new build_exec_args_bench method.

Sequence diagram for the new benchmark command flow

```mermaid
sequenceDiagram
    actor User
    participant CLI
    participant Model
    participant Container

    User->>CLI: ramalama bench MODEL
    CLI->>Model: bench(args)
    Model->>Model: check_name_and_container()
    Model->>Model: get_model_path()
    Model->>Model: build_exec_args_bench()
    Model->>Model: gpu_args()
    Model->>Container: execute_model()
    Container-->>User: benchmark results
```
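The sequence above can be sketched in Python as follows. All method bodies here are placeholder stubs: the names mirror the diagram, not the actual ramalama implementation, and `bench-tool` plus the model path scheme are stand-ins invented for illustration.

```python
# Minimal sketch of the benchmark flow shown in the sequence diagram.
# Names mirror the diagram; the real ramalama Model class differs.
class Model:
    def __init__(self, name):
        self.name = name

    def check_name_and_container(self, args):
        # validate the model name and that a container engine is usable
        return True

    def get_model_path(self, args):
        # resolve the model name to a local file path (placeholder scheme)
        return f"/var/lib/ramalama/{self.name}"

    def build_exec_args_bench(self, args, model_path):
        # assemble the benchmark command line ("bench-tool" is a stand-in)
        return ["bench-tool", "-m", model_path]

    def execute_model(self, model_path, exec_args, args):
        # real code would exec this, possibly inside a container
        return " ".join(exec_args)

    def bench(self, args):
        self.check_name_and_container(args)
        model_path = self.get_model_path(args)
        exec_args = self.build_exec_args_bench(args, model_path)
        return self.execute_model(model_path, exec_args, args)

print(Model("smollm:135m").bench(args=None))
```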

Class diagram showing the updated Model class

```mermaid
classDiagram
    class Model {
        +bench(args)
        +gpu_args(force: bool, runner: bool)
        -build_exec_args_bench(args, model_path)
        +run(args)
        +execute_model(model_path, exec_args, args)
    }
    note for Model "Added bench method and
modified gpu_args to support
runner/server modes"
```

Flow diagram for benchmark command execution

```mermaid
graph TD
    A[Start Benchmark] --> B[Parse CLI Arguments]
    B --> C[Initialize Model]
    C --> D[Check Container & Name]
    D --> E[Get Model Path]
    E --> F[Build Benchmark Args]
    F --> G[Check GPU]
    G --> H[Add GPU Args if needed]
    H --> I[Execute Model]
    I --> J[Return Results]
```

File-Level Changes

Added a new 'bench' subcommand to the CLI (ramalama/cli.py)
  • Added a bench_cli function to handle the bench subcommand.
  • Added a bench_parser function to configure the bench subcommand.
  • Registered the bench subcommand with the subparsers.
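The CLI wiring described above could look roughly like this. The function names `bench_cli` and `bench_parser` come from the change list, but the signatures, help text, and argument layout are assumptions, not the actual ramalama/cli.py code:

```python
import argparse

# Hedged sketch of registering a "bench" subcommand with argparse.
def bench_cli(args):
    # the real handler would dispatch to Model.bench(args)
    print(f"benchmarking {args.MODEL}")

def bench_parser(subparsers):
    parser = subparsers.add_parser("bench", help="benchmark specified AI model")
    parser.add_argument("MODEL")  # positional model argument
    parser.set_defaults(func=bench_cli)

parser = argparse.ArgumentParser(prog="ramalama")
subparsers = parser.add_subparsers()
bench_parser(subparsers)

args = parser.parse_args(["bench", "smollm:135m"])
args.func(args)  # prints: benchmarking smollm:135m
```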
Refactored the gpu_args method to handle different argument styles for runner and server modes (ramalama/model.py)
  • Modified gpu_args to accept a 'runner' parameter instead of 'server'.
  • Changed gpu_args to use '--ngl' for runner mode and '-ngl' for other modes.
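A minimal sketch of this dispatch, assuming a boolean `runner` parameter and the 999 GPU-layer value seen in the benchmark output later in this thread; the real `gpu_args` signature and defaults may differ:

```python
# Sketch of the refactored gpu_args() dispatch: the run tool takes "--ngl"
# while the other modes take "-ngl". Signature is an assumption based on
# the change list, not the exact ramalama code.
def gpu_args(runner=False, n_gpu_layers=999):
    args = []
    if runner:
        args += ["--ngl"]  # double dash for runner mode
    else:
        args += ["-ngl"]   # single dash for server/bench modes
    args += [str(n_gpu_layers)]
    return args

print(gpu_args(runner=True))   # ['--ngl', '999']
print(gpu_args(runner=False))  # ['-ngl', '999']
```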
Added a new build_exec_args_bench method (ramalama/model.py)
  • Builds the execution arguments for the bench subcommand.

Added a bench method to execute the model in benchmark mode (ramalama/model.py)

Updated the gpu_args call sites (ramalama/model.py)
  • build_exec_args_run now passes runner=True.
  • handle_runtime no longer passes server=True.
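As a rough illustration, build_exec_args_bench presumably assembles a llama-bench command line (llama-bench is llama.cpp's bundled benchmark tool and accepts `-m <model>`). The GPU handling below is a hedged guess for illustration, not the actual ramalama code:

```python
# Hedged sketch of build_exec_args_bench(); the real method in
# ramalama/model.py may construct arguments differently.
def build_exec_args_bench(args, model_path, gpu=False):
    exec_args = ["llama-bench"]
    if gpu:
        # 999 offloads effectively all layers, matching the ngl column
        # in the benchmark output shown later in this PR
        exec_args += ["-ngl", "999"]
    exec_args += ["-m", model_path]
    return exec_args

print(build_exec_args_bench(None, "/path/to/model.gguf", gpu=True))
# → ['llama-bench', '-ngl', '999', '-m', '/path/to/model.gguf']
```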


@sourcery-ai bot left a comment

Hey @ericcurtin - I've reviewed your changes - here's some feedback:

Overall Comments:

  • The PR is marked as WIP - could you clarify what work is still pending?
  • The help text for the ARGS parameter in bench_parser appears to be copied from another command and doesn't describe benchmark-specific arguments
  • There's inconsistent handling of single vs double dashes in gpu_args() which could lead to confusion - consider standardizing on one format
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

@ericcurtin (Collaborator, Author) commented Jan 23, 2025

@slp you might find this useful when it gets merged:

```
$ ramalama --container bench smollm:135m
| model                          |       size |     params | backend    | ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
| llama ?B Q4_0                  |  85.77 MiB |   134.52 M | Kompute    | 999 |         pp512 |       1407.02 ± 3.41 |
| llama ?B Q4_0                  |  85.77 MiB |   134.52 M | Kompute    | 999 |         tg128 |       133.25 ± 61.76 |

build: f8feb4b0 (4453)

$ ramalama --nocontainer bench smollm:135m
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama ?B Q4_0                  |  85.77 MiB |   134.52 M | Metal,BLAS |       6 |         pp512 |    11755.20 ± 190.70 |
| llama ?B Q4_0                  |  85.77 MiB |   134.52 M | Metal,BLAS |       6 |         tg128 |        301.65 ± 0.71 |

build: f8feb4b0 (4453)
```

@ericcurtin changed the title from "[WIP] ramalama bench" to "Introduce ramalama bench" on Jan 23, 2025
@ericcurtin (Collaborator, Author) commented:

#617

@ericcurtin ericcurtin force-pushed the ramalama-bench branch 3 times, most recently from 5516278 to a5e4931 Compare January 23, 2025 13:09
```python
if server:
    gpu_args += ["-ngl"]  # single dash
if runner:
    gpu_args += ["--ngl"]  # single dash
```
A Member commented:
Comments are backwards.

@ericcurtin (Author) replied:

It's deliberate: run uses "--ngl" and everything else uses "-ngl".

@ericcurtin (Author) replied:

Oh yes, the comments are backwards; fixing.

@rhatdan (Member) commented Jan 23, 2025

I take it bench is a command available for llama.cpp? We'll have to do something for vllm or just use llama.cpp bench command.

@ericcurtin (Collaborator, Author) commented Jan 23, 2025

> I take it bench is a command available for llama.cpp? We'll have to do something for vllm or just use llama.cpp bench command.

Yes, it's built into llama.cpp. Benchmarking functionality has been requested numerous times. I don't know how to do it with vllm yet, but llama.cpp has a simple tool built in. I agree we will need a vllm equivalent eventually.

Allows benchmarking of models, GPU stacks, etc.

Signed-off-by: Eric Curtin <[email protected]>
@rhatdan rhatdan merged commit 0befa5c into main Jan 23, 2025
11 checks passed
@ericcurtin ericcurtin deleted the ramalama-bench branch January 23, 2025 18:03