Introduce ramalama bench #620
Conversation
Reviewer's Guide by Sourcery

This pull request introduces a new 'bench' subcommand to the CLI, enabling users to benchmark specified AI models. It also refactors the gpu_args method to handle different argument styles for runner and server modes, and adds a new build_exec_args_bench method.

Sequence diagram for the new benchmark command flow:

```mermaid
sequenceDiagram
    actor User
    participant CLI
    participant Model
    participant Container
    User->>CLI: ramalama bench MODEL
    CLI->>Model: bench(args)
    Model->>Model: check_name_and_container()
    Model->>Model: get_model_path()
    Model->>Model: build_exec_args_bench()
    Model->>Model: gpu_args()
    Model->>Container: execute_model()
    Container-->>User: benchmark results
```

Class diagram showing the updated Model class:

```mermaid
classDiagram
    class Model {
        +bench(args)
        +gpu_args(force: bool, runner: bool)
        -build_exec_args_bench(args, model_path)
        +run(args)
        +execute_model(model_path, exec_args, args)
    }
    note for Model "Added bench method and
    modified gpu_args to support
    runner/server modes"
```

Flow diagram for benchmark command execution:

```mermaid
graph TD
    A[Start Benchmark] --> B[Parse CLI Arguments]
    B --> C[Initialize Model]
    C --> D[Check Container & Name]
    D --> E[Get Model Path]
    E --> F[Build Benchmark Args]
    F --> G[Check GPU]
    G --> H[Add GPU Args if needed]
    H --> I[Execute Model]
    I --> J[Return Results]
```
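The flow in the diagrams above can be sketched roughly as follows. This is a hypothetical illustration: the method names mirror the diagrams, but the bodies (path resolution, the `args.gpu` attribute, the fixed layer count) are placeholders, not the actual ramalama/model.py implementation.

```python
# Hypothetical sketch of the bench flow shown in the diagrams above.
# Method names mirror the diagrams; the real implementation lives in
# ramalama/model.py and differs in detail.
class Model:
    def __init__(self, name):
        self.name = name

    def bench(self, args):
        self.check_name_and_container(args)
        model_path = self.get_model_path(args)
        exec_args = self.build_exec_args_bench(args, model_path)
        return self.execute_model(model_path, exec_args, args)

    def check_name_and_container(self, args):
        # Placeholder: validate the model name and container availability.
        if not self.name:
            raise ValueError("model name required")

    def get_model_path(self, args):
        # Placeholder: resolve the model reference to a local file path.
        return f"/models/{self.name}"

    def build_exec_args_bench(self, args, model_path):
        # llama-bench is llama.cpp's built-in benchmarking tool.
        exec_args = ["llama-bench", "-m", model_path]
        if getattr(args, "gpu", False):
            exec_args += self.gpu_args()
        return exec_args

    def gpu_args(self, force=False, runner=False):
        # Per the PR discussion: runner mode uses "--ngl",
        # everything else uses "-ngl".
        flag = "--ngl" if runner else "-ngl"
        return [flag, "999"]

    def execute_model(self, model_path, exec_args, args):
        # The real code would exec this inside a container; here we
        # just return the assembled command for inspection.
        return exec_args
```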
Hey @ericcurtin - I've reviewed your changes - here's some feedback:
Overall Comments:
- The PR is marked as WIP - could you clarify what work is still pending?
- The help text for the ARGS parameter in bench_parser appears to be copied from another command and doesn't describe benchmark-specific arguments
- There's inconsistent handling of single vs double dashes in gpu_args() which could lead to confusion - consider standardizing on one format
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Force-pushed from f4684fc to e4e2b82.
@slp you might find this useful when it gets merged:
Force-pushed from 4d00b76 to 0d79a77.
Force-pushed from 5516278 to a5e4931.
ramalama/model.py (outdated):

```python
if server:
    gpu_args += ["-ngl"]  # single dash
if runner:
    gpu_args += ["--ngl"]  # single dash
```
Comments are backwards.
It's deliberate: run uses "--ngl" and everything else uses "-ngl".
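A minimal sketch of that convention (hypothetical standalone function; the real method is Model.gpu_args and takes additional flags):

```python
def gpu_args(server=False, runner=False):
    # Deliberate asymmetry per the discussion above: runner mode
    # ("ramalama run") takes "--ngl", everything else takes "-ngl".
    args = []
    if server:
        args += ["-ngl"]
    if runner:
        args += ["--ngl"]
    return args
```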
Oh yes, the comments are backwards; fixing.
I take it bench is a command available for llama.cpp? We'll have to do something for vllm, or just use the llama.cpp bench command.
Yes, it's built in to llama.cpp. Benchmarking functionality was requested numerous times. I don't know how to do it for vllm yet, but llama.cpp has a simple tool built in. I agree we will need a vllm equivalent eventually.
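For context, a hedged sketch of assembling an invocation of llama.cpp's benchmarking tool. The -m and -ngl flags are real llama-bench options (model file and GPU layer offload count); the helper function and model path here are illustrative, not ramalama's actual code:

```python
import shlex

def build_bench_command(model_path, ngl=999):
    # -m selects the GGUF model file; -ngl sets the number of layers
    # offloaded to the GPU.
    return ["llama-bench", "-m", model_path, "-ngl", str(ngl)]

print(shlex.join(build_bench_command("/models/example.gguf")))
```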
Allows benchmarking of models, GPU stacks, etc. Signed-off-by: Eric Curtin <[email protected]>
Force-pushed from a5e4931 to d31b8bf.