
New code benchmark #108

Open
adampingel opened this issue Aug 30, 2024 · 1 comment
adampingel (Contributor) commented Aug 30, 2024

@andyjda

adampingel converted this from a draft issue Aug 30, 2024
adampingel added this to the 2024-10-21 milestone Aug 30, 2024
andrewdea commented:

JetBrains-Research has published the benchmark suite Long Code Arena:

The benchmarks are code-related tasks focused on measuring how well models can process large context windows.
They differ from other popular benchmarks both in how large they allow the context to be and in how realistic they aim to be: the datasets are built from real-world repositories, and the tasks replicate real-world scenarios rather than synthetic, evaluation-focused use cases.

It is particularly relevant to our case because:

  • it's a great way to evaluate a model's code-assistant capabilities (see the loading sketch after this list)
  • the approach to building the benchmark suite could be extended to additional tasks and programming languages while keeping the focus on realistic tasks and large contexts. This would make the suite itself more useful and help us evaluate models across more and more features
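
For reference, a minimal sketch of how an evaluation harness might pull one of the Long Code Arena tasks via the HuggingFace `datasets` library. The dataset identifier and field names below are assumptions for illustration only; the actual Hub ids for each task are listed in the Long Code Arena documentation.

```python
# Minimal sketch of loading one Long Code Arena task for evaluation.
# NOTE: the dataset id and field names are assumptions for illustration;
# check the Long Code Arena docs for the real HuggingFace Hub identifiers.
from datasets import load_dataset

# Hypothetical task: project-level code completion.
dataset = load_dataset(
    "JetBrains-Research/lca-project-level-code-completion",  # assumed id
    split="test",
)

# Inspect a few samples to get a feel for how large the contexts are.
for sample in dataset.select(range(3)):
    context = sample.get("context", "")  # assumed field name
    print(f"context length (chars): {len(context)}")
```

A harness along these lines would make it straightforward to swap in additional tasks or languages later, since each task is just another dataset with the same loading pattern.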

Status: Backlog

No branches or pull requests

3 participants