
New code benchmark #108

Open
adampingel opened this issue Aug 30, 2024 · 1 comment
adampingel (Contributor) commented Aug 30, 2024

@andyjda

adampingel converted this from a draft issue Aug 30, 2024
adampingel added this to the 2024-10-21 milestone Aug 30, 2024
andrewdea commented:

JetBrains-Research has published the benchmark suite Long Code Arena:

The benchmarks are code-related tasks focused on measuring how well models can process large context windows.
They differ from other popular benchmarks both in how large they allow the context to be and in how realistic they aim to be: the datasets are built from real-world repositories, and the tasks replicate real-world scenarios rather than synthetic, evaluation-focused use cases.

It is particularly relevant to our case because:

  • it's a great way to evaluate a model's code-assistant capabilities (see the loading sketch after this list)
  • the approach to building the benchmark suite could be extended to additional tasks and programming languages while keeping the focus on realistic tasks and large contexts. This would make the suite itself more useful and help us evaluate models across more and more features
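
For reference, a minimal sketch of how an evaluation harness might pull one of the Long Code Arena tasks via the HuggingFace `datasets` library. The dataset identifier and field names below are assumptions for illustration only; the actual Hub ids for each task are listed in the Long Code Arena documentation.

```python
# Minimal sketch of loading one Long Code Arena task for evaluation.
# NOTE: the dataset id and field names are assumptions for illustration;
# check the Long Code Arena docs for the real HuggingFace Hub identifiers.
from datasets import load_dataset

# Hypothetical task: project-level code completion.
dataset = load_dataset(
    "JetBrains-Research/lca-project-level-code-completion",  # assumed id
    split="test",
)

# Inspect a few samples to get a feel for how large the contexts are.
for sample in dataset.select(range(3)):
    context = sample.get("context", "")  # assumed field name
    print(f"context length (chars): {len(context)}")
```

A harness along these lines would make it straightforward to swap in additional tasks or languages later, since each task is just another dataset with the same loading pattern.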

Status: Backlog

No branches or pull requests

3 participants