Sotopia Benchmark CLI API #69
@ProKil yields error:
Can you check?
Already fixed, @ProKil! However, it still has this error:
yields error:
What is the full backtrace?
@ProKil Aborted.
…sotopia into feature/benchmark_agents.py
…sotopia into feature/benchmark_agents.py
…on from Hugging Face API
Please also `git merge main` to resolve the conflicts.
Let's figure everything out first before I merge main; otherwise I'll just end up doing the same repetitive things again and again.
@ProKil Can you try to benchmark a model first?
@ProKil Not sure why this happens tho
📑 Description
This pull request adds a new API that benchmarks a language model using the default LLMAgent class. Here is the API we want to achieve:
After this CLI command is called, the Sotopia benchmark evaluates the given model by simulating its interaction with another LLMAgent driven by the partner model, with the evaluator model scoring the result on the given task.
We will also include a bash script that loads all agent_env_combo entries from a given subset.
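The command itself did not survive on this page, so the following is only a minimal sketch of how such a CLI could collect the four inputs the description names: the model under test, the partner model, the evaluator model, and the task. Every name here (`BenchmarkConfig`, `parse_args`, `describe_run`, the program name, and all flag names) is hypothetical and chosen for illustration; none of it is taken from the Sotopia codebase or from this PR.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BenchmarkConfig:
    """Hypothetical container for the four inputs named in the description."""

    model: str            # model being benchmarked
    partner_model: str    # model driving the other agent
    evaluator_model: str  # model scoring the interaction
    task: str             # task (or subset) to run


def parse_args(argv: Optional[Sequence[str]] = None) -> BenchmarkConfig:
    # Flag names are assumptions made for this sketch, not the PR's real API.
    parser = argparse.ArgumentParser(prog="benchmark-sketch")
    parser.add_argument("--model", required=True)
    parser.add_argument("--partner-model", required=True)
    parser.add_argument("--evaluator-model", required=True)
    parser.add_argument("--task", required=True)
    ns = parser.parse_args(argv)
    # argparse stores "--partner-model" under the attribute "partner_model".
    return BenchmarkConfig(
        model=ns.model,
        partner_model=ns.partner_model,
        evaluator_model=ns.evaluator_model,
        task=ns.task,
    )


def describe_run(cfg: BenchmarkConfig) -> str:
    # Only summarizes the planned run; no simulation happens in this sketch.
    return (
        f"benchmark {cfg.model} against {cfg.partner_model}, "
        f"scored by {cfg.evaluator_model}, on task {cfg.task}"
    )
```

Passing a list such as `["--model", "model-a", "--partner-model", "model-b", "--evaluator-model", "model-c", "--task", "demo"]` to `parse_args` returns a populated `BenchmarkConfig`. Keeping the parsed values in one dataclass means the simulation step, whatever form it takes in the actual PR, can receive a single typed object rather than loose strings.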
✅ Checks
type/descript (e.g. feature/add-llm-agents)
ℹ Additional Information