Sotopia Benchmark CLI API #69

XuhuiZhou · 2024-05-16T01:19:03Z

📑 Description

This pull request adds a new API that benchmarks a language model using the default LLMAgent class. Here is the API we want to achieve:

sotopia_benchmark \
--model <model_name> \
--partner-model <partner_model_name> \
--evaluator-model <evaluator_model_name> \
--task <agent_env_combo_id>

After this CLI command is called, the Sotopia benchmark evaluates the given model by simulating its interaction with another LLMAgent that uses the partner model, and scores the interaction on the given task with the evaluator model.

We will also include a Bash script that loads every agent_env_combo from a given subset.
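
As an illustration of the interface described above, here is a minimal sketch that parses the same four flags with Python's standard `argparse`. It is not the implementation in this PR: the function name `build_parser`, the choice of `argparse`, and the `None` defaults for the optional flags are all assumptions made for the example. Only the flag names come from the description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags proposed in the description."""
    parser = argparse.ArgumentParser(prog="sotopia_benchmark")
    # Model under evaluation; the only flag this sketch treats as required.
    parser.add_argument("--model", required=True)
    # Defaults below are assumptions; the PR description does not state them.
    parser.add_argument("--partner-model", default=None)
    parser.add_argument("--evaluator-model", default=None)
    parser.add_argument("--task", default=None, help="agent_env_combo id")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["--model", "gpt-4o", "--task", "combo-id"])
print(vars(args))
```

Note that `argparse` exposes `--partner-model` as `args.partner_model`, replacing the dash with an underscore.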

✅ Checks

My pull request adheres to the code style of this project
My code requires changes to the documentation
I have updated the documentation as required
All the tests have passed
Branch name follows type/descript (e.g. feature/add-llm-agents)
Ready for code review

ℹ Additional Information

docs/examples.md

XuhuiZhou · 2024-05-26T17:52:18Z

@ProKil
python sotopia/benchmark/cli.py --model=gpt-4o

yields error:

RuntimeError: Type not yet supported: typing.Literal['togethercomputer/llama-2-7b-chat', 'togethercomputer/llama-2-70b-chat', 
'togethercomputer/mpt-30b-chat', 'gpt-3.5-turbo', 'gpt-3.5-turbo-finetuned', 'gpt-3.5-turbo-ft-MF', 'text-davinci-003', 'gpt-4', 
'gpt-4-turbo', 'human', 'redis', 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'together_ai/togethercomputer/llama-2-7b-chat', 
'together_ai/togethercomputer/falcon-7b-instruct', 'meta-llama/Llama-3-8b-chat-hf', 'meta-llama/Llama-3-70b-chat-hf', 
'groq/llama3-70b-8192']

Can you check?
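
The error text suggests that the CLI's argument parser could not handle a `typing.Literal` annotation on an option. The thread does not show how this was actually fixed. One common workaround, sketched here purely as an assumption, is to declare the option as a plain `str` and validate it against the Literal's members at runtime. `ModelName` below is a shortened, hypothetical stand-in for the project's real type, and `validate_model_name` is a hypothetical helper.

```python
from typing import Literal, get_args

# Hypothetical, shortened stand-in for the project's model-name Literal.
ModelName = Literal["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo", "gpt-4o"]


def validate_model_name(name: str) -> str:
    """Accept a plain str from the CLI and check it against the Literal's members."""
    allowed = get_args(ModelName)  # tuple of the allowed string values
    if name not in allowed:
        raise ValueError(f"Unknown model {name!r}; expected one of {allowed}")
    return name


print(validate_model_name("gpt-4o"))
```

This keeps the Literal as the single source of truth for type checking while giving the CLI layer an annotation it can parse.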

XuhuiZhou · 2024-05-26T18:48:50Z

Already fixed, @ProKil!

However, it still fails with this error:

python sotopia/benchmark/cli.py --model=gpt-4o --batch-size=1

yields error:

  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Running all envs in batch:  12%|███████▊                                                         | 24/200 [14:14<1:44:28, 35.62s/it]

ProKil · 2024-05-26T19:19:16Z

What is the full traceback?

XuhuiZhou · 2024-05-26T19:36:29Z

@ProKil
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-5021' coro=<AsyncClient.aclose() done, defined at /Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py", line 2018, in aclose
    await self._transport.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_transports/default.py", line 385, in aclose
    await self._pool.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 313, in aclose
    await self._close_connections(closing_connections)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
    await connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection.py", line 171, in aclose
    await self._connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/http11.py", line 265, in aclose
    await self._network_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
    await self._stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/streams/tls.py", line 193, in aclose
    await self.transport_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-5022' coro=<AsyncClient.aclose() done, defined at /Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py", line 2018, in aclose
    await self._transport.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_transports/default.py", line 385, in aclose
    await self._pool.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 313, in aclose
    await self._close_connections(closing_connections)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
    await connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection.py", line 171, in aclose
    await self._connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/http11.py", line 265, in aclose
    await self._network_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
    await self._stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/streams/tls.py", line 193, in aclose
    await self.transport_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-5023' coro=<AsyncClient.aclose() done, defined at /Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py", line 2018, in aclose
    await self._transport.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_transports/default.py", line 385, in aclose
    await self._pool.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 313, in aclose
    await self._close_connections(closing_connections)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
    await connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection.py", line 171, in aclose
    await self._connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/http11.py", line 265, in aclose
    await self._network_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
    await self._stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/streams/tls.py", line 193, in aclose
    await self.transport_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Running all envs in batch: 12%|███████▊ | 24/200 [14:14<1:44:28, 35.62s/it]

Aborted.
Running one batch: 0%| | 0/1 [00:21<?, ?it/s]
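
For readers following along: an "Event loop is closed" error raised from `AsyncClient.aclose()` often means an async HTTP client outlived the event loop it was created on, for example when `asyncio.run` is called once per batch and cleanup is attempted afterwards. The thread does not state the root cause here, so the following is only a sketch of the general pattern, not the fix applied in this PR: drive every batch from a single `asyncio.run` call so that async resources are closed while the loop is still alive. `run_batch` and `run_all_batches` are hypothetical names.

```python
import asyncio


async def run_batch(batch: list[str]) -> list[str]:
    """Hypothetical stand-in for one benchmark batch; real code would await model calls."""
    await asyncio.sleep(0)  # yield control to the loop, as a network call would
    return [f"done:{env_id}" for env_id in batch]


async def run_all_batches(batches: list[list[str]]) -> list[str]:
    """Run batches sequentially inside one event loop."""
    results: list[str] = []
    for batch in batches:
        results.extend(await run_batch(batch))
    return results


# A single asyncio.run call owns the loop for the whole benchmark run.
results = asyncio.run(run_all_batches([["env-1", "env-2"], ["env-3"]]))
print(results)
```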

ProKil · 2024-06-15T19:36:36Z

Please also git merge main to resolve the conflicts

XuhuiZhou · 2024-06-15T19:52:57Z

Let's figure everything out first before I merge main; otherwise I just end up doing the same repetitive work again and again.

XuhuiZhou · 2024-06-15T20:18:16Z

@ProKil Can you try to benchmark a model first?

XuhuiZhou · 2024-06-15T20:23:56Z

@ProKil Not sure why this happens tho

ProKil · 2024-06-15T23:52:19Z

mypy --install-types .

…V0.2 (#132)
* cherry picked generate.py from #69
* Add AzureOpenAI for agent and env models and Update to langchain V0.2 runnable interface
* Delete azure api key input
* Fix mypy errors

XuhuiZhou and others added 4 commits May 15, 2024 20:13

add benchmark social agents

d0811b6

add benchmark agents

51a9132

Add sotopia_benchmark cli api

b576be7

fix pre-commit

7922e16

ProKil changed the title ~~Feature/benchmark_agents.py~~ Sotopia Benchmark CLI API May 16, 2024

add evaluator model argument

9c13c02

ProKil mentioned this pull request May 19, 2024

Roadmap to Sotopia v0.1 #70

Closed

15 tasks

ProKil added this to the 0.1.0 Release milestone May 19, 2024

XuhuiZhou added 5 commits May 22, 2024 20:16

Merge branch 'main' into feature/benchmark_agents.py

165c245

finish benchmarking

27aa565

benchmark done

b625cfc

chore: Fix formatting issue in redis_stats.ipynb and cli.py

7066a2a

switch back to LLM_Name

b710ea6

XuhuiZhou requested a review from ProKil May 26, 2024 16:48

ProKil requested changes May 26, 2024

docs/examples.md Outdated

XuhuiZhou added 2 commits May 26, 2024 13:05

Merge branch 'main' into feature/benchmark_agents.py

385a53c

merge main

ad9e741

XuhuiZhou added 2 commits May 26, 2024 13:52

add together ai

ee87f84

fix naming error

41ad0e3

XuhuiZhou and others added 7 commits May 26, 2024 16:54

roll back to llama2

5ae2b26

chore: Update langchain-together dependency to version 0.1.2

af4e2ee

use chatopenai for together models

5361d0c

Merge branch 'feature/benchmark_agents.py' of github.com:sotopia-lab/sotopia into feature/benchmark_agents.py

35be2a2

add logging

7087f40

Merge remote-tracking branch 'origin/main' into feature/benchmark_agents.py

01b6291

fix pre-commit

4be892d

XuhuiZhou added 3 commits June 5, 2024 15:00

Refactor run_async_benchmark_in_batch function

bfadacd

add doc

9a5dce9

precommit fix

3dbde75

XuhuiZhou requested a review from ProKil June 5, 2024 22:14

pre-commit

897389a

ProKil requested changes Jun 6, 2024

docs/pages/benchmark.md Outdated

docs/pages/examples.mdx Outdated

notebooks/redis_stats.ipynb Outdated

sotopia-chat/chat_server.py

Merge remote-tracking branch 'origin/main' into feature/benchmark_agents.py

1c6b2b9

ProKil added a commit that referenced this pull request Jun 6, 2024

cherry picked generate.py from #69

c810b1d

XuhuiZhou added 6 commits June 9, 2024 16:45

refactor

b7fd148

Merge branch 'feature/benchmark_agents.py' of github.com:sotopia-lab/sotopia into feature/benchmark_agents.py

73c2c92

Merge branch 'main' into feature/benchmark_agents.py

df1bfde

merge

update w feedback

8352ea7

pre commit

f6c86cb

chore: Update authors in pyproject.toml and fetch benchmark_agents.json from Hugging Face API

6bcb178

XuhuiZhou requested a review from ProKil June 14, 2024 22:30

ProKil requested changes Jun 15, 2024

sotopia/benchmark/cli.py

sotopia/benchmark/cli.py Outdated

hotfix

f5c269c

XuhuiZhou added 2 commits June 15, 2024 13:21

Merge branch 'main' into feature/benchmark_agents.py

98fe351

chore: Remove unnecessary type hint in benchmark/cli.py

7165109

XuhuiZhou requested a review from ProKil June 17, 2024 02:39

ProKil approved these changes Jun 17, 2024

ProKil merged commit 8bff863 into main Jun 17, 2024
5 checks passed

ProKil deleted the feature/benchmark_agents.py branch June 17, 2024 15:43

ProKil mentioned this pull request Jun 19, 2024

[BUG]: Reinitialized agents #84

Closed
