Sotopia Benchmark CLI API #69

Merged
merged 48 commits into from
Jun 17, 2024

Conversation

XuhuiZhou
Member

@XuhuiZhou XuhuiZhou commented May 16, 2024

📑 Description

This pull request adds a new API that benchmarks a language model using the default LLMAgent class. Here is the API we want to achieve:

sotopia_benchmark \
--model <model_name> \
--partner-model <partner_model_name> \
--evaluator-model <evaluator_model_name> \
--task <agent_env_combo_id>

After this CLI command is called, the Sotopia benchmark evaluates the given model by simulating its interaction with another LLMAgent that uses the partner model, and scores that interaction with the evaluator model on the given task.

We will also include a bash script that loads all of the agent_env_combo IDs from a given subset.
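The idea of running the benchmark once per agent_env_combo can be sketched in Python as follows. The helper name `benchmark_commands` and the example IDs are hypothetical, and how the IDs of a subset are actually loaded is left out because the description does not specify it.

```python
import shlex
from collections.abc import Iterable, Iterator


def benchmark_commands(
    combo_ids: Iterable[str],
    model: str,
    partner_model: str,
    evaluator_model: str,
) -> Iterator[str]:
    """Yield one shell-quoted sotopia_benchmark invocation per combo id."""
    for combo_id in combo_ids:
        yield shlex.join(
            [
                "sotopia_benchmark",
                "--model", model,
                "--partner-model", partner_model,
                "--evaluator-model", evaluator_model,
                "--task", combo_id,
            ]
        )


if __name__ == "__main__":
    # Hypothetical ids; a real script would read them from the chosen subset.
    for command in benchmark_commands(["combo_1", "combo_2"], "gpt-4", "gpt-4", "gpt-4"):
        print(command)
```

Using `shlex.join` keeps each command safe to paste into a shell even if a model name or ID contains characters that need quoting.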

✅ Checks

  • My pull request adheres to the code style of this project
  • My code requires changes to the documentation
  • I have updated the documentation as required
  • All the tests have passed
  • Branch name follows type/descript (e.g. feature/add-llm-agents)
  • Ready for code review

ℹ Additional Information

@ProKil ProKil changed the title Feature/benchmark_agents.py Sotopia Benchmark CLI API May 16, 2024
@ProKil ProKil mentioned this pull request May 19, 2024
15 tasks
@ProKil ProKil added this to the 0.1.0 Release milestone May 19, 2024
@XuhuiZhou XuhuiZhou requested a review from ProKil May 26, 2024 16:48
docs/examples.md (outdated, resolved)
@XuhuiZhou
Member Author

XuhuiZhou commented May 26, 2024

@ProKil
python sotopia/benchmark/cli.py --model=gpt-4o

yields error:

RuntimeError: Type not yet supported: typing.Literal['togethercomputer/llama-2-7b-chat', 'togethercomputer/llama-2-70b-chat', 
'togethercomputer/mpt-30b-chat', 'gpt-3.5-turbo', 'gpt-3.5-turbo-finetuned', 'gpt-3.5-turbo-ft-MF', 'text-davinci-003', 'gpt-4', 
'gpt-4-turbo', 'human', 'redis', 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'together_ai/togethercomputer/llama-2-7b-chat', 
'together_ai/togethercomputer/falcon-7b-instruct', 'meta-llama/Llama-3-8b-chat-hf', 'meta-llama/Llama-3-70b-chat-hf', 
'groq/llama3-70b-8192']

Can you check?
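The message suggests that a `typing.Literal` annotation reached code that could not turn it into a command-line parameter type. One common workaround, which is not necessarily the fix applied in this PR, is to accept a plain string and validate it against the members of the `Literal`. In this sketch `ModelName` is an illustrative subset of the names in the error above and `parse_model_name` is a hypothetical helper.

```python
from typing import Literal, get_args

# Illustrative subset of the model names listed in the error message.
ModelName = Literal["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]


def parse_model_name(raw: str) -> str:
    """Return raw unchanged if it is one of the Literal's members, else raise."""
    allowed = get_args(ModelName)  # tuple of the permitted strings
    if raw not in allowed:
        raise ValueError(f"unknown model {raw!r}; expected one of {allowed}")
    return raw
```

With this approach the CLI option can be annotated as `str`, and the `Literal` remains the single source of truth for the permitted values.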

@XuhuiZhou
Member Author

XuhuiZhou commented May 26, 2024

Already fixed, @ProKil!

However, it still has this error:

python sotopia/benchmark/cli.py --model=gpt-4o --batch-size=1

yields error:

  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Running all envs in batch:  12%|███████▊                                                         | 24/200 [14:14<1:44:28, 35.62s/it]

@ProKil
Member

ProKil commented May 26, 2024

What is the full backtrace?

@XuhuiZhou
Member Author

XuhuiZhou commented May 26, 2024

@ProKil
ERROR:asyncio:Task exception was never retrieved | 0/1 [00:00<?, ?it/s]
future: <Task finished name='Task-5021' coro=<AsyncClient.aclose() done, defined at /Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py", line 2018, in aclose
await self._transport.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_transports/default.py", line 385, in aclose
await self._pool.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 313, in aclose
await self._close_connections(closing_connections)
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
await connection.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection.py", line 171, in aclose
await self._connection.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/http11.py", line 265, in aclose
await self._network_stream.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
await self._stream.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/streams/tls.py", line 193, in aclose
await self.transport_stream.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
self._transport.close()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
self._loop.call_soon(self._call_connection_lost, None)
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
self._check_closed()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-5022' coro=<AsyncClient.aclose() done, defined at /Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py", line 2018, in aclose
await self._transport.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_transports/default.py", line 385, in aclose
await self._pool.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 313, in aclose
await self._close_connections(closing_connections)
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
await connection.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection.py", line 171, in aclose
await self._connection.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/http11.py", line 265, in aclose
await self._network_stream.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
await self._stream.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/streams/tls.py", line 193, in aclose
await self.transport_stream.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
self._transport.close()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
self._loop.call_soon(self._call_connection_lost, None)
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
self._check_closed()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-5023' coro=<AsyncClient.aclose() done, defined at /Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py", line 2018, in aclose
await self._transport.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_transports/default.py", line 385, in aclose
await self._pool.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 313, in aclose
await self._close_connections(closing_connections)
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
await connection.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection.py", line 171, in aclose
await self._connection.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/http11.py", line 265, in aclose
await self._network_stream.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
await self._stream.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/streams/tls.py", line 193, in aclose
await self.transport_stream.aclose()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
self._transport.close()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
self._loop.call_soon(self._call_connection_lost, None)
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
self._check_closed()
File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Running all envs in batch: 12%|███████▊ | 24/200 [14:14<1:44:28, 35.62s/it]

Aborted.
Running one batch: 0%| | 0/1 [00:21<?, ?it/s]
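The traceback shows `AsyncClient.aclose()` tasks failing because they run after the event loop that owns their connections has been closed. One common way this arises, though the thread does not confirm it as the cause here, is that an async client outlives the loop it was created in. The sketch below shows the general remedy of keeping all batches in a single loop and closing the client before that loop ends. `FakeClient`, `run_batch`, and `run_all` are hypothetical stand-ins, not code from this repository.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for an async HTTP client exposing aclose()."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        await asyncio.sleep(0)  # closing needs a running event loop
        self.closed = True


async def run_batch(client: FakeClient, batch: list[str]) -> list[str]:
    """Pretend to process one batch using the shared client."""
    await asyncio.sleep(0)
    return [f"done:{item}" for item in batch]


async def run_all(batches: list[list[str]]) -> tuple[list[str], bool]:
    """Process every batch in one loop, then close the client in that loop."""
    client = FakeClient()
    results: list[str] = []
    try:
        for batch in batches:
            results.extend(await run_batch(client, batch))
    finally:
        # Close while the loop is still running, not during garbage collection.
        await client.aclose()
    return results, client.closed


def main(batches: list[list[str]]) -> tuple[list[str], bool]:
    # A single asyncio.run call means a single loop for the whole benchmark.
    return asyncio.run(run_all(batches))
```

The design choice is to make the client's lifetime strictly shorter than the loop's, so no cleanup task is scheduled on a loop that has already been closed.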

@XuhuiZhou XuhuiZhou requested a review from ProKil June 5, 2024 22:14
docs/pages/benchmark.md (outdated, resolved)
docs/pages/examples.mdx (outdated, resolved)
notebooks/redis_stats.ipynb (outdated, resolved)
sotopia-chat/chat_server.py (resolved)
ProKil added a commit that referenced this pull request Jun 6, 2024
@XuhuiZhou XuhuiZhou requested a review from ProKil June 14, 2024 22:30
sotopia/benchmark/cli.py (resolved)
sotopia/benchmark/cli.py (outdated, resolved)
@ProKil
Member

ProKil commented Jun 15, 2024

Please also git merge main to resolve the conflicts

@XuhuiZhou
Member Author

Please also git merge main to resolve the conflicts

Let's figure everything out first before I merge main; otherwise I'll just be repeating the same work again and again.

@XuhuiZhou
Member Author

@ProKil Can you try to benchmark a model first?

@XuhuiZhou
Member Author

[image] @ProKil Not sure why this happens, though.

@ProKil
Member

ProKil commented Jun 15, 2024

mypy --install-types .

@XuhuiZhou XuhuiZhou requested a review from ProKil June 17, 2024 02:39
@ProKil ProKil merged commit 8bff863 into main Jun 17, 2024
5 checks passed
@ProKil ProKil deleted the feature/benchmark_agents.py branch June 17, 2024 15:43
ProKil pushed a commit that referenced this pull request Jul 9, 2024
…V0.2 (#132)

* cherry picked generate.py from #69

* Add AzureOpenAI for agent and env models and Update to langchain V0.2 runnable interface

* Delete azure api key input

* Fix mypy errors

---------