v0.2.0 #285

Merged · 119 commits · Dec 20, 2023
Commits (119)
808453c
init v0.1.8
huangshiyu13 Oct 13, 2023
3e437df
init v0.1.8
huangshiyu13 Oct 13, 2023
75bf531
update test
huangshiyu13 Oct 17, 2023
7e5b043
update test
huangshiyu13 Oct 17, 2023
06453de
update test
huangshiyu13 Oct 17, 2023
9daedf1
update test
huangshiyu13 Oct 17, 2023
d194860
update test
huangshiyu13 Oct 17, 2023
93ce6f1
update test
huangshiyu13 Oct 17, 2023
e8f2011
Merge pull request #243 from huangshiyu13/main
huangshiyu13 Oct 17, 2023
8e17ae1
fix winning rate output bug of smac
huangshiyu13 Oct 18, 2023
7ddc3ab
fix winning rate output bug of smac
huangshiyu13 Oct 18, 2023
4703fc9
support deepspeed
WentseChen Oct 19, 2023
303e65d
Merge pull request #246 from WentseChen/1018_ds
huangshiyu13 Oct 19, 2023
10baeac
- update test
huangshiyu13 Oct 19, 2023
73c8f4f
Merge pull request #247 from huangshiyu13/main
huangshiyu13 Oct 19, 2023
6665505
fix atari training bugs
huangshiyu13 Oct 19, 2023
782cb07
fix atari training bugs
huangshiyu13 Oct 19, 2023
601c795
fix windows bugs
huangshiyu13 Oct 19, 2023
254cf26
update
huangshiyu13 Oct 19, 2023
948238d
fix windows bugs
huangshiyu13 Oct 19, 2023
4f110d6
update
huangshiyu13 Oct 19, 2023
d32f067
fix python3.11 test
huangshiyu13 Oct 19, 2023
1ac9038
Merge pull request #251 from huangshiyu13/main
huangshiyu13 Oct 19, 2023
462dbd7
format
huangshiyu13 Oct 19, 2023
85e7803
Merge pull request #252 from huangshiyu13/main
huangshiyu13 Oct 19, 2023
ca32597
add test atari
huangshiyu13 Oct 20, 2023
b735781
Merge pull request #253 from huangshiyu13/main
huangshiyu13 Oct 20, 2023
eeb026f
add test gpt work
huangshiyu13 Oct 20, 2023
090b617
delete common.py
huangshiyu13 Oct 20, 2023
0092356
Merge pull request #254 from huangshiyu13/main
huangshiyu13 Oct 20, 2023
05e7d8b
init v0.1.9
huangshiyu13 Oct 20, 2023
a18ec33
init v0.1.9
huangshiyu13 Oct 20, 2023
c7f8f3a
fix typo: change loss rate to lose rate
huangshiyu13 Oct 23, 2023
f96e77f
Merge pull request #256 from huangshiyu13/main
huangshiyu13 Oct 23, 2023
5aa7d9e
update _worker test
huangshiyu13 Oct 23, 2023
d2ff6b3
update _worker test
huangshiyu13 Oct 23, 2023
8889302
update _worker test
huangshiyu13 Oct 23, 2023
7e70569
Merge pull request #257 from huangshiyu13/main
huangshiyu13 Oct 23, 2023
0228e51
fix arena petting zoo import error
huangshiyu13 Oct 23, 2023
962d45a
Merge pull request #258 from huangshiyu13/main
huangshiyu13 Oct 23, 2023
537f822
arena add test more envs
huangshiyu13 Oct 24, 2023
73c8108
arena add test more envs
huangshiyu13 Oct 24, 2023
5ec7e29
update
huangshiyu13 Oct 24, 2023
638645a
Merge pull request #260 from huangshiyu13/main
huangshiyu13 Oct 24, 2023
b41fbc8
update
huangshiyu13 Oct 24, 2023
986bd77
Merge pull request #261 from huangshiyu13/main
huangshiyu13 Oct 24, 2023
a2df15d
add RandomAgent for Arena
huangshiyu13 Oct 25, 2023
847dc91
Merge pull request #262 from huangshiyu13/main
huangshiyu13 Oct 25, 2023
f6c6080
add test attention.py
huangshiyu13 Oct 25, 2023
23cfd38
Merge pull request #263 from huangshiyu13/main
huangshiyu13 Oct 25, 2023
f3667ab
update test
huangshiyu13 Oct 26, 2023
0707ba6
update test
huangshiyu13 Oct 26, 2023
e5305cc
Merge pull request #264 from huangshiyu13/main
huangshiyu13 Oct 26, 2023
4da3894
init v0.1.10
huangshiyu13 Oct 27, 2023
18674a5
init v0.1.10
huangshiyu13 Oct 27, 2023
11af42a
fix eval_callback: need to reset rnn state when environment is done
huangshiyu13 Nov 1, 2023
d034ce1
fix eval_callback: need to reset rnn state when environment is done
huangshiyu13 Nov 1, 2023
6470b87
fix eval_callback bug
huangshiyu13 Nov 1, 2023
6e1f5e8
update_ds_config
WentseChen Nov 4, 2023
bd88dcb
update ds_config
WentseChen Nov 4, 2023
ed0ceb7
update config
WentseChen Nov 5, 2023
d7c974e
make format
WentseChen Nov 7, 2023
b6e78a2
fix assersion bug
WentseChen Nov 8, 2023
b44efad
Merge pull request #270 from WentseChen/1102_ds
huangshiyu13 Nov 9, 2023
1edd6c6
fix rock paper scissors
huangshiyu13 Nov 10, 2023
3e0d644
fix rock paper scissors
huangshiyu13 Nov 10, 2023
b71b07b
Merge pull request #271 from huangshiyu13/main
huangshiyu13 Nov 10, 2023
3c34b5e
not using shared model
WentseChen Nov 12, 2023
8c196b3
update
huangshiyu13 Nov 13, 2023
6e9ce0f
fix petting zoo
huangshiyu13 Nov 13, 2023
6cd773a
Merge pull request #272 from huangshiyu13/main
huangshiyu13 Nov 13, 2023
f1ecdef
add MAT network test
huangshiyu13 Nov 15, 2023
7468cd6
Merge pull request #273 from huangshiyu13/main
huangshiyu13 Nov 15, 2023
58c022f
update README.md
huangshiyu13 Nov 23, 2023
ceae61a
Merge pull request #274 from huangshiyu13/main
huangshiyu13 Nov 23, 2023
b08d096
add selfplay test
huangshiyu13 Nov 23, 2023
7373b04
add selfplay test
huangshiyu13 Nov 23, 2023
7ded5d5
add selfplay test
huangshiyu13 Nov 23, 2023
2bd4658
add selfplay test
huangshiyu13 Nov 23, 2023
17cf742
add selfplay test
huangshiyu13 Nov 23, 2023
4344954
add selfplay test
huangshiyu13 Nov 24, 2023
a702e8d
add selfplay test
huangshiyu13 Nov 24, 2023
714a9ec
add selfplay test
huangshiyu13 Nov 24, 2023
91ac0df
add selfplay test
huangshiyu13 Nov 24, 2023
2a1cf9c
update data parallel and model parallel
WentseChen Nov 28, 2023
0c2ea2b
- add net, gail test
huangshiyu13 Nov 28, 2023
e8c6ee9
Merge pull request #276 from huangshiyu13/main
huangshiyu13 Nov 28, 2023
4679894
- fix bugs: AttributeError: module 'openrl.envs' has no attribute 'Pe…
huangshiyu13 Dec 5, 2023
9a05e6f
- fix bugs: AttributeError: module 'openrl.envs' has no attribute 'Pe…
huangshiyu13 Dec 5, 2023
5d21f65
Merge pull request #277 from huangshiyu13/main
huangshiyu13 Dec 5, 2023
dac2804
Add envpool to openrl
kingjuno Dec 7, 2023
3c31b5a
Remove unwanted test for envpool
kingjuno Dec 7, 2023
990f8c5
Fix a typo: envpool/train-ppo
kingjuno Dec 7, 2023
448cc6f
Fix dependency error: stablebaseline3
kingjuno Dec 7, 2023
48edf32
update readme
huangshiyu13 Dec 12, 2023
c1a6f56
Merge branch 'main' of github.com:OpenRL-Lab/openrl
huangshiyu13 Dec 12, 2023
e29fd90
improve test
huangshiyu13 Dec 12, 2023
4950ef9
Merge pull request #279 from huangshiyu13/main
huangshiyu13 Dec 12, 2023
40d8304
improve test
huangshiyu13 Dec 12, 2023
e12eb5e
Merge pull request #280 from huangshiyu13/main
huangshiyu13 Dec 12, 2023
38da73a
improve test
huangshiyu13 Dec 14, 2023
76d1e05
Merge pull request #281 from huangshiyu13/main
huangshiyu13 Dec 14, 2023
05295b8
ds_support
WentseChen Dec 19, 2023
cd7f5b0
meteor_init_bug
WentseChen Dec 19, 2023
7016328
meteor_init_bug
Dec 20, 2023
36b1d81
Merge branch '1122_ds' of github.com:Chen001117/openrl into 1122_ds
Dec 20, 2023
d470127
Merge branch 'main' of github.com:OpenRL-Lab/openrl into 1122_ds
Dec 20, 2023
3af7588
update format
Dec 20, 2023
f8879b3
fix test w/o gpu bug
Dec 20, 2023
fc02030
fix set reward bug
Dec 20, 2023
8185373
deepspeed support
huangshiyu13 Dec 20, 2023
753fba7
Move envpool to examples
kingjuno Dec 20, 2023
693d2e1
Revert files in openrl folder
kingjuno Dec 20, 2023
e864a08
Merge pull request #278 from kingjuno/Issue-#216
huangshiyu13 Dec 20, 2023
5346eee
init v0.2.0
huangshiyu13 Dec 20, 2023
2b798c0
init v0.2.0
huangshiyu13 Dec 20, 2023
a50c041
init v0.2.0
huangshiyu13 Dec 20, 2023
5b4dae2
update readme
huangshiyu13 Dec 20, 2023
59efab1
Merge pull request #284 from huangshiyu13/main
huangshiyu13 Dec 20, 2023
6 changes: 5 additions & 1 deletion .github/workflows/unit_test.yml
@@ -17,6 +17,10 @@ jobs:
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install system dependencies
        run: |
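          # Install a virtual display (xvfb) and OpenGL libraries so that
          # rendering-dependent tests can run on the headless CI machine.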
          sudo apt-get update
          sudo apt-get install -y xvfb libglu1-mesa-dev python3-opengl
      - name: Upgrade pip
        run: |
          python -m pip install --upgrade pip setuptools wheel
@@ -27,7 +31,7 @@
      - name: do_unittest
        timeout-minutes: 40
        run: |
          python3 -m pytest tests --cov=openrl --cov-report=xml -m unittest --cov-report=term-missing --durations=0 -v --color=yes
          xvfb-run -s "-screen 0 1400x900x24" python3 -m pytest tests --cov=openrl --cov-report=xml -m unittest --cov-report=term-missing --durations=0 -v --color=yes -s
      - name: Upload coverage reports to Codecov with GitHub Action
        uses: codecov/codecov-action@v3
        with:
3 changes: 2 additions & 1 deletion .gitignore
@@ -153,10 +153,11 @@ run_results/
api_docs
.vscode
*.pkl
api_docs
*.json
opponent_pool
!/examples/selfplay/opponent_templates/tictactoe_opponent/info.json
!/examples/nlp/ds_config.json
!/examples/nlp/eval_ds_config.json
wandb_run
examples/dmc/new.gif
/examples/snake/submissions/rl/actor_2000.pth
4 changes: 2 additions & 2 deletions Project.md
@@ -18,7 +18,7 @@ However, in many practical applications, it is important to develop reasonable a
In this paper, we propose an on-policy framework for discovering multiple strategies for the same task.
Experimental results show that our method efficiently finds diverse strategies in a wide variety of reinforcement learning tasks.

- Paper: [DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization](https://arxiv.org/abs/2207.05631)(AAMAS Extended Abstract 2023)
- Authors: Wenze Chen, Shiyu Huang, Yuan Chiang, Ting Chen, Jun Zhu
- Paper: [DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization](https://arxiv.org/abs/2207.05631) (AAAI 2024)
- Authors: Wenze Chen, Shiyu Huang, Yuan Chiang, Tim Pearce, Wei-Wei Tu, Ting Chen, Jun Zhu


36 changes: 19 additions & 17 deletions README.md
@@ -1,5 +1,5 @@
<div align="center">
<a href="https://openrl-docs.readthedocs.io/zh/latest/index.html"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
<a href="https://openrl-docs.readthedocs.io/"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
</div>

---
@@ -25,10 +25,10 @@
[![Contributors](https://img.shields.io/github/contributors/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/graphs/contributors)
[![GitHub license](https://img.shields.io/github/license/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/blob/master/LICENSE)

[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/guvAS2up)
[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)

OpenRL-v0.1.7 is updated on Sep 21, 2023
OpenRL-v0.2.0 is updated on Dec 20, 2023

The main branch is the latest version of OpenRL, which is under active development. If you just want to have a try with
OpenRL, you can switch to the stable branch.
@@ -58,6 +58,8 @@ Currently, the features supported by OpenRL include:

- Reinforcement learning training support for natural language tasks (such as dialogue)

- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)

- Support [Arena](https://openrl-docs.readthedocs.io/en/latest/arena/index.html), which allows convenient evaluation of various agents (even submissions for [JiDi](https://openrl-docs.readthedocs.io/en/latest/arena/index.html#performing-local-evaluation-of-agents-submitted-to-the-jidi-platform-using-openrl)) in a competitive environment.

@@ -160,19 +162,19 @@ Here we provide a table for the comparison of OpenRL and existing popular RL libraries.
OpenRL employs a modular design and high-level abstraction, allowing users to accomplish training for various tasks
through a unified and user-friendly interface.

| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | Bilingual Document |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fully supported | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fully supported | not fully supported | :heavy_check_mark: | :heavy_check_mark: |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib) | :x: | :heavy_check_mark: | not fully supported | :x: | :x: |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |
| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:--------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fully supported | :heavy_check_mark: | :x: |
| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fully supported | not fully supported | :heavy_check_mark: | :x: |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib) | :x: | :heavy_check_mark: | not fully supported | :x: | :x: |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |

## Installation

@@ -333,7 +335,7 @@ If you are using OpenRL in your research project, you are also welcome to join t

- Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss
OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/guvAS2up) group to discuss OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
- Send an E-mail to: [[email protected]]([email protected])
- Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).

19 changes: 10 additions & 9 deletions README_zh.md
@@ -1,5 +1,5 @@
<div align="center">
<a href="https://openrl-docs.readthedocs.io/zh/latest/index.html"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
<a href="https://openrl-docs.readthedocs.io/"><img width="450px" height="auto" src="docs/images/openrl_text.png"></a>
</div>


@@ -26,10 +26,10 @@
[![Contributors](https://img.shields.io/github/contributors/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/graphs/contributors)
[![GitHub license](https://img.shields.io/github/license/OpenRL-Lab/openrl)](https://github.com/OpenRL-Lab/openrl/blob/master/LICENSE)

[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/guvAS2up)
[![Embark](https://img.shields.io/badge/discord-OpenRL-%237289da.svg?logo=discord)](https://discord.gg/qMbVT2qBhr)
[![slack badge](https://img.shields.io/badge/Slack-join-blueviolet?logo=slack&amp)](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg)

OpenRL-v0.1.7 is updated on Sep 21, 2023
OpenRL-v0.1.10 is updated on Oct 27, 2023

The main branch is the latest version of OpenRL, which is under active development. If you just want to have a try with
OpenRL, you can switch to the stable branch.
@@ -51,6 +51,7 @@ OpenRL is developed based on PyTorch, aiming to provide the reinforcement learning research community with a
- Support offline RL training with expert data
- Support self-play training
- Support RL training for natural language tasks (such as dialogue)
- Support [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- Support the [Arena](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html) feature, which allows convenient evaluation of various agents (even agents submitted to the [JiDi platform](https://openrl-docs.readthedocs.io/zh/latest/arena/index.html#openrl)) in multi-agent competitive environments.
- Support importing models and data from [Hugging Face](https://huggingface.co/), including loading [Stable-baselines3 models](https://openrl-docs.readthedocs.io/zh/latest/sb3/index.html) from Hugging Face for testing and training.
- Provide a [detailed tutorial](https://openrl-docs.readthedocs.io/zh/latest/custom_env/index.html) on integrating user-defined environments into OpenRL.
@@ -128,18 +129,18 @@ OpenRL-Lab will continue to maintain and update OpenRL. Everyone is welcome to join our [open-source community

Here we provide a table comparing OpenRL with other commonly used reinforcement learning libraries. OpenRL adopts a modular design and high-level abstractions, allowing users to complete training for various tasks through a unified, simple, and easy-to-use interface.

| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | Bilingual Document |
| Library | NLP/RLHF | Multi-agent | Self-Play Training | Offline RL | [DeepSpeed](https://github.com/microsoft/DeepSpeed) |
|:------------------------------------------------------------------:|:------------------:|:--------------------:|:--------------------:|:------------------:|:------------------:|
| **[OpenRL](https://github.com/OpenRL-Lab/openrl)** | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) | :x: | :x: | :x: | :x: | :x: |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fully supported | :heavy_check_mark: | :heavy_check_mark: |
| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fully supported | not fully supported | :heavy_check_mark: | :heavy_check_mark: |
| [DI-engine](https://github.com/opendilab/DI-engine/) | :x: | :heavy_check_mark: | not fully supported | :heavy_check_mark: | :x: |
| [Tianshou](https://github.com/thu-ml/tianshou) | :x: | not fully supported | not fully supported | :heavy_check_mark: | :x: |
| [MARLlib](https://github.com/Replicable-MARL/MARLlib) | :x: | :heavy_check_mark: | not fully supported | :x: | :x: |
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | :x: | :heavy_check_mark: | :x: | :x: | :x: |
| [RL4LMs](https://github.com/allenai/RL4LMs) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :x: |
| [trlx](https://github.com/CarperAI/trlx) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [trl](https://github.com/huggingface/trl) | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
| [TimeChamber](https://github.com/inspirai/TimeChamber) | :x: | :x: | :heavy_check_mark: | :x: | :x: |

## Installation
@@ -293,7 +294,7 @@ openrl --mode train --env CartPole-v1

- Join the [slack](https://join.slack.com/t/openrlhq/shared_invite/zt-1tqwpvthd-Eeh0IxQ~DIaGqYXoW2IUQg) group to discuss OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/guvAS2up) group to discuss OpenRL usage and development with us.
- Join the [Discord](https://discord.gg/qMbVT2qBhr) group to discuss OpenRL usage and development with us.
- Send an e-mail to: [[email protected]]([email protected])
- Join the [GitHub Discussion](https://github.com/orgs/OpenRL-Lab/discussions).

9 changes: 9 additions & 0 deletions examples/arena/README.md
@@ -3,6 +3,7 @@

```bash
pip install "openrl[selfplay]"
pip install "pettingzoo[mpe]" "pettingzoo[butterfly]"
```

### Usage
@@ -15,3 +16,11 @@ python run_arena.py
### Evaluate Google Research Football submissions for JiDi locally

If you want to evaluate your Google Research Football submissions for JiDi locally, please try tizero as illustrated [here](https://github.com/OpenRL-Lab/TiZero#evaluate-jidi-submissions-locally).

### Evaluate more environments

We also provide a script that evaluates agents in more environments, including MPE, Go, Texas Hold'em, and Butterfly. You can run it as follows:

```shell
python evaluate_more_envs.py
```
104 changes: 104 additions & 0 deletions examples/arena/evaluate_more_envs.py
@@ -0,0 +1,104 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright 2023 The OpenRL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Evaluate Arena agents on additional PettingZoo environments (classic, MPE, and butterfly)."""

from pettingzoo.butterfly import cooperative_pong_v5
from pettingzoo.classic import connect_four_v3, go_v5, rps_v2, texas_holdem_no_limit_v6
from pettingzoo.mpe import simple_push_v3

from openrl.arena import make_arena
from openrl.arena.agents.local_agent import LocalAgent
from openrl.arena.agents.random_agent import RandomAgent
from openrl.envs.PettingZoo.registration import register
from openrl.envs.wrappers.pettingzoo_wrappers import RecordWinner


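# Each env factory below adapts a PettingZoo constructor to the
# (render_mode, **kwargs) signature expected by OpenRL's PettingZoo `register`.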
def ConnectFourEnv(render_mode, **kwargs):
    return connect_four_v3.env(render_mode=render_mode)


def RockPaperScissorsEnv(render_mode, **kwargs):
    return rps_v2.env(num_actions=3, max_cycles=15)


def GoEnv(render_mode, **kwargs):
    return go_v5.env(render_mode=render_mode, board_size=5, komi=7.5)


def TexasHoldemEnv(render_mode, **kwargs):
    return texas_holdem_no_limit_v6.env(render_mode=render_mode)


# MPE
def SimplePushEnv(render_mode, **kwargs):
    return simple_push_v3.env(render_mode=render_mode)


def CooperativePongEnv(render_mode, **kwargs):
    return cooperative_pong_v5.env(render_mode=render_mode)


def register_new_envs():
    new_env_dict = {
        "connect_four_v3": ConnectFourEnv,
        "RockPaperScissors": RockPaperScissorsEnv,
        "go_v5": GoEnv,
        "texas_holdem_no_limit_v6": TexasHoldemEnv,
        "simple_push_v3": SimplePushEnv,
        "cooperative_pong_v5": CooperativePongEnv,
    }

    for env_id, env in new_env_dict.items():
        register(env_id, env)
    return new_env_dict.keys()


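# Pit a template-based local agent against a built-in random agent on the
# given environment and print the results recorded by the RecordWinner wrapper.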
def run_arena(
    env_id: str,
    parallel: bool = True,
    seed=0,
    total_games: int = 10,
    max_game_onetime: int = 5,
):
    env_wrappers = [RecordWinner]

    arena = make_arena(env_id, env_wrappers=env_wrappers, use_tqdm=False)

    agent1 = LocalAgent("../selfplay/opponent_templates/random_opponent")
    agent2 = RandomAgent()

    arena.reset(
        agents={"agent1": agent1, "agent2": agent2},
        total_games=total_games,
        max_game_onetime=max_game_onetime,
        seed=seed,
    )
    result = arena.run(parallel=parallel)
    arena.close()
    print(result)
    return result


def test_new_envs():
    env_ids = register_new_envs()
    seed = 0
    for env_id in env_ids:
        run_arena(env_id=env_id, seed=seed, parallel=False, total_games=1)


if __name__ == "__main__":
    test_new_envs()
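
If you want to extend this script to another environment, a minimal hypothetical sketch might look like the following. It reuses the `register` import and the `run_arena` helper defined in the script above; `tictactoe_v3` is only an illustrative choice and is not part of this PR:

```python
# Hypothetical extension sketch: register PettingZoo's tictactoe_v3 and
# evaluate it with the run_arena helper from evaluate_more_envs.py.
from pettingzoo.classic import tictactoe_v3

from openrl.envs.PettingZoo.registration import register


def TicTacToeEnv(render_mode, **kwargs):
    # Same adapter signature as the factories above.
    return tictactoe_v3.env(render_mode=render_mode)


register("tictactoe_v3", TicTacToeEnv)
run_arena(env_id="tictactoe_v3", seed=0, parallel=False, total_games=1)
```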