Unstable Baselines (USB) is designed to serve as a quick-start guide for reinforcement learning beginners and as a codebase for agile algorithm development. The algorithms strictly follow the original implementations, and the performance of Unstable Baselines matches that of the original implementations. USB is currently maintained by researchers from lamda-rl.
Stable Algorithms (runnable, with performance equivalent to the original implementations):
- Baselines
  - Deep Q Learning (DQN)
  - Vanilla Policy Gradient (VPG)
  - Deep Deterministic Policy Gradient (DDPG)
  - Trust Region Policy Optimization (TRPO)
  - Proximal Policy Optimization (PPO)
  - Soft Actor Critic (SAC)
  - Twin Delayed Deep Deterministic Policy Gradient (TD3)
  - Randomized Ensembled Double Q-Learning (REDQ)
- Model-Based Reinforcement Learning
- Meta Reinforcement Learning
  - Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (MAML)
- The Option-Critic Architecture (OC)
Installation:

git clone --recurse-submodules https://github.com/x35f/unstable_baselines.git
cd unstable_baselines
conda env create -f env.yaml
conda activate rl_base
pip install -e .
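A quick way to verify the setup is to import the package from the freshly created environment. This is only a sanity-check sketch; the module name `unstable_baselines` is assumed from the repository name.

```bash
# run inside the activated rl_base environment
# (assumes the editable install exposes a module named unstable_baselines)
python3 -c "import unstable_baselines; print(unstable_baselines.__file__)"
```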
To run an algorithm:

python3 /path/to/algorithm/main.py /path/to/algorithm/configs/some-config.json [optional args]
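As a concrete illustration, a SAC run might look like the sketch below; the directory layout and config file name are assumptions, so substitute the actual paths from your checkout.

```bash
# hypothetical paths, shown for illustration only
python3 unstable_baselines/baselines/sac/main.py \
    unstable_baselines/baselines/sac/configs/default.json
```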
# install MetaWorld for the meta_rl benchmark
cd envs/metaworld
pip install -e .
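A minimal import check, assuming the submodule installs a package named `metaworld`, confirms that the benchmark is visible from the same environment:

```bash
# should exit without errors if the MetaWorld install succeeded
python3 -c "import metaworld; print('metaworld OK:', metaworld.__file__)"
```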
# install Atari environments
pip install 'gym[all]'
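To confirm the Atari dependencies are working, constructing a standard Gym Atari environment is a quick sanity check. The environment id below is only an example, and some gym versions additionally require installing Atari ROMs.

```bash
# should build the environment without raising if the Atari extras are installed
python3 -c "import gym; env = gym.make('PongNoFrameskip-v4'); print(env.observation_space)"
```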
TODO:
- Add comments for algorithms
- Add documentation