Implementations of reinforcement learning algorithms in TensorFlow.
The aim is to implement each algorithm such that different Q/value/policy/representation networks can be plugged in for easy experimentation.
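The intended plug-in interface might look something like the following pure-Python sketch. The names `linear_q_network` and `QAgent` are illustrative only, not this repo's actual API: the idea is that the agent accepts any callable mapping observations to Q-values, so networks can be swapped without changing agent code.

```python
import numpy as np

def linear_q_network(weights):
    """Toy 'network': a linear map from observations to Q-values.
    Any callable with the same signature could be plugged in instead."""
    def q_values(obs):
        return obs @ weights
    return q_values

class QAgent:
    """Greedy agent that is agnostic to the Q-network's internals."""
    def __init__(self, q_network):
        self.q_network = q_network

    def act(self, obs):
        return int(np.argmax(self.q_network(obs)))

# Swap in a different network without touching the agent:
agent = QAgent(linear_q_network(np.ones((4, 2))))
```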
Created and tested using Python 3.5 and TensorFlow 1.4.

Dependencies (listed in requirements.txt):
- tensorflow
- numpy
- gym
- opencv-python
- matplotlib
- seaborn
Install the dependencies with:

pip install -r requirements.txt

or, for GPU TensorFlow:

pip install -r requirements-gpu.txt
To train on the CartPole-v0 environment:
python main.py
Additional command-line arguments are detailed in main.py. The code can be made to work with Atari with minimal edits. CartPole is the default environment while development is ongoing, but the default will switch to Atari once everything is implemented and tested.
Mean test episode length during training on CartPole-v0, with double Q-learning and prioritised experience replay enabled and only minimal hyperparameter search performed (plot omitted).
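Double Q-learning, mentioned above, decouples action selection from action evaluation: the online network picks the next action and the target network scores it. A minimal numpy sketch of the target computation (not this repo's TensorFlow implementation; the function and argument names are illustrative):

```python
import numpy as np

def double_q_target(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double Q-learning target for a batch of transitions.

    next_q_online / next_q_target: (batch, n_actions) Q-value arrays for the
    next states from the online and target networks respectively.
    dones: 1.0 where the episode terminated, 0.0 otherwise.
    """
    # Online network selects the greedy next action...
    best_actions = np.argmax(next_q_online, axis=1)
    # ...and the target network evaluates that action.
    next_values = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values
```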
To do:
- Create experience replay buffer within TensorFlow
- Refactor to use an 'observe' function, which should be the agent's only interaction outside of TensorFlow
- Complete implementation of Rainbow
- Implement n-step Q-learning
- Implement distributional RL
- Implement duelling networks
- Implement noisy nets
- Test on Atari
- Implement policy gradient agents (A2C, DDPG, PPO)
- Implement Distributional Reinforcement Learning with Quantile Regression
- Implement Curiosity Driven Exploration by Self-Supervised Prediction
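As a reference for the n-step Q-learning item above, the n-step return sums n discounted rewards and then bootstraps from a value estimate. A minimal numpy sketch (illustrative, not this repo's code), assuming `values[t]` estimates V(s_t) and `values[-1]` covers the state past the final transition (set it to zero if the episode terminated):

```python
import numpy as np

def n_step_returns(rewards, values, n, gamma=0.99):
    """n-step returns: R_t = sum_{k<h} gamma^k r_{t+k} + gamma^h values[t+h],
    where h = min(n, steps remaining), so the horizon truncates near the end.
    values must have length len(rewards) + 1."""
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        horizon = min(n, T - t)
        ret = sum(gamma ** k * rewards[t + k] for k in range(horizon))
        ret += gamma ** horizon * values[t + horizon]
        returns[t] = ret
    return returns
```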