Deep Reinforcement Learning : Collaboration and Competition

This repository contains my work for Project 3 of Udacity's Deep Reinforcement Learning Nanodegree: Collaboration and Competition.

Project's goal

Tennis Agents

In this project, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.

The task is episodic, and in order to solve the environment, the agents must get an average score of +0.5 (over 100 consecutive episodes, after taking the maximum over both agents). Specifically,

  • After each episode, we add up the rewards that each agent received (without discounting), to get a score for each agent. This yields 2 (potentially different) scores. We then take the maximum of these 2 scores.
  • This yields a single score for each episode.

The environment is considered solved when the average of those scores over 100 consecutive episodes is at least +0.5.
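The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the notebook's code; the names `episode_score` and `is_solved` are hypothetical.

```python
def episode_score(rewards_per_step):
    """Return the episode score: the maximum over agents of the
    undiscounted sum of rewards each agent received.

    rewards_per_step: list of per-step reward lists, one entry per agent.
    """
    num_agents = len(rewards_per_step[0])
    totals = [0.0] * num_agents
    for step_rewards in rewards_per_step:
        for i, reward in enumerate(step_rewards):
            totals[i] += reward
    return max(totals)


def is_solved(scores, window=100, target=0.5):
    """True once the mean of the last `window` episode scores reaches `target`."""
    if len(scores) < window:
        return False
    recent = scores[-window:]
    return sum(recent) / window >= target
```

During training, one would append `episode_score(...)` to a list after every episode and stop (or save the weights) once `is_solved(scores)` returns `True`.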

About Deep Reinforcement Learning

Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective (goal) or maximize along a particular dimension over many steps; for example, maximize the points won in a game over many moves. They can start from a blank slate, and under the right conditions they achieve superhuman performance. Like a child incentivized by spankings and candy, these algorithms are penalized when they make the wrong decisions and rewarded when they make the right ones – this is reinforcement.

In this project I used a variant of DDPG called Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which is described in the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments".
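In MADDPG, each agent's actor acts on its own local observation, while each critic is trained on the observations and actions of all agents. The sketch below shows how such a critic input could be assembled for this environment, assuming the 24-variable stacked observation and 2 actions per agent listed under Environment details. `critic_input` is a hypothetical helper and is not taken from this repository's code.

```python
NUM_AGENTS = 2
OBS_SIZE = 24     # 8 variables x 3 stacked frames per agent
ACTION_SIZE = 2   # move toward/away from the net, jump


def critic_input(observations, actions):
    """Concatenate every agent's observation and action into one flat vector."""
    if len(observations) != NUM_AGENTS or len(actions) != NUM_AGENTS:
        raise ValueError("expected one observation and one action per agent")
    flat = []
    for obs in observations:
        flat.extend(obs)
    for act in actions:
        flat.extend(act)
    return flat


obs = [[0.0] * OBS_SIZE for _ in range(NUM_AGENTS)]
acts = [[0.5, -0.5], [0.0, 1.0]]
x = critic_input(obs, acts)
print(len(x))  # 2 * 24 + 2 * 2 = 52
```

The actor networks, by contrast, would each receive only their own 24-variable observation.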

Environment details

The environment is based on Unity ML-Agents. The project environment provided by Udacity is similar to, but not identical to, the Tennis environment on the Unity ML-Agents GitHub page.

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source Unity plugin that enables games and simulations to serve as environments for training intelligent agents. Agents can be trained using reinforcement learning, imitation learning, neuroevolution, or other machine learning methods through a simple-to-use Python API.

The observation space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own, local observation. Two continuous actions are available, corresponding to movement toward (or away from) the net, and jumping.

  • Set-up: Two-player game where agents control rackets to bounce ball over a net.
  • Goal: The agents must bounce ball between one another while not dropping or sending ball out of bounds.
  • Agents: The environment contains two agents linked to a single Brain named TennisBrain. After training you can attach another Brain named MyBrain to one of the agents to play against your trained model.
  • Agent Reward Function (independent):
    • +0.1 To agent when hitting ball over net.
    • -0.1 To agent who lets the ball hit their ground, or hits the ball out of bounds. (Note that the Udacity project description above gives -0.01 for this penalty.)
  • Brains: One Brain with the following observation/action space.
  • Vector Observation space: 8 variables corresponding to position and velocity of ball and racket.
    • In the Udacity-provided environment, 3 observations are stacked (8 * 3 = 24 variables).
  • Vector Action space: (Continuous) Size of 2, corresponding to movement toward net or away from net, and jumping.
  • Visual Observations: None.
  • Reset Parameters: One, corresponding to size of ball.
  • Benchmark Mean Reward: 2.5
  • Optional Imitation Learning scene: TennisIL.
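To illustrate how 3 stacked observations of 8 variables give the 24-variable vector, here is a minimal sketch. The stacking itself is done inside the Udacity-provided environment; the `FrameStack` class, the oldest-first ordering, and the zero-filled start are assumptions made for illustration only.

```python
from collections import deque

FRAME_SIZE = 8   # position and velocity of ball and racket
NUM_FRAMES = 3   # observations stacked in the Udacity environment


class FrameStack:
    """Keep the last NUM_FRAMES observations and expose them as one flat vector."""

    def __init__(self):
        # Assumption: the history starts out filled with zeros.
        self.frames = deque(
            [[0.0] * FRAME_SIZE for _ in range(NUM_FRAMES)],
            maxlen=NUM_FRAMES,
        )

    def push(self, frame):
        """Add the newest observation and return the stacked vector (oldest first)."""
        if len(frame) != FRAME_SIZE:
            raise ValueError("expected %d variables per frame" % FRAME_SIZE)
        self.frames.append(list(frame))  # the oldest frame is dropped automatically
        return [value for f in self.frames for value in f]


stack = FrameStack()
stacked = stack.push([1.0] * FRAME_SIZE)
print(len(stacked))  # 24
```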

Solving the Environment

In this Udacity project, the environment is considered solved when the average episode score (the maximum over both agents' scores) over 100 consecutive episodes is at least +0.5.

Getting started

Installation requirements

  • First, configure a Python 3.6 / PyTorch 0.4.0 environment with the needed requirements, as described in the Udacity repository.

  • Clone this project and make it accessible from your Python environment.

  • Then install the Unity environment as described in the Getting Started section (the Unity ML-Agents environment is already configured by Udacity).

  • Download the environment from one of the links below. You need only select the environment that matches your operating system:

    (For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.

    (For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link to obtain the "headless" version of the environment. You will not be able to watch the agent without enabling a virtual screen, but you will be able to train the agent. (To watch the agent, you should follow the instructions to enable a virtual screen, and then download the environment for the Linux operating system above.)

  • Finally, unzip the environment archive in the 'project's environment' directory and, if necessary, adjust the path to the UnityEnvironment in the code.
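Once the archive is unzipped, a quick way to check the installation is to play one episode with random actions. The sketch below assumes the `unityagents` package installed by the Udacity setup; the path `path/to/Tennis` is a placeholder to replace with the location of your own unzipped environment, and the function names are hypothetical. Random actions are drawn in [-1, 1], which is an assumption about the valid action range.

```python
import random


def run_random_episode(env, brain_name, num_agents=2, action_size=2, seed=0):
    """Play one episode with uniform random actions and return per-agent scores."""
    rng = random.Random(seed)
    env_info = env.reset(train_mode=True)[brain_name]
    scores = [0.0] * num_agents
    while True:
        # One action vector per agent, each entry drawn from [-1, 1] (assumed range).
        actions = [
            [rng.uniform(-1.0, 1.0) for _ in range(action_size)]
            for _ in range(num_agents)
        ]
        env_info = env.step(actions)[brain_name]
        for i, reward in enumerate(env_info.rewards):
            scores[i] += reward
        if any(env_info.local_done):  # the episode ends when any agent is done
            break
    return scores


def main(file_name="path/to/Tennis"):
    """Open the Unity environment at `file_name` (placeholder) and run one episode."""
    from unityagents import UnityEnvironment  # installed by the Udacity setup

    env = UnityEnvironment(file_name=file_name)
    brain_name = env.brain_names[0]
    try:
        scores = run_random_episode(env, brain_name)
        print("Episode score (max over agents):", max(scores))
    finally:
        env.close()
```

Calling `main("<your path>")` should print a score close to zero, since random play rarely gets the ball over the net.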

Note: A conda environment file is provided with this project (so you can check/install the versions of the libraries I used)

Train an agent

Execute the provided notebook within the Udacity Nanodegree Online Workspace for "project #3 Collaboration and Competition" (or build your own local environment and make the necessary adjustments to the path to the UnityEnvironment in the code).

Note :

  • Manually playing with the environment has not been implemented, as it is not available in the Udacity Online Workspace (no virtual screen).
  • Watching the trained agent play in the environment has not been implemented either, as it is not available in the Udacity Online Workspace (no virtual screen) and is not compatible with my personal setup (see Misc : Configuration used section).

Misc : Configuration used

This agent was trained on my "Deep Learning Dev Box", which is basically a Linux GPU server running Docker containers (using Nvidia Docker 2) that serve Jupyter Lab notebooks, accessed remotely via a web interface (or an SSH connection). Unfortunately, this setup does not seem suitable for running a Unity ML-Agents environment that both uses the GPU and provides a display for the agent (see the Unity documentation for more details). Thus the headless (no visualization) version of the Unity environment was used.