Merge pull request #89 from Unity-Technologies/develop-parameters

Obstacle Tower Environment 2.0
Unity-Technologies · May 14, 2019 · d31de13
2 parents 5699e90 + 4c9e1f6
commit d31de13
Showing 6 changed files with 139 additions and 54 deletions.
diff --git a/README.md b/README.md
@@ -29,6 +29,11 @@ To learn more, please read our AAAI Workshop paper:
 * v1.3 Hotfix release.
    * Resolves memory leak when running in Docker.
    * Fixes issue where environment could freeze on certain higher floors.
+* v2.0 Obstacle Tower Challenge Round 2 Release.
+   * Towers can now be generated with up to 100 floors.
+   * Additional visual themes, obstacles, enemy types, and floor difficulties added.
+   * Additional reset parameters added to customize generated towers. Go [here](./reset-parameters.md) for details on the parameters and their values.
+   * Various bugs fixed and performance improvements.
 
 
 ## Installation
@@ -48,9 +53,9 @@ Python dependencies (also in [setup.py](https://github.com/Unity-Technologies/ob
 
 | *Platform*     | *Download Link*                                                                     |
 | --- | --- |
-| Linux (x86_64) | https://storage.googleapis.com/obstacle-tower-build/v1.3/obstacletower_v1.3_linux.zip   |
-| Mac OS X       | https://storage.googleapis.com/obstacle-tower-build/v1.3/obstacletower_v1.3_osx.zip     |
-| Windows        | https://storage.googleapis.com/obstacle-tower-build/v1.3/obstacletower_v1.3_windows.zip |
+| Linux (x86_64) | https://storage.googleapis.com/obstacle-tower-build/v2.0/obstacletower_v2.0_linux.zip   |
+| Mac OS X       | https://storage.googleapis.com/obstacle-tower-build/v2.0/obstacletower_v2.0_osx.zip     |
+| Windows        | https://storage.googleapis.com/obstacle-tower-build/v2.0/obstacletower_v2.0_windows.zip |
 
 For checksums on these files, see [here](https://storage.googleapis.com/obstacle-tower-build/v1.3/ote-v1.3-checksums.txt).
 
@@ -68,6 +73,10 @@ $ pip install -e .
 
 To see an example of how to interact with the environment using the gym interface, see our [Basic Usage Jupyter Notebook](examples/basic_usage.ipynb).
 
+### Customizing the environment
+
+Obstacle Tower can be configured in a number of different ways to adjust the difficulty and content of the environment. This is done through the use of reset parameters, which can be set when calling `env.reset()`. See [here](./reset-parameters.md) for a list of the available parameters to adjust. 
+
 ### Player Control
 
 It is also possible to launch the environment in "Player Mode," and directly control the agent using a keyboard. This can be done by double-clicking on the binary file. The keyboard controls are as follows:

diff --git a/examples/basic_usage.ipynb b/examples/basic_usage.ipynb
diff --git a/examples/gcp_training.md b/examples/gcp_training.md
@@ -107,8 +107,8 @@ cd ../
 This page lists the URLs for downloading Obstacle Tower for various platforms. [https://github.com/Unity-Technologies/obstacle-tower-env](https://github.com/Unity-Technologies/obstacle-tower-env). For GCP, run
 
 ```bash
-wget https://storage.googleapis.com/obstacle-tower-build/v1.3/obstacletower_v1.3_linux.zip
-unzip obstacletower_v1.3_linux.zip
+wget https://storage.googleapis.com/obstacle-tower-build/v2.0/obstacletower_v2.0_linux.zip
+unzip obstacletower_v2.0_linux.zip
 ```
 
 ### Install Dopamine
@@ -157,7 +157,7 @@ cp dopamine_otc/unity_lib.py dopamine/dopamine/discrete_domains/unity_lib.py
 cp dopamine_otc/rainbow_otc.gin dopamine/dopamine/agents/rainbow/configs/rainbow_otc.gin
 ```
 
-If you didn’t extract the `obstacletower_v1.3_linux.zip` to the home directory, you will need to edit `rainbow_otc.gin`, specifically `create_otc_environment.environment_path` should correspond to the path to your extracted OTC executable file. 
+If you didn’t extract the `obstacletower_v2.0_linux.zip` to the home directory, you will need to edit `rainbow_otc.gin`, specifically `create_otc_environment.environment_path` should correspond to the path to your extracted OTC executable file. 
 
 Furthermore, within this file you will find settings on how long to train for, and how often to evaluate your agent. Each iteration, Dopamine will train for `Runner.training_steps`, evaluate (i.e. run in inference mode) for `Runner.evaluation_steps`, record these results, and checkpoint the agent. It will repeat this process `Runner.num_iterations` number of times before quitting. For instance, you can change `Runner.num_iterations` to 40 to train for 10 million steps. You can also reduce `Runner.evaluation_steps` to reduce the time spent not training. There are other hyperparameters found in this file, which you can modify to improve performance. But for the sake of this exercise, you may leave them as-is. 
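
As a rough check of the "40 iterations for 10 million steps" figure above, the arithmetic can be sketched as follows. The per-iteration value of 250,000 is only what those two numbers imply; it is not read from `rainbow_otc.gin`, so confirm `Runner.training_steps` in your own copy. The helper name is hypothetical.

```python
# Figures taken from the sentence above; nothing here is read from the gin file.
TOTAL_STEPS = 10_000_000   # "10 million steps"
NUM_ITERATIONS = 40        # example value for Runner.num_iterations

# Training steps per iteration that would make the two figures agree (inferred).
training_steps_per_iteration = TOTAL_STEPS // NUM_ITERATIONS
print(training_steps_per_iteration)  # 250000


def total_training_steps(num_iterations, training_steps):
    """Hypothetical helper: total steps spent training, evaluation steps excluded."""
    return num_iterations * training_steps
```

Evaluation steps are extra: each iteration also runs `Runner.evaluation_steps` steps in inference mode, which is why reducing that value shortens wall-clock time without changing the training total.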
 

diff --git a/obstacle_tower_env.py b/obstacle_tower_env.py
@@ -20,10 +20,10 @@ class UnityGymException(error.Error):
 
 
 class ObstacleTowerEnv(gym.Env):
-    ALLOWED_VERSIONS = ['1', '1.1', '1.2', '1.3']
+    ALLOWED_VERSIONS = ['2.0']
 
     def __init__(self, environment_filename=None, docker_training=False, worker_id=0, retro=True,
-                 timeout_wait=30, realtime_mode=False):
+                 timeout_wait=30, realtime_mode=False, config=None, greyscale=False):
         """
         Arguments:
           environment_filename: The file path to the Unity executable.  Does not require the extension.
@@ -64,11 +64,19 @@ def __init__(self, environment_filename=None, docker_training=False, worker_id=0
         self._n_agents = None
         self._done_grading = False
         self._flattener = None
+        self._greyscale = greyscale
+
+        # Environment reset parameters
         self._seed = None
         self._floor = None
+
         self.realtime_mode = realtime_mode
         self.game_over = False  # Hidden flag used by Atari environments to determine if the game is over
         self.retro = retro
+        if config is not None:
+            self.config = config
+        else:
+            self.config = None
 
         flatten_branched = self.retro
         uint8_visual = self.retro
@@ -107,7 +115,10 @@ def __init__(self, environment_filename=None, docker_training=False, worker_id=0
         high = np.array([np.inf] * brain.vector_observation_space_size)
         self.action_meanings = brain.vector_action_descriptions
 
-        depth = 3
+        if self._greyscale:
+            depth = 1
+        else:
+            depth = 3
         image_space_max = 1.0
         image_space_dtype = np.float32
         camera_height = brain.camera_resolutions[0]["height"]
@@ -139,15 +150,20 @@ def done_grading(self):
     def is_grading(self):
         return os.getenv('OTC_EVALUATION_ENABLED', False)
 
-    def reset(self):
+    def reset(self, config=None):
         """Resets the state of the environment and returns an initial observation.
         In the case of multi-agent environments, this is a list.
         Returns: observation (object/list): the initial observation of the
             space.
         """
-        reset_params = {}
+        if config is None:
+            reset_params = {}
+            if self.config is not None:
+                reset_params = self.config
+        else:
+            reset_params = config
         if self._floor is not None:
-            reset_params['floor-number'] = self._floor
+            reset_params['starting-floor'] = self._floor
         if self._seed is not None:
             reset_params['tower-seed'] = self._seed
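
The precedence that the new `reset()` applies can be summarized in a standalone sketch: a config passed to `reset()` wins over the constructor config, and values set through `floor()` and `seed()` are written on top of whichever dictionary was chosen. The function below is a hypothetical helper, not part of `ObstacleTowerEnv`. It differs from the diff in one respect: it copies the chosen dictionary, whereas the code above assigns it directly, so the floor and seed values there appear to be written into the caller's dictionary (or into `self.config`).

```python
def build_reset_params(call_config=None, default_config=None, floor=None, seed=None):
    """Hypothetical helper mirroring the precedence used by reset() in the diff."""
    if call_config is not None:
        params = dict(call_config)      # config passed to reset() takes priority
    elif default_config is not None:
        params = dict(default_config)   # otherwise fall back to the launch config
    else:
        params = {}
    # Values set earlier via floor() / seed() override the chosen config.
    if floor is not None:
        params['starting-floor'] = floor
    if seed is not None:
        params['tower-seed'] = seed
    return params


defaults = {'total-floors': 10, 'tower-seed': 3}
print(build_reset_params(default_config=defaults, seed=7))
# {'total-floors': 10, 'tower-seed': 7}
```

Because of the copy, `defaults` still holds `'tower-seed': 3` after the call.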
 
@@ -197,18 +213,31 @@ def step(self, action):
     def _single_step(self, info):
         self.visual_obs = self._preprocess_single(info.visual_observations[0][0, :, :, :])
 
+        self.visual_obs, keys, time, current_floor = self._prepare_tuple_observation(
+            self.visual_obs, info.vector_observations[0])
+
         if self.retro:
             self.visual_obs = self._resize_observation(self.visual_obs)
             self.visual_obs = self._add_stats_to_image(
                 self.visual_obs, info.vector_observations[0])
             default_observation = self.visual_obs
         else:
-            default_observation = self._prepare_tuple_observation(
-                self.visual_obs, info.vector_observations[0])
+            default_observation = self.visual_obs, keys, time, current_floor
+
+        if self._greyscale:
+            default_observation = self._greyscale_obs(default_observation)
 
         return default_observation, info.rewards[0], info.local_done[0], {
             "text_observation": info.text_observations[0],
-            "brain_info": info}
+            "brain_info": info,
+            "total_keys": keys,
+            "time_remaining": time,
+            "current_floor": current_floor
+        }
+
+    def _greyscale_obs(self, obs):
+        new_obs = np.floor(np.expand_dims(np.mean(obs, axis=2), axis=2)).astype(np.uint8)
+        return new_obs
 
     def _preprocess_single(self, single_visual_obs):
         if self.uint8_visual:
@@ -244,11 +273,11 @@ def seed(self, seed=None):
 
         seed = int(seed)
         if seed < 0 or seed >= 100:
-            logger.warn(
+            logger.warning(
                 "Seed outside of valid range [0, 100). A random seed "
                 "within the valid range will be used on next reset."
             )
-        logger.warn("New seed " + str(seed) + " will apply on next reset.")
+        logger.warning("New seed " + str(seed) + " will apply on next reset.")
         self._seed = seed
 
     def floor(self, floor=None):
@@ -259,12 +288,12 @@ def floor(self, floor=None):
             return
 
         floor = int(floor)
-        if floor < 0 or floor >= 25:
-            logger.warn(
-                "Starting floor outside of valid range [0, 25). Floor 0 will be used"
+        if floor < 0 or floor >= 99:
+            logger.warning(
+                "Starting floor outside of valid range [0, 99). Floor 0 will be used "
                 "on next reset."
             )
-        logger.warn("New starting floor " + str(floor) + " will apply on next reset.")
+        logger.warning("New starting floor " + str(floor) + " will apply on next reset.")
         self._floor = floor
 
     @staticmethod
@@ -283,8 +312,9 @@ def _prepare_tuple_observation(vis_obs, vector_obs):
         """
         key = vector_obs[0:6]
         time = vector_obs[6]
+        floor_number = vector_obs[7]
         key_num = np.argmax(key, axis=0)
-        return vis_obs, key_num, time
+        return vis_obs, key_num, time, floor_number
 
     @staticmethod
     def _add_stats_to_image(vis_obs, vector_obs):
@@ -293,7 +323,7 @@ def _add_stats_to_image(vis_obs, vector_obs):
         """
         key = vector_obs[0:6]
         time = vector_obs[6]
-        key_num = np.argmax(key, axis=0)
+        key_num = int(np.argmax(key, axis=0))
         time_num = min(time, 10000) / 10000
 
         vis_obs[0:10, :, :] = 0
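
To see what the `greyscale` option does to an observation, the expression from `_greyscale_obs` can be run on a synthetic frame. This is a minimal sketch: the 84x84 size and the 0 to 255 value range are assumptions made for illustration, and the function is a free-standing copy rather than the class method.

```python
import numpy as np


def greyscale_obs(obs):
    """Same expression as _greyscale_obs in the diff, written as a free function."""
    # Average the colour channels, keep a trailing channel axis, truncate to uint8.
    return np.floor(np.expand_dims(np.mean(obs, axis=2), axis=2)).astype(np.uint8)


# Synthetic RGB frame; shape and value range are assumptions, not taken from the diff.
frame = np.random.randint(0, 256, size=(84, 84, 3), dtype=np.uint8)
grey = greyscale_obs(frame)
print(grey.shape, grey.dtype)  # (84, 84, 1) uint8
```

The expression expects a single image array of shape (height, width, channels). With float images scaled to [0, 1], the `floor` call would map almost every pixel to 0, so the option seems best suited to `retro=True`, where the diff sets `uint8_visual`. How it is meant to behave with the tuple observation returned when `retro=False` is not clear from the diff.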

diff --git a/reset-parameters.md b/reset-parameters.md
@@ -0,0 +1,19 @@
+# Obstacle Tower Reset Parameters
+
+Obstacle Tower can be configured in a variety of ways both when launching the environment and on episode reset. Below is a list of parameters, along with the ranges of values and what they correspond to. Pass these as part of a `config` dictionary when calling `env.reset(config=config)`, or pass them as part of a dictionary when launching the environment in `ObstacleTowerEnv('path_to_binary', config=config)`.
+
+*Note: The config passed on environment launch will be the default used when starting a new episode if there is no config passed during `env.reset()`.*
+
+| *Parameter* | *Value range* | *Effect* |
+| --- | --- | --- |
+| `tower-seed` | (-1 - 9999) | Sets the seed used to generate the tower. -1 corresponds to a random tower on every `reset()` call. |
+| `starting-floor` | (0, 99) | Sets the starting floor for the agent on `reset()`. |
+| `total-floors` | (1, 100) | Sets the maximum number of possible floors in the tower. |
+| `dense-reward` | (0, 1) | Whether to use the sparse (0) or dense (1) reward function. |
+| `lighting-type` | (0, 1, 2) | Whether to use no realtime light (0), a single realtime light with minimal color variations (1), or a realtime light with large color variations (2). |
+| `visual-theme` | (0, 1, 2) | Whether to use only the `default-theme` (0), the normal ordering of themes (1), or a random theme every floor (2). |
+| `agent-perspective` | (0, 1) | Whether to use first-person (0) or third-person (1) perspective for the agent. |
+| `allowed-rooms` | (0, 1, 2) | Whether to use only normal rooms (0), normal and key rooms (1), or normal, key, and puzzle rooms (2). |
+| `allowed-modules` | (0, 1, 2) | Whether to fill rooms with no modules (0), only easy modules (1), or the full range of modules (2). |
+| `allowed-floors` | (0, 1, 2) | Whether to include only straightforward floor layouts (0), layouts that include branching (1), or layouts that include branching and circling (2). |
+| `default-theme` | (0, 1, 2, 3, 4) | Whether to set the default theme to `Ancient` (0), `Moorish` (1), `Industrial` (2), `Modern` (3), or `Future` (4). |
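
A config dictionary built from the table above can be checked before it is sent to the environment. The sketch below is a hypothetical helper, not part of `obstacle_tower_env.py`. It reads two-number entries such as (0, 99) and (1, 100) as inclusive bounds, which is an assumption; note that the `floor()` method in this same commit warns for values of 99 and above, so the exact upper bound for `starting-floor` is uncertain.

```python
# Allowed values transcribed from the table above. Reading two-number entries
# such as (0, 99) as inclusive bounds is an assumption.
RESET_PARAMETER_RANGES = {
    'tower-seed': range(-1, 10000),
    'starting-floor': range(0, 100),
    'total-floors': range(1, 101),
    'dense-reward': range(0, 2),
    'lighting-type': range(0, 3),
    'visual-theme': range(0, 3),
    'agent-perspective': range(0, 2),
    'allowed-rooms': range(0, 3),
    'allowed-modules': range(0, 3),
    'allowed-floors': range(0, 3),
    'default-theme': range(0, 5),
}


def validate_config(config):
    """Hypothetical helper: reject unknown keys and out-of-range values."""
    for name, value in config.items():
        if name not in RESET_PARAMETER_RANGES:
            raise ValueError("Unknown reset parameter: " + name)
        if value not in RESET_PARAMETER_RANGES[name]:
            raise ValueError("Value %r is out of range for %s" % (value, name))
    return config


config = validate_config({'tower-seed': 42, 'starting-floor': 5, 'dense-reward': 1})
```

The validated dictionary could then be passed as `env.reset(config=config)` or as the `config` argument of `ObstacleTowerEnv`, as described in the text above.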
diff --git a/setup.py b/setup.py
@@ -5,7 +5,7 @@
 
 setup(
     name='obstacle_tower_env',
-    version='1.3',
+    version='2.0',
     author='Unity Technologies',
     url='https://github.com/Unity-Technologies/obstacle-tower-env',
     py_modules=["obstacle_tower_env"],