→ See the full article here
This repository contains the source code for the project "Centralized control for multi-agent RL in a complex Real-Time-Strategy game", submitted as the final project for the COMP579 - Reinforcement Learning course at McGill University, taught by Prof. Doina Precup in Winter 2023.
→ The main scripts for understanding the code are fully commented. We present the PDF report and the code in the following sections.
→ The full report of the project is available here.
→ The Weights & Biases logs of our experiments are available here.
[Figures: PPO in Lux, during training]
→ There are 2 main scripts, of roughly 1000 and 900 lines of code respectively: src/envs_folder/custom_env.py and src/ppo_res_gridnet_multigpu.py
→ The repository contains many variations of the gridnet script; the simplest one, which is also fully commented, is src/ppo_res_gridnet_multigpu.py
To train our gridnet in Lux:
1. Clone this repository
2. Install the requirements
3. Train Gridnet (example uses 1 GPU and 1 process):
cd src
torchrun --standalone --nproc_per_node 1 ppo_res_gridnet_multigpu.py --device-ids 0
The best agent was trained with the best hyperparameters found in the sweep, using 16 processes on 8 GPUs (two processes per GPU), by running:
torchrun --standalone --nproc_per_node 16 ppo_pixel_gridnet_multigpu.py \
    --total-timesteps 1000000000 \
    --device-ids 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7
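The --device-ids list above has one entry per process. The sketch below illustrates one plausible reading of it, assuming the i-th entry is the GPU used by the process with local rank i (torchrun exposes that rank through the LOCAL_RANK environment variable). The helper name device_for_rank is hypothetical and is not taken from the repository; the training script may resolve devices differently.

```python
from collections import Counter
from typing import Sequence


def device_for_rank(local_rank: int, device_ids: Sequence[int]) -> int:
    """Return the GPU index for a process.

    Assumption (not confirmed by the repository): device_ids holds one entry
    per process and is indexed by the process's local rank.
    """
    if not 0 <= local_rank < len(device_ids):
        raise ValueError(
            f"local_rank {local_rank} has no entry in a list of {len(device_ids)} device ids"
        )
    return device_ids[local_rank]


# The list passed to --device-ids in the 16-process command.
device_ids = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]

# Map every local rank to its GPU, then count how many processes share each GPU.
assignment = {rank: device_for_rank(rank, device_ids) for rank in range(len(device_ids))}
processes_per_gpu = Counter(assignment.values())
```

Under this reading, ranks 0 and 1 share GPU 0, ranks 2 and 3 share GPU 1, and so on, which gives two processes on each of the 8 GPUs.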
In this project we implement an RL agent to compete in the Lux AI v-2 Kaggle Competition. Lux is a 1vs1 real-time-strategy game in which players compete for resources and grow lichen on Mars. Lux is a multi-agent environment because each player controls a variable-sized fleet of units of different types (e.g. light and heavy robots, and factories). The full specifications of the Lux environment are available here.
We propose a pixel-to-pixel architecture that we train with Proximal Policy Optimization (PPO). The encoder is a stack of Residual Blocks with Squeeze-and-Excitation layers and ReLU activations, and the two decoders are each a stack of Transposed Convolutions with ReLU activations. The critic uses an AveragePool layer followed by 2 fully connected layers with a ReLU activation.
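As a rough illustration of the architecture described above, here is a minimal PyTorch sketch. It is not the code in src/ppo_res_gridnet_multigpu.py: the class names, channel counts, number of blocks, and action counts are all assumptions chosen for readability, and assigning one decoder to robots and one to factories is a guess at what the two decoders are for.

```python
import torch
import torch.nn as nn


class SqueezeExcitation(nn.Module):
    """Channel gating: global average pool, bottleneck, sigmoid scale."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)  # (B, C, 1, 1) gate broadcasts over H and W


class ResidualSEBlock(nn.Module):
    """Two 3x3 convolutions plus SE gating, wrapped in a skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            SqueezeExcitation(channels),
        )
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.body(x))


def make_decoder(channels: int, n_actions: int) -> nn.Sequential:
    """Stride-1 transposed convolutions, so the output keeps the map size."""
    return nn.Sequential(
        nn.ConvTranspose2d(channels, channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.ConvTranspose2d(channels, n_actions, kernel_size=3, padding=1),
    )


class PixelToPixelActorCritic(nn.Module):
    """Shared encoder, two per-cell action decoders, and a scalar critic."""

    def __init__(self, in_channels: int = 8, hidden: int = 32, n_blocks: int = 2,
                 robot_actions: int = 5, factory_actions: int = 3):
        super().__init__()  # all default sizes above are illustrative assumptions
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            *[ResidualSEBlock(hidden) for _ in range(n_blocks)],
        )
        self.robot_decoder = make_decoder(hidden, robot_actions)      # assumed role
        self.factory_decoder = make_decoder(hidden, factory_actions)  # assumed role
        self.critic = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # average pool over the whole map
            nn.Flatten(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor):
        z = self.encoder(obs)
        return self.robot_decoder(z), self.factory_decoder(z), self.critic(z)
```

Calling the model on a batch of shape (B, 8, H, W) returns two maps of per-cell action logits with the same H and W as the input, plus one value estimate per batch element. Producing one action distribution per map cell is what lets a single network control a fleet whose size changes during the game.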
If you use this code, please cite it as follows:
title={Centralized control for multi-agent RL in a complex Real-Time-Strategy game},
author={Castanyer, Roger Creus},
journal={arXiv preprint arXiv:2304.13004},