This code accompanies the submission PUZZLES: A Benchmark for Neural Algorithmic Reasoning. A current version of the paper is available on arXiv.
We provide RLP, a Reinforcement Learning (RL) environment based on Simon Tatham's Portable Puzzle Collection, designed for use with Farama's Gymnasium RL tools.
Along with RLP, we provide scripts that reproduce the results presented in the paper. Instructions on how to use them are given below.
Python 3.10+, a reasonably recent C compiler such as GCC or Clang, and CMake 3.26.
We only tested the code on Linux and recommend using Python 3.11.
First, clone the git repository to your local machine.
git clone https://github.com/ETH-DISCO/rlp.git
Step into the directory.
cd rlp
Install all required packages and build the C libraries.
./install.sh
Activate RLP's virtual Python environment.
source rlpvenv/bin/activate
After successfully following the Installation Guide, you can now run the RLP environment!
When initializing a puzzle, you must supply the desired puzzle's name. Refer to the list of puzzle names.
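For instance, a puzzle environment can be created through Gymnasium roughly as follows. This is a minimal sketch: the environment ID rlp/Puzzle-v0 and the keyword arguments shown are assumptions based on standard Gymnasium conventions, so check the repository for the exact registration.

```python
import gymnasium as gym
import rlp  # assumed to register the RLP environments with Gymnasium

# "net" is one of the puzzle names from the list below.
env = gym.make("rlp/Puzzle-v0", puzzle="net", render_mode="rgb_array")  # ID/kwargs assumed
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```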
You may find the exact commands to reproduce the paper's experiments in experiment_commands.txt.
To train an agent on a specific puzzle, run the following command in the repository's top level:
./run_training.py --puzzle <name of puzzle> --arg <parameters>
Run ./run_training.py --help for the full range of customizable options.
Check the list of puzzle names.
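For example, to train an agent on the net puzzle with the default settings:
./run_training.py --puzzle net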
To run an agent previously trained on a specific puzzle, run the following command in the repository's top level:
./run_trained_agent.py --puzzle <name of puzzle> --arg <parameters>
Run ./run_trained_agent.py --help for the full range of customizable options.
Check the list of puzzle names.
To have an agent perform random actions in one of the puzzles, run the following command in the repository's top level:
./run_random.py --puzzle <name of puzzle> --arg <parameters>
Run ./run_random.py --help for the full range of customizable options.
Check the list of puzzle names.
To manually play one of the puzzles, run the following command in the repository's top level:
./run_puzzle.py --puzzle <name of puzzle>
Run ./run_puzzle.py --help for the full range of customizable options.
Check the list of puzzle names.
To evaluate an agent previously trained on a specific puzzle, run the following command in the repository's top level:
./run_evaluation.py --puzzle <name of puzzle> --arg <parameters>
To run an LLM that is available via API, run the following command in the repository's top level:
./llm/evaluate_llm_agent.py --puzzle <name of puzzle> --arg <parameters> --model_type <gpt4o-mini|gpt4o|gemini-1.5-flash|gemini-1.5-pro>
If you want to evaluate another LLM, you have to write a custom class in llm/llm_api_agent: inherit from llm/llm_api_agent/abstract_puzzles_agent.py and implement the abstract methods, as in the sketch below.
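The following is a minimal sketch of such a subclass. The class name AbstractPuzzlesAgent and the get_response() method are assumptions for illustration; use the actual abstract methods declared in llm/llm_api_agent/abstract_puzzles_agent.py.

```python
# llm/llm_api_agent/my_llm_agent.py -- hypothetical file, class and method names
from llm.llm_api_agent.abstract_puzzles_agent import AbstractPuzzlesAgent  # class name assumed

class MyLLMAgent(AbstractPuzzlesAgent):
    """Adapter for a custom LLM API; replace the stub with a real API call."""

    def __init__(self, *args, api_key=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.api_key = api_key  # credentials for the custom model's API

    def get_response(self, prompt: str) -> str:
        # Illustrative abstract method: send the prompt to the custom LLM
        # and return its text reply.
        raise NotImplementedError("call your model's API here")
```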
blackbox | bridges | cube | dominosa | fifteen |
filling | flip | flood | galaxies | guess |
inertia | keen | lightup | loopy | magnets |
map | mines | mosaic | net | netslide |
palisade | pattern | pearl | pegs | range |
rect | samegame | signpost | singles | sixteen |
slant | solo | tents | towers | tracks |
twiddle | undead | unequal | unruly | untangle |
One can use a Gymnasium environment wrapper (Documentation) to provide custom rewards in order to improve the agent's learning process.
The puzzle's internal game state is provided in the info dict returned by the environment after each step(). Its attributes can be accessed using info['puzzle_state']['<attribute name>'].
An example can be found in custom_rewards_example.py.
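Below is a minimal sketch of such a wrapper. It assumes a hypothetical integer attribute named progress in the puzzle state; the real attribute names depend on the puzzle.

```python
import gymnasium as gym

class ShapedRewardWrapper(gym.Wrapper):
    """Adds a small shaping bonus based on the puzzle's internal state."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # 'progress' is a hypothetical attribute; use one exposed by your puzzle.
        progress = info["puzzle_state"].get("progress", 0)
        reward += 0.01 * progress  # shaping bonus on top of the original reward
        return obs, reward, terminated, truncated, info
```

The wrapper can then be applied with env = ShapedRewardWrapper(env) before training.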
A new puzzle can be added by creating a new C backend containing the game logic and associated data structures. For more details, we refer to Simon Tatham's developer documentation. The new C file should be placed in the folder puzzles, where the other backend files are located. Subsequently, the new puzzle needs to be added to the build system with an entry in puzzles/CMakeLists.txt.
Additionally, certain functions in our environment need to be implemented or adapted, such as retrieving the dictionary containing the puzzles’ logical state. The relevant code needs to be modified in the files rlp/specific_api.py and rlp/envs/observation_spaces.py.
Function | Description |
---|---|
make_puzzle_state() | Converts the data structures that represent a puzzle's internal logical state to a Python dict. |
set_api_structures_newpuzzle() | Sets the ctypes definitions for the backend's game_params, game_ui, game_drawstate, game_state and their associated classes. |
get_action_keys_newpuzzle() | Returns a dict containing all keyboard keys used to play a puzzle. |
get_observation_space_newpuzzle() | Returns a dict containing the internal data observation space for a puzzle. |
get_observation_newpuzzle() | Returns a dict containing the internal data observation for a puzzle. |
For the latter four, the new functions need to be added to the four dicts set_api_structures_methods, get_action_keys_methods, get_observation_space_methods and get_observation_methods, respectively.
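For a hypothetical puzzle called newpuzzle, the registrations would look roughly like this; the exact dict definitions live in rlp/specific_api.py and rlp/envs/observation_spaces.py, so match the style of the existing entries there.

```python
# Illustrative registrations for a new puzzle called "newpuzzle";
# add each entry to the dict where the existing puzzles are registered.
set_api_structures_methods["newpuzzle"] = set_api_structures_newpuzzle
get_action_keys_methods["newpuzzle"] = get_action_keys_newpuzzle
get_observation_space_methods["newpuzzle"] = get_observation_space_newpuzzle
get_observation_methods["newpuzzle"] = get_observation_newpuzzle
```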
To train an LLM on the puzzles, we recommend using a library such as LlamaGym.
The RLP code is released under the CC BY-NC 4.0 license. For more information, see LICENSE.
Simon Tatham's Portable Puzzle Collection is licensed under the MIT License, see puzzles/LICENCE.