
PUZZLES: A Benchmark for Neural Algorithmic Reasoning

Description

This code accompanies the submission PUZZLES: A Benchmark for Neural Algorithmic Reasoning. The current version of the paper is available on arXiv.

We provide RLP, a Reinforcement Learning (RL) environment based on Simon Tatham's Portable Puzzle Collection, designed for use with Farama's Gymnasium RL tools.

Along with RLP, we provide scripts that reproduce the results presented in the paper. Instructions on how to use them are given below.

Installation Guide

Requirements

Python 3.10+, a reasonably recent C compiler such as GCC or Clang, and CMake 3.26.

We have only tested the code on Linux and recommend using Python 3.11.

Step-by-step Guide

First, clone the git repository to your local machine.

git clone https://github.com/ETH-DISCO/rlp.git

Step into the directory.

cd rlp

Install all required packages and build the C libraries.

./install.sh

Activate RLP's virtual Python environment.

source rlpvenv/bin/activate

Usage Guide

After successfully following the Installation Guide, you can now run the RLP environment!

When initializing a puzzle, you must supply the desired puzzle's name. Refer to the list of puzzle names.
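If you want to use the environment programmatically, it behaves like any other Gymnasium environment. Below is a minimal sketch; the environment id rlp/Puzzle-v0 and the puzzle keyword argument are assumptions for illustration, so consult run_random.py or run_puzzle.py for the construction actually used in this repository.

import gymnasium as gym
import rlp  # assumed to register the RLP environments with Gymnasium on import

# NOTE: the id 'rlp/Puzzle-v0' and the 'puzzle' keyword are illustrative assumptions.
env = gym.make('rlp/Puzzle-v0', puzzle='net')
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()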

You may find the exact commands to reproduce the paper's experiments in experiment_commands.txt.

Train an Agent

To train an agent on a specific puzzle, run the following command from the repository's top-level directory.

./run_training.py --puzzle <name of puzzle> --arg <parameters>

Run ./run_training.py --help for the full range of customizable options.

Check the list of puzzle names.

Run a previously trained Agent

To run an agent previously trained on a specific puzzle, run the following command from the repository's top-level directory.

./run_trained_agent.py --puzzle <name of puzzle> --arg <parameters>

Run ./run_trained_agent.py --help for the full range of customizable options.

Check the list of puzzle names.

Random Agent

To have an agent perform random actions in one of the puzzles, run the following command from the repository's top-level directory:

./run_random.py --puzzle <name of puzzle> --arg <parameters>

Run ./run_random.py --help for the full range of customizable options.

Check the list of puzzle names.

Manual Play

To manually play one of the puzzles, run the following command from the repository's top-level directory:

./run_puzzle.py --puzzle <name of puzzle>

Run ./run_puzzle.py --help for the full range of customizable options.

Check the list of puzzle names.

Evaluation of an Agent

To evaluate an agent previously trained on a specific puzzle, run the following command from the repository's top-level directory.

./run_evaluation.py --puzzle <name of puzzle> --arg <parameters>

Run an LLM that is available via API

To evaluate an LLM available via an API, run the following command from the repository's top-level directory:

./llm/evaluate_llm_agent.py --puzzle <name of puzzle> --arg <parameters> --model_type <gpt4o-mini|gpt4o|gemini-1.5-flash|gemini-1.5-pro>

If you want to evaluate a different LLM, write a custom agent class in llm/llm_api_agent: inherit from the abstract class in llm/llm_api_agent/abstract_puzzles_agent.py and implement its abstract methods.
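As a rough sketch, a custom agent might look as follows; the base-class name AbstractPuzzlesAgent and the method get_response() are hypothetical placeholders, so check llm/llm_api_agent/abstract_puzzles_agent.py for the actual class and the abstract methods you need to implement.

from llm.llm_api_agent.abstract_puzzles_agent import AbstractPuzzlesAgent  # class name assumed


class MyCustomLLMAgent(AbstractPuzzlesAgent):
    def __init__(self, *args, api_key: str = "", **kwargs):
        super().__init__(*args, **kwargs)
        self.api_key = api_key

    def get_response(self, prompt: str) -> str:  # hypothetical abstract method
        # Send the prompt to your model's API and return the raw text reply.
        ...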

List of Puzzles

blackbox bridges cube dominosa fifteen
filling flip flood galaxies guess
inertia keen lightup loopy magnets
map mines mosaic net netslide
palisade pattern pearl pegs range
rect samegame signpost singles sixteen
slant solo tents towers tracks
twiddle undead unequal unruly untangle

Developer Notes

Custom Reward Structure

One can use a Gymnasium environment wrapper (see the Gymnasium documentation) to hand out custom rewards and thereby shape the agent's learning process. The puzzle's internal game state is provided in the info dict returned by the environment after each step(). Its attributes can be accessed using

info['puzzle_state']['<attribute name>']

An example can be found in custom_rewards_example.py.
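For illustration, a minimal sketch of such a wrapper is shown below; the attribute name 'completed' is a hypothetical placeholder, since the available attributes depend on the puzzle.

import gymnasium as gym


class SolveBonusWrapper(gym.Wrapper):
    """Adds a bonus reward whenever the wrapped puzzle reports it is solved."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # The puzzle's internal game state is exposed in info['puzzle_state'].
        if info['puzzle_state'].get('completed'):  # 'completed' is a hypothetical attribute name
            reward += 10.0
        return obs, reward, terminated, truncated, info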

Adding a new puzzle

A new puzzle can be added by creating a new C backend including the game logic and associated data structures. For more details we refer to Simon Tatham’s developer documentation. The new C file should be placed in the folder puzzles, where the other backend files are located. Subsequently, the new puzzle needs to be added to the build system by adding an entry in puzzles/CMakeLists.txt.

Additionally, certain functions in our environment need to be implemented or adapted, such as retrieving the dictionary containing the puzzles’ logical state. The relevant code needs to be modified in the files rlp/specific_api.py and rlp/envs/observation_spaces.py.

make_puzzle_state(): Converts the data structures that represent a puzzle's internal logical state to a Python dict.
set_api_structures_newpuzzle(): Sets the ctypes definitions for the backend's game_params, game_ui, game_drawstate, game_state and their associated classes.
get_action_keys_newpuzzle(): Returns a dict containing all keyboard keys used to play a puzzle.
get_observation_space_newpuzzle(): Returns a dict containing the internal data observation space for a puzzle.
get_observation_newpuzzle(): Returns a dict containing the internal data observation for a puzzle.

For the latter four, the new functions need to be registered in the dicts set_api_structures_methods, get_action_keys_methods, get_observation_space_methods and get_observation_methods, respectively.
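As a sketch, the registration might look like the following lines in rlp/specific_api.py; the key "newpuzzle" and the exact layout of the dicts are assumptions, so mirror the existing entries.

# Assumed registration pattern; the key "newpuzzle" is illustrative.
set_api_structures_methods["newpuzzle"] = set_api_structures_newpuzzle
get_action_keys_methods["newpuzzle"] = get_action_keys_newpuzzle
get_observation_space_methods["newpuzzle"] = get_observation_space_newpuzzle
get_observation_methods["newpuzzle"] = get_observation_newpuzzle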

Training an LLM

To train an LLM on the puzzles, we recommend using a library such as LlamaGym.

License

The RLP code is released under the CC BY-NC 4.0 license. For more information, see LICENSE.

Simon Tatham's Portable Puzzle Collection is licensed under the MIT License, see puzzles/LICENCE.