Rogue-Bench

Rogue-Bench is a benchmark in which agents play Rogue. Specifically, it measures how well LLMs can play the classic dungeon crawler.

This work would not be possible without Rogue Collection. If you just want to play Rogue, head over there.

Once set up, you should be able to produce a result like this:

GPT-5.4-mini playing Rogue.

Get started

Note

Rogue-Bench compilation and runs have been tested on (WSL2) Ubuntu 24.04. If you are struggling to get something working locally, try the Docker setup.

Local

To run locally, execute:

git clone --recursive https://github.com/iwhalen/rogue-bench.git 
cd rogue-bench
make install  # Install system level dependencies
make build    # Compile the custom headless Rogue executable
uv run rogue-bench --player human

This will start a "human" session where you can control Rogue with keyboard inputs. This is a good sanity check before setting up a real agent.

For all command line options, see:

uv run rogue-bench --help

For more on the Rogue-Bench CLI, see here.

Docker

To run Rogue in Docker, execute:

git clone --recursive https://github.com/iwhalen/rogue-bench.git
cd rogue-bench
make build-docker
uv run rogue-bench --docker-image rogue-bench --player human

Again, this will start in "human" mode.

How it works

Rogue-Bench runs a slightly modified, headless Rogue executable and communicates with it over pipes. The Python library reads Rogue's terminal output, (optionally) parses it into a screen state, and sends keystrokes back to the game.
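
The pipe-based loop described above can be sketched roughly as follows. This is a minimal illustration, not Rogue-Bench's actual implementation: the child process is a small Python stand-in rather than the headless Rogue executable, and the line-based protocol, the `play` and `parse_status` helpers, and the status-line format are all assumptions made for the example.

```python
import subprocess
import sys

# Stand-in for the headless game (an assumption for this sketch): a child
# process that answers every input line with one line of "screen" text.
CHILD = r"""
import sys
for line in sys.stdin:
    print(f"Level: 1  Last key: {line.strip()}", flush=True)
"""


def play(keys):
    """Send each key to the child over a pipe and collect one reply per key."""
    screens = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        for key in keys:
            proc.stdin.write(key + "\n")  # keystroke goes down the pipe
            proc.stdin.flush()
            screens.append(proc.stdout.readline().rstrip("\n"))
    # Leaving the with-block closes the pipes, so the child sees EOF and exits.
    return screens


def parse_status(line):
    """Parse a hypothetical 'Name: value  Name: value' status line into a dict."""
    state = {}
    for part in line.split("  "):
        name, _, value = part.partition(": ")
        state[name] = value
    return state


if __name__ == "__main__":
    for screen in play(["h", "j"]):
        print(parse_status(screen))
```

A real terminal game emits cursor-control sequences rather than tidy lines, so the parsing step is considerably more involved than `parse_status` suggests. The sketch only shows the shape of the read, parse, and send cycle.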

No Rogue gameplay elements have been changed. The game version is fixed to Unix Rogue 5.4.2.

Each run accumulates statistics and metadata and logs every keystroke. This allows post-hoc analysis and makes it possible to replay an entire run.
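
To make the idea of a replayable keystroke log concrete, here is a small sketch. The `RunLog` class, its JSON layout, and the `replay` helper are hypothetical and are not taken from the Rogue-Bench code base. Replaying a log reproduces a run only if the game behaves deterministically for the same inputs, and how Rogue-Bench handles that is not covered here.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class RunLog:
    """Hypothetical container for one run: metadata plus ordered keystrokes."""

    metadata: dict
    keystrokes: list = field(default_factory=list)

    def record(self, key):
        # Store the key with a wall-clock timestamp for later analysis.
        self.keystrokes.append({"t": time.time(), "key": key})

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


def replay(log, send):
    """Feed the logged keys, in order, to any callable that accepts a key."""
    for entry in log.keystrokes:
        send(entry["key"])


if __name__ == "__main__":
    log = RunLog(metadata={"player": "human"})
    for key in "hjkl":
        log.record(key)

    restored = RunLog.from_json(log.to_json())
    replay(restored, lambda key: print("sending", key))
```

Keeping the log as plain JSON means the same file can serve both purposes mentioned above: statistics can be computed from it offline, and the key sequence can be fed back into a fresh game session.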

For more specifics on the implementation, see the GitHub repository.

License

Note that the Python code for running Rogue-Bench is offered under the GPL-3.0 license.

The modified Rogue executables are under the same license(s) as the Rogue Collection. At the time of writing, this is a mix of GPL-3.0 and other licenses.

Rogue is a trademark of Epyx, Inc. Rogue-Bench is not associated with Epyx in any way.