CLI¶
This page explains the Rogue-Bench CLI in more depth.
For a shorter summary of the same information, run rogue-bench --help.
TL;DR¶
If you are experimenting with LLMs playing Rogue, most of your commands will look like:
rogue-bench --player agent \
--agent-class naive.NaiveAgent \
--agent-config config/gpt-5-4-mini.json \
--output-path results/gpt-5-4-mini/ \
--versioned \
--seed 0
Read on for more specific documentation.
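When comparing models you will often repeat the command above across several seeds. The following is a minimal sketch that only builds the argument lists; it uses the same flags as the command above, while the helper name build_command and the idea of deriving both paths from one model name are assumptions made for illustration.

```python
import shlex


def build_command(seed: int, model_name: str = "gpt-5-4-mini") -> list:
    """Build the argv for one seeded agent run, mirroring the TL;DR command."""
    return [
        "rogue-bench",
        "--player", "agent",
        "--agent-class", "naive.NaiveAgent",
        "--agent-config", f"config/{model_name}.json",
        "--output-path", f"results/{model_name}/",
        "--versioned",
        "--seed", str(seed),
    ]


if __name__ == "__main__":
    # Print one shell-ready command per seed.
    for seed in range(3):
        print(shlex.join(build_command(seed)))
```

Because --versioned appends a timestamped subdirectory, every run can share the same --output-path.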
Running Rogue¶
Rogue-Bench can be run in two modes: local and Docker. They differ only in where the Rogue executable runs.
Local mode¶
In local mode, the Rogue executable is running directly on your computer.
Local mode requires the system-level dependencies to be installed and the Rogue executable to be compiled. To do both, run:
git clone --recursive https://github.com/iwhalen/rogue-bench.git
cd rogue-bench
make install
make build
No extra options are needed to run in local mode.
Docker mode¶
Experimental
Local mode has been tested more extensively. Docker mode works in most cases but may still have some unexpected bugs.
If you find any problems, please open an issue.
In Docker mode, the Rogue executable runs in a Docker container.
You'll first need Docker installed. See here for a guide on getting started with Docker.
Then, build the Docker image with:
This will create a new Docker image named rogue-bench by default. Then, when running any command, pass the --docker-image rogue-bench option.
For example, once you have activated a virtual environment with rogue-bench on the path, running in human mode looks like:
Player types¶
There are currently three player types for executing Rogue runs: human, agent, and Rog-O-Matic.
Human¶
As its name suggests, this puts you (the human) in control of Rogue.
Statistics and replay data are gathered the same way as for the other player types. See the run and replay options below for details.
Of course, if you only want to play Rogue, head to the Rogue Collection instead. Other options for Rogue can be found hosted on various retro gaming sites or on Steam.
Agent¶
This type of player outputs actions through a programmatic interface.
The only agent player currently implemented is NaiveAgent.
NaiveAgent¶
NaiveAgent uses an LLM with a fixed history length and simple system prompt to output commands.
To run NaiveAgent, execute:
rogue-bench --player agent \
--agent-class naive.NaiveAgent \
--agent-config config/deepseek-v4-flash.json
The --agent-class option is an import path to a RogueAgent subclass. Short paths are resolved under rogue_bench.agent, so naive.NaiveAgent loads rogue_bench.agent.naive.NaiveAgent.
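The short-path rule can be pictured with a small resolver. This is a sketch of the general technique, not Rogue-Bench's actual loader: the function name resolve_class and the lookup order (package-relative first, then the path as given) are assumptions.

```python
import importlib


def resolve_class(path: str, default_package: str):
    """Resolve 'module.Class', trying it under default_package first."""
    module_path, _, class_name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.Class', got {path!r}")
    # Assumed order: short form under the default package, then the full path.
    for candidate in (f"{default_package}.{module_path}", module_path):
        try:
            module = importlib.import_module(candidate)
        except ModuleNotFoundError:
            continue
        return getattr(module, class_name)
    raise ImportError(f"could not import {path!r}")


# With Rogue-Bench installed, the call corresponding to the text above would be:
# resolve_class("naive.NaiveAgent", "rogue_bench.agent")
```

Under this scheme, an agent defined in your own package would be passed by its full import path.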
The --agent-config option should point to a JSON file. For NaiveAgent, the config fields are:
model: Pydantic AI model string.
max_history: number of prior request/response pairs to keep.
retries: number of attempts for a model call before failing.
model_settings: optional Pydantic AI model settings passed to the agent. Validated as a ModelSettings object if provided.
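As an illustration of the fields listed above, the sketch below writes a config file from Python. Every value is a placeholder rather than a project default, the file name is hypothetical, and treating the first three fields as required is an assumption (the text only states that model_settings is optional).

```python
import json
from pathlib import Path

# Placeholder values for illustration only, not Rogue-Bench defaults.
config = {
    "model": "openai:gpt-4o",                 # Pydantic AI "provider:model" string
    "max_history": 10,                        # prior request/response pairs to keep
    "retries": 3,                             # attempts per model call before failing
    "model_settings": {"temperature": 0.2},   # optional
}

# Assumption: these three fields are mandatory.
REQUIRED = ("model", "max_history", "retries")


def write_config(path, cfg):
    """Check that the assumed-required fields exist, then write cfg as JSON."""
    missing = [key for key in REQUIRED if key not in cfg]
    if missing:
        raise ValueError(f"missing config fields: {missing}")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cfg, indent=2) + "\n")
    return path


if __name__ == "__main__":
    # Hypothetical file name; pass the result to --agent-config.
    print(write_config("config/my-model.json", config))
```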
You can also set --action-delay to slow down the keystrokes sent back to Rogue.
To extend NaiveAgent or add your own Agent, see here.
Info
Any LLM API keys are expected to be provided through a .env file (or already in the environment). python-dotenv is used under the hood to read .env files.
Under the hood, Pydantic AI is used to define agents. For more on Pydantic AI supported providers, see here.
Rog-O-Matic¶
Rog-O-Matic runs the classic Rogue bot instead of an LLM agent.
Rog-O-Matic can also be run in Docker mode:
Use --seed to replay a specific Rogue seed:
The --rogomatic-config option should point to a JSON file. For Rog-O-Matic, the config fields are:
fresh_run: delete Rog-O-Matic's local rlog/ directory before starting.
use_ltm: allow long-term-memory files across runs.
genes: 8 fixed Rog-O-Matic knobs instead of the gene pool.
random_gene_seed: seed used for gene generation/selection. If this is omitted, --seed is used.
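The fallback from random_gene_seed to --seed can be sketched as follows. The function name is hypothetical, the example values are placeholders, and treating a JSON null the same as an omitted field is an assumption. The genes field is left out because its value format is not specified here.

```python
import json
from typing import Optional


def effective_gene_seed(config: dict, cli_seed: Optional[int]) -> Optional[int]:
    """Return random_gene_seed if set, otherwise fall back to the --seed value."""
    gene_seed = config.get("random_gene_seed")
    # Assumption: a null value is treated the same as a missing field.
    return cli_seed if gene_seed is None else gene_seed


# Placeholder config for illustration; these are not documented defaults.
example = {
    "fresh_run": True,
    "use_ltm": False,
    "random_gene_seed": 42,
}

if __name__ == "__main__":
    print(json.dumps(example, indent=2))
    print(effective_gene_seed(example, cli_seed=0))
```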
For example:
When --output-path is set, Rogue-Bench saves the normal run files and also copies Rog-O-Matic's debug files when they are available:
rogomatic.log
rogomatic.frogue
For more on Rog-O-Matic, see here.
Rog-O-Matic is specified as its own player because it requires quite a few hacks to get working. Check out the source for more.
Run options¶
These options can be combined with the player types above.
--rogue-path¶
Path to the local rogue-collection-headless executable. This is used for local runs and replays, but not Docker mode.
--action-delay¶
Seconds to wait between actions in agent mode. This makes an LLM-based agent's play easier to follow while watching.
--seed¶
RNG seed for the Rogue game. If this is omitted, the current time will be used.
--output-path¶
Directory for recording a run. Rogue-Bench creates it if needed.
--versioned¶
Append a timestamped subdirectory under --output-path.
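To make the resulting layout concrete, here is a rough sketch of how a timestamped subdirectory could be derived. The timestamp format is an assumption; the directory names Rogue-Bench actually produces may look different.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def versioned_dir(output_path, now: Optional[datetime] = None) -> Path:
    """Return output_path with a timestamped subdirectory appended."""
    now = now or datetime.now()
    # Assumed format (YYYYMMDD-HHMMSS); the real naming scheme may differ.
    return Path(output_path) / now.strftime("%Y%m%d-%H%M%S")


if __name__ == "__main__":
    print(versioned_dir("results/gpt-5-4-mini"))
```

Whatever the exact format, the effect is that repeated runs with the same --output-path do not overwrite each other.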
--timeout¶
Maximum wall-clock seconds for one game run. The default is 1200 seconds.
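"Wall-clock" means the limit covers elapsed real time, including time spent waiting on model responses. A minimal sketch of that kind of deadline loop follows; the function and its step callback are hypothetical and do not reflect how Rogue-Bench implements the option.

```python
import time


def run_with_deadline(step, timeout_s: float = 1200.0, clock=time.monotonic) -> bool:
    """Call step() until it returns True or timeout_s wall-clock seconds elapse.

    Returns True if step() reported completion, False on timeout.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if step():
            return True
    return False
```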
Replay options¶
When --output-path is provided, Rogue-Bench writes a game.sav file. That file stores the game name, seed, and keylog needed to reconstruct the run.
--input-path¶
Directory containing a game.sav file. This starts replay mode.
--input-path cannot be used with --output-path.
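This kind of constraint is commonly expressed as a mutually exclusive option group. The sketch below uses argparse purely for illustration; whether Rogue-Bench uses argparse is an assumption, and the parser name is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser in which --input-path and --output-path conflict."""
    parser = argparse.ArgumentParser(prog="rogue-bench-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--input-path", help="directory containing game.sav (replay)")
    group.add_argument("--output-path", help="directory for recording a run")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--input-path", "results/run"])
    print(args.input_path)
```

Passing both options to this parser exits with a usage error instead of starting a run.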
If the directory also contains playback.json from an agent run, replay shows the "actions" and "reasoning" panels from the original run.
Docker mode works for replay too:
--replay-speed¶
Seconds to wait between recorded keystrokes during visual replay.
--no-display¶
Skip visual replay and print final statistics as JSON.
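Since the statistics are printed as JSON, they can be collected from a script. The sketch below assumes rogue-bench is on the path and that standard output contains only the JSON document; the helper names are hypothetical, and no particular statistics keys are assumed.

```python
import json
import subprocess


def replay_command(run_dir) -> list:
    """Build the argv for a headless replay of a recorded run."""
    return ["rogue-bench", "--input-path", str(run_dir), "--no-display"]


def replay_stats(run_dir):
    """Run the replay and parse its output as JSON."""
    result = subprocess.run(
        replay_command(run_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the replay fails
    )
    # Assumption: stdout holds nothing but the JSON statistics.
    return json.loads(result.stdout)
```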