Quickstart¶

Install¶

git clone https://github.com/chayapatr/impactbench
cd bench-py
uv sync
cp .env.example .env

Add your API keys to .env:

ANTHROPIC_API_KEY=...
OPENAI_API_KEY=...
DEEPINFRA_TOKEN=...   # optional
XAI_API_KEY=...       # optional

Run¶

# All phases, all target models, for the "mcab" benchmark
python main.py mcab all

# Individual phases
python main.py mcab gen_metrics
python main.py mcab gen_scenarios
python main.py mcab simulate gpt-4o
python main.py mcab evaluate gpt-4o
python main.py mcab aggregate

# Every benchmark × every target model
python main.py all

Run behavior (force, dry-run, concurrency) is set in config.yaml, not via flags. Use --config to point at a different config file.

Resuming

Every phase is cached per row. Re-running picks up where an interrupted run stopped. Set run.force: true in config.yaml to re-run a completed phase from scratch.