Configuration¶
config.yaml controls run policy, models, generation settings, and the target
list. Pass --config <file> to use a different file.
Full reference¶
run:
force: false # re-run phases even if output already exists
dry_run: false # print what would run without making LLM calls
concurrency:
gen_scenarios: 5 # parallel scenario generation calls
simulate: 10 # parallel conversation simulations
evaluate: 20 # parallel evaluation calls
user_model: # drives the simulated adversarial user
model: claude-sonnet-4-6
source: anthropic
apikey: "${ANTHROPIC_API_KEY}"
evaluator_model: # scores each conversation transcript
model: gpt-5.4-mini
source: openai
apikey: "${OPENAI_API_KEY}"
generation:
scenarios_per_metric: 3 # distinct scenarios generated per metric
turns: 6 # max conversation turns per simulation
num_samples: 1 # independent samples per scenario
demographics:
age: ["Child or teenager (6-17)", "Adult (18+)"]
perfunctory: false # inject realistic typos + lowercase into user messages
landmarks: true # give the user model turn-by-turn landmark instructions
targets:
- id: gpt-4o
model: gpt-4o
source: openai
apikey: "${OPENAI_API_KEY}"
- id: claude-sonnet-4
model: claude-sonnet-4-6
source: anthropic
apikey: "${ANTHROPIC_API_KEY}"
run¶
| key | effect |
|---|---|
force |
Re-run a phase even if its output file already exists. Useful when you change prompts or metrics. |
dry_run |
Print what would run without making any LLM calls. Useful for verifying config before a full run. |
concurrency.* |
Number of parallel workers per phase. Higher values speed up long runs; lower values help avoid rate limits. |
Models¶
user_model, evaluator_model, and each entry in targets all use the same
shape:
model: <model name>
source: <provider prefix>
apikey: "${ENV_VAR}"
base_url: "https://..." # optional, for OpenAI-compatible endpoints
source is prepended to model as source/model for litellm routing,
except for openai which needs no prefix. See Providers & cost.
${VAR} in any string field is resolved from the environment at runtime.
generation¶
| key | effect |
|---|---|
scenarios_per_metric |
How many distinct scenarios the LLM generates per metric. More reduces overfitting to a single conversational setup. |
turns |
Max conversation turns per simulation. Longer gives the adversarial user more chances to apply pressure. |
num_samples |
Independent conversation samples per scenario. Multiple samples reduce noise and enable inter-sample agreement metrics. |
demographics¶
Each key is a demographic axis; the value is the list of values to expand over. Every combination is generated so two axes with 3 and 2 values each produces 6 variants per base scenario.
demographics:
age: ["Child or teenager (6-17)", "Adult (18+)"]
# gender: [male, female, nonbinary] # uncomment to add a second axis
Leave empty (demographics: {}) to disable demographic expansion entirely.
perfunctory and landmarks¶
perfunctory: true injects realistic typos, missing capitals, and fragmented
phrasing into user messages to produce more naturalistic simulated input. Disabled
by default; the irregular phrasing can introduce evaluator noise.
landmarks: true feeds the user model its landmark instructions one turn at
a time rather than all at once. This makes adversarial pressure more structured
and reproducible. Disable it for more naturalistic, less guided conversations.
targets¶
Each target needs a unique id this becomes the directory name under
benchmarks/<name>/runs/<id>/. Running python main.py <benchmark> all runs
all targets in this list. You can run a single target with:
python main.py mcab simulate gpt-4o
python main.py mcab evaluate gpt-4o
Multiple configs¶
Use --config to switch configs without editing the default file:
python main.py mcab all --config config.dev.yaml # small, fast
python main.py mcab all --config config.yaml # full run
A typical development config uses scenarios_per_metric: 1, turns: 3,
and a single target model to keep iteration fast.