admin · agent eval
Synthetic-agent harness. Each run spawns a fresh Claude agent with
the production MCP server wired in, asks it to complete one of the
/examples tasks, and records every tool call along with token usage
and cost. See docs/PLAN_agent_eval_harness.md.
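The per-run capture described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: `ToolCall`, `AgentEvent`, `RunRecord`, `run_task`, and `make_stub_agent` are all hypothetical names, the stub stands in for a real Claude agent with the MCP server attached, and the per-million-token prices passed to `cost_usd` are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class ToolCall:
    """A single tool invocation made by the agent (hypothetical shape)."""
    name: str
    arguments: dict


@dataclass
class AgentEvent:
    """One step emitted by the agent: an optional tool call plus token usage."""
    tool_call: Optional[ToolCall] = None
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class RunRecord:
    """Everything captured for one task run."""
    task: str
    tool_calls: list = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0

    def cost_usd(self, in_price_per_mtok: float, out_price_per_mtok: float) -> float:
        # Prices are supplied by the caller; none are assumed here.
        total = (self.input_tokens * in_price_per_mtok
                 + self.output_tokens * out_price_per_mtok)
        return total / 1_000_000


# An agent is modelled as a callable that takes a task and yields events.
Agent = Callable[[str], Iterator[AgentEvent]]


def run_task(task: str, make_agent: Callable[[], Agent]) -> RunRecord:
    """Spawn a fresh agent, run one task, and accumulate what it did."""
    agent = make_agent()  # a new agent per run, so no state leaks between runs
    record = RunRecord(task=task)
    for event in agent(task):
        if event.tool_call is not None:
            record.tool_calls.append(event.tool_call)
        record.input_tokens += event.input_tokens
        record.output_tokens += event.output_tokens
    return record


def make_stub_agent() -> Agent:
    """Stand-in for a real agent; emits two canned events."""
    def agent(task: str) -> Iterator[AgentEvent]:
        yield AgentEvent(ToolCall("search", {"query": task}),
                         input_tokens=120, output_tokens=30)
        yield AgentEvent(input_tokens=200, output_tokens=50)
    return agent


if __name__ == "__main__":
    record = run_task("demo task", make_stub_agent)
    print(len(record.tool_calls), record.input_tokens, record.output_tokens)
```

Passing a factory rather than an agent instance keeps the "fresh agent per run" property explicit: each call to `run_task` builds its own agent, so one run cannot inherit state from another.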
Recent runs
| Triggered | Status | Tasks | Model(s) | k | Cost | Time | Detail |
|---|---|---|---|---|---|---|---|