admin · agent eval

Synthetic-agent harness. Each run spawns a fresh Claude agent with the production MCP server wired in, asks it to complete one of the /examples tasks, and captures every tool call, token, and cost. See docs/PLAN_agent_eval_harness.md.

Triggered Status Tasks Model(s) k Cost Time Detail