Evaluation runs
Every `panoptes eval` invocation produces one run row. Smoke-test runs are listed here, but drill-down pages focus on the substantive runs (≥10 items).
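The ≥10-item cutoff can be illustrated with a minimal sketch. It models a run row using only the fields visible on this page and partitions the runs listed below by item count. The `RunRow` class, its field names, `SUBSTANTIVE_MIN_ITEMS`, and `split_runs` are hypothetical names chosen for illustration. They are not panoptes' actual schema or API.

```python
from dataclasses import dataclass


# Hypothetical run-row shape built from the fields shown on this page.
# This is NOT panoptes' real schema; it is only for illustration.
@dataclass(frozen=True)
class RunRow:
    run_id: str
    strategy: str
    items: int
    calls: int
    judges: tuple[str, ...]
    cost_usd: float


# Cutoff stated on the page: drill-down pages focus on runs with >= 10 items.
SUBSTANTIVE_MIN_ITEMS = 10


def is_substantive(run: RunRow) -> bool:
    """Return True if the run meets the drill-down item threshold."""
    return run.items >= SUBSTANTIVE_MIN_ITEMS


def split_runs(runs):
    """Partition runs into (substantive, below_threshold), preserving order."""
    substantive = [r for r in runs if is_substantive(r)]
    below_threshold = [r for r in runs if not is_substantive(r)]
    return substantive, below_threshold


# Values copied from the run cards on this page.
HAIKU_SONNET_MINI = ("claude-haiku", "claude-sonnet", "gpt-4o-mini")
RUNS = [
    RunRow("panoptes-44c4e9b3", "all", 16, 32, ("claude-sonnet", "gpt-4o"), 0.196),
    RunRow("panoptes-e86ef9e3", "all", 34, 764, ("claude-sonnet", "gpt-4o"), 4.74),
    RunRow("panoptes-d636a93f", "bandit", 30, 300, HAIKU_SONNET_MINI, 0.950),
    RunRow("panoptes-49fdaed9", "all", 8, 143, HAIKU_SONNET_MINI, 0.549),
    RunRow("panoptes-696da4d5", "all", 22, 396, HAIKU_SONNET_MINI, 1.56),
    RunRow("panoptes-d7da27b1", "all", 5, 5, ("claude-sonnet",), 0.0075),
]

substantive, below_threshold = split_runs(RUNS)
print([r.run_id for r in substantive])
print([r.run_id for r in below_threshold])
```

With these six runs, panoptes-49fdaed9 (8 items) and panoptes-d7da27b1 (5 items) fall below the cutoff. The other four meet it.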
panoptes-44c4e9b3
strategy: all
2026-06-04 18:45:08 · demo_calibration.duckdb
items: 16 · calls: 32 · cost: $0.196
judges: 2 (claude-sonnet, gpt-4o)
panoptes-e86ef9e3
strategy: all
2026-06-04 18:12:07 · demo_calibration.duckdb
items: 34 · calls: 764 · cost: $4.74
judges: 2 (claude-sonnet, gpt-4o)
panoptes-d636a93f
strategy: bandit
2026-06-04 17:54:12 · demo_bandit.duckdb
items: 30 · calls: 300 · cost: $0.950
judges: 3 (claude-haiku, claude-sonnet, gpt-4o-mini)
panoptes-49fdaed9
strategy: all
2026-06-04 17:42:29 · demo_all.duckdb
items: 8 · calls: 143 · cost: $0.549
judges: 3 (claude-haiku, claude-sonnet, gpt-4o-mini)
panoptes-696da4d5
strategy: all
2026-06-04 16:31:35 · demo_all.duckdb
items: 22 · calls: 396 · cost: $1.56
judges: 3 (claude-haiku, claude-sonnet, gpt-4o-mini)
panoptes-d7da27b1
strategy: all
2026-05-11 16:52:17 · m1_exit.duckdb
items: 5 · calls: 5 · cost: $0.0075
judges: 1 (claude-sonnet)