PANOPTES
run panoptes-d7da27b1 · 2026-05-11 16:52:17 · strategy: all · m1_exit.duckdb
items: 5
judge calls: 5
UQ results: 0
cost: $0.0075 · 1.3k tokens
cost by judge
total            $0.0075
claude-sonnet    $0.0075
score distribution (point pass, by judge)
items
item    family    scores                  UQ
HE/0    code      claude-sonnet 0.422     -
HE/1    code      claude-sonnet 0.839     -
HE/2    code      claude-sonnet 0.541     -
HE/3    code      claude-sonnet 0.474     -
HE/4    code      claude-sonnet 0.818     -