PANOPTES

run panoptes-44c4e9b3

2026-06-04 18:45:08 · strategy: all · demo_calibration.duckdb
items: 16
judge calls: 32
UQ results: 0
cost: $0.196 (46.8k tokens)
cost by judge (total $0.196):
  claude-sonnet  $0.141
  gpt-4o         $0.055
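The run totals can be cross-checked by hand: 16 items scored by 2 judges gives the 32 judge calls, and the two per-judge costs sum to the $0.196 total. The sketch below does that arithmetic. It assumes each judge scored each item exactly once, and the constant names are hypothetical; the figures are copied from this page, not read through any Panoptes API.

```python
# Hypothetical cross-check of this run's call count and cost figures.
# Values are copied from the page; no Panoptes API is used or implied.
from decimal import Decimal

ITEMS = 16
JUDGES = ("claude-sonnet", "gpt-4o")
COST_BY_JUDGE = {"claude-sonnet": Decimal("0.141"), "gpt-4o": Decimal("0.055")}

# Assumption: every judge scored every item exactly once.
judge_calls = ITEMS * len(JUDGES)

# Decimal avoids binary floating-point drift when summing dollar amounts.
total_cost = sum(COST_BY_JUDGE.values())

# Average dollars per call for each judge (each judge made ITEMS calls).
cost_per_call = {judge: cost / ITEMS for judge, cost in COST_BY_JUDGE.items()}

print("judge calls:", judge_calls)
print("total cost: $", total_cost, sep="")
for judge, cost in cost_per_call.items():
    print(f"{judge}: ${cost:.4f} per call")
```

Under that assumption, claude-sonnet averages about 2.6 times the per-call cost of gpt-4o in this run.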
score distribution (point pass, by judge)
items
item                family  claude-sonnet  gpt-4o  UQ
calib·HumanEval/34  code    1.000          1.000   -
calib·HumanEval/35  code    0.800          0.500   -
calib·HumanEval/36  code    0.950          0.800   -
calib·HumanEval/37  code    1.000          0.800   -
calib·HumanEval/38  code    1.000          0.500   -
calib·HumanEval/39  code    0.850          0.800   -
calib·HumanEval/40  code    0.800          1.000   -
calib·HumanEval/41  code    1.000          1.000   -
calib·HumanEval/42  code    1.000          1.000   -
calib·HumanEval/43  code    0.950          1.000   -
calib·HumanEval/44  code    0.800          1.000   -
calib·HumanEval/45  code    1.000          1.000   -
calib·HumanEval/46  code    0.950          1.000   -
calib·HumanEval/47  code    1.000          1.000   -
calib·HumanEval/48  code    1.000          0.800   -
calib·HumanEval/49  code    0.800          0.800   -
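The page reports per-item scores but no per-judge aggregate. A minimal sketch of one such summary follows: each judge's mean score, the number of items where the two judges agree exactly, the mean absolute score gap, and the item with the largest gap. The dictionary, the helper name `judge_summary`, and the choice of statistics are all hypothetical. The scores are copied by hand from the table, with the `calib·` prefix dropped from item ids.

```python
# Hypothetical per-judge summary of this run's point-pass scores.
# Scores are hand-copied from the table; the page does not compute these.
from statistics import mean

# item id (calib· prefix dropped) -> (claude-sonnet score, gpt-4o score)
SCORES = {
    "HumanEval/34": (1.000, 1.000),
    "HumanEval/35": (0.800, 0.500),
    "HumanEval/36": (0.950, 0.800),
    "HumanEval/37": (1.000, 0.800),
    "HumanEval/38": (1.000, 0.500),
    "HumanEval/39": (0.850, 0.800),
    "HumanEval/40": (0.800, 1.000),
    "HumanEval/41": (1.000, 1.000),
    "HumanEval/42": (1.000, 1.000),
    "HumanEval/43": (0.950, 1.000),
    "HumanEval/44": (0.800, 1.000),
    "HumanEval/45": (1.000, 1.000),
    "HumanEval/46": (0.950, 1.000),
    "HumanEval/47": (1.000, 1.000),
    "HumanEval/48": (1.000, 0.800),
    "HumanEval/49": (0.800, 0.800),
}


def judge_summary(scores):
    """Return per-judge means, exact-agreement count, and score gaps."""
    claude = [c for c, _ in scores.values()]
    gpt = [g for _, g in scores.values()]
    gaps = {item: abs(c - g) for item, (c, g) in scores.items()}
    return {
        "mean_claude_sonnet": mean(claude),
        "mean_gpt_4o": mean(gpt),
        # Identical scores subtract to exactly 0.0, so == 0 is safe here.
        "exact_agreement": sum(1 for gap in gaps.values() if gap == 0),
        "mean_abs_gap": mean(gaps.values()),
        "largest_gap_item": max(gaps, key=gaps.get),
    }


summary = judge_summary(SCORES)
for key, value in summary.items():
    print(key, f"{value:.5f}" if isinstance(value, float) else value)
```

On these 16 items, gpt-4o tends to score lower than claude-sonnet, and the widest split is on HumanEval/38 (1.000 vs 0.500).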