run panoptes-49fdaed9
2026-06-04 17:42:29 · strategy: all · demo_all.duckdb
items: 8
judge calls: 143
UQ results: 56
cost: $0.549 (150.8k tokens)
cost by judge (total $0.549)
claude-haiku: $0.129
claude-sonnet: $0.412
gpt-4o-mini: $0.0081
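The per-judge costs above add up to the reported total. The sketch below checks this using Python's `decimal` module, so no float error creeps in. The dictionary name is illustrative and is not part of the run's tooling. It assumes the panel shows the total rounded to three decimal places.

```python
from decimal import Decimal

# Per-judge costs as displayed in the "cost by judge" panel (already rounded by the UI).
judge_costs = {
    "claude-haiku": Decimal("0.129"),
    "claude-sonnet": Decimal("0.412"),
    "gpt-4o-mini": Decimal("0.0081"),
}

# Exact decimal sum of the per-judge figures.
total = sum(judge_costs.values())

# Assumption: the panel rounds the total to three decimals (default ROUND_HALF_EVEN).
displayed_total = total.quantize(Decimal("0.001"))

print(total, displayed_total)
```

The exact sum is $0.5491. Rounded, that gives the $0.549 shown, and claude-sonnet makes up roughly three quarters of it.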
score distribution (point pass, by judge)
items
| item | family | claude-haiku | claude-sonnet | gpt-4o-mini | UQ |
|---|---|---|---|---|---|
| HE/22 | code | 0.800 | 0.950 | 1.000 | 7 blob(s) |
| HE/23 | code | 1.000 | 1.000 | 1.000 | 7 blob(s) |
| HE/24 | code | 0.200 | 0.500 | 0.800 | 7 blob(s) |
| HE/25 | code | 0.500 | 0.950 | 0.800 | 7 blob(s) |
| HE/26 | code | 1.000 | 1.000 | 0.800 | 7 blob(s) |
| HE/27 | code | 1.000 | 1.000 | 1.000 | 7 blob(s) |
| HE/28 | code | 1.000 | 1.000 | 1.000 | 7 blob(s) |
| HE/29 | code | 1.000 | 1.000 | 1.000 | 7 blob(s) |
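The items table can be summarised per judge and per item. The sketch below computes each judge's mean point-pass score and the spread between judges (max minus min) on each item. The scores are copied from the table. The variable names and the 0.4 disagreement threshold are arbitrary illustrative choices, not something the run itself reports.

```python
from statistics import mean

# Point-pass scores copied from the items table: item -> {judge: score}.
scores = {
    "HE/22": {"claude-haiku": 0.800, "claude-sonnet": 0.950, "gpt-4o-mini": 1.000},
    "HE/23": {"claude-haiku": 1.000, "claude-sonnet": 1.000, "gpt-4o-mini": 1.000},
    "HE/24": {"claude-haiku": 0.200, "claude-sonnet": 0.500, "gpt-4o-mini": 0.800},
    "HE/25": {"claude-haiku": 0.500, "claude-sonnet": 0.950, "gpt-4o-mini": 0.800},
    "HE/26": {"claude-haiku": 1.000, "claude-sonnet": 1.000, "gpt-4o-mini": 0.800},
    "HE/27": {"claude-haiku": 1.000, "claude-sonnet": 1.000, "gpt-4o-mini": 1.000},
    "HE/28": {"claude-haiku": 1.000, "claude-sonnet": 1.000, "gpt-4o-mini": 1.000},
    "HE/29": {"claude-haiku": 1.000, "claude-sonnet": 1.000, "gpt-4o-mini": 1.000},
}
judges = ["claude-haiku", "claude-sonnet", "gpt-4o-mini"]

# Mean score each judge assigned across all items.
judge_means = {j: mean(row[j] for row in scores.values()) for j in judges}

# Per-item spread between the most and least generous judge.
spread = {item: max(row.values()) - min(row.values()) for item, row in scores.items()}

# Hypothetical threshold for "judges disagree"; chosen only for illustration.
DISAGREEMENT_THRESHOLD = 0.4
flagged = [item for item, s in spread.items() if s >= DISAGREEMENT_THRESHOLD]

for j in judges:
    print(f"{j}: mean {judge_means[j]:.4f}")
print("high-disagreement items:", flagged)
```

On these scores claude-haiku is the strictest judge, with a mean of 0.8125. claude-sonnet and gpt-4o-mini both average 0.925. HE/24 and HE/25 are the items where the judges disagree most.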