run panoptes-d636a93f
2026-06-04 17:54:12 · strategy: bandit · demo_bandit.duckdb
items: 30
judge calls: 300
UQ results: 120
cost: $0.950 (305.9k tokens)

cost by judge
| judge | cost |
|---|---|
| claude-sonnet | $0.722 |
| claude-haiku | $0.203 |
| gpt-4o-mini | $0.025 |
| total | $0.950 |
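As a quick sanity check on the cost figures above, the per-judge costs can be summed and compared with the reported total. This is a minimal sketch that hard-codes the numbers shown in this summary; the variable names are illustrative and not part of the panoptes tool, and dividing total cost by the 300 judge calls is only a rough average that assumes the whole cost is attributable to those calls.

```python
# Figures copied by hand from the run summary (not read from the database).
cost_by_judge = {
    "claude-sonnet": 0.722,
    "claude-haiku": 0.203,
    "gpt-4o-mini": 0.025,
}
reported_total = 0.950
judge_calls = 300

total = sum(cost_by_judge.values())
# The displayed values are rounded to three decimals, so allow a small tolerance.
assert abs(total - reported_total) < 0.0015

# Share of total spend per judge, and a rough average cost per judge call.
shares = {judge: cost / total for judge, cost in cost_by_judge.items()}
cost_per_call = reported_total / judge_calls

for judge, share in shares.items():
    print(f"{judge}: {share:.1%} of spend")
print(f"mean cost per judge call: ${cost_per_call:.5f}")
```

On these figures, claude-sonnet accounts for roughly three quarters of the spend.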
[chart: score distribution (point pass, by judge)]
items
| item | family | scores | UQ blobs |
|---|---|---|---|
| HE/0 | code | claude-sonnet 0.950 | 2 |
| HE/1 | code | claude-haiku 1.000 | 2 |
| HE/10 | code | claude-sonnet 1.000, gpt-4o-mini 1.000 | 5 |
| HE/11 | code | claude-sonnet 0.800, gpt-4o-mini 1.000 | 5 |
| HE/12 | code | claude-haiku 1.000, gpt-4o-mini 1.000 | 5 |
| HE/13 | code | claude-sonnet 1.000, gpt-4o-mini 1.000 | 5 |
| HE/14 | code | claude-haiku 1.000, gpt-4o-mini 1.000 | 5 |
| HE/15 | code | claude-haiku 1.000, gpt-4o-mini 1.000 | 5 |
| HE/16 | code | claude-haiku 1.000, gpt-4o-mini 1.000 | 5 |
| HE/17 | code | claude-haiku 1.000, gpt-4o-mini 0.800 | 5 |
| HE/18 | code | claude-sonnet 0.950, gpt-4o-mini 1.000 | 5 |
| HE/19 | code | gpt-4o-mini 1.000 | 2 |
| HE/2 | code | claude-sonnet 1.000 | 2 |
| HE/20 | code | gpt-4o-mini 0.800 | 2 |
| HE/21 | code | gpt-4o-mini 1.000 | 2 |
| HE/22 | code | claude-haiku 0.800, gpt-4o-mini 1.000 | 5 |
| HE/23 | code | claude-haiku 1.000, gpt-4o-mini 1.000 | 5 |
| HE/24 | code | gpt-4o-mini 0.800 | 2 |
| HE/25 | code | claude-haiku 0.500, gpt-4o-mini 0.800 | 5 |
| HE/26 | code | claude-haiku 1.000, gpt-4o-mini 0.800 | 5 |
| HE/27 | code | claude-sonnet 1.000, gpt-4o-mini 1.000 | 5 |
| HE/28 | code | claude-haiku 1.000, gpt-4o-mini 1.000 | 5 |
| HE/29 | code | claude-sonnet 1.000, gpt-4o-mini 1.000 | 5 |
| HE/3 | code | claude-sonnet 1.000 | 2 |
| HE/4 | code | claude-haiku 0.500, gpt-4o-mini 1.000 | 5 |
| HE/5 | code | claude-sonnet 1.000 | 2 |
| HE/6 | code | claude-sonnet 1.000 | 2 |
| HE/7 | code | claude-sonnet 1.000, gpt-4o-mini 1.000 | 5 |
| HE/8 | code | claude-sonnet 1.000, gpt-4o-mini 0.800 | 5 |
| HE/9 | code | claude-sonnet 1.000, gpt-4o-mini 1.000 | 5 |