Compare Builds

Same bars as each build's homepage card. Colors are ranked across all builds: green = best, red = worst, amber = middle.
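
The color rule above can be sketched in a few lines. This is a minimal illustration, not the page's actual implementation: the function name `rank_colors`, the `lower_is_better` flag, and the choice to color only the single best and worst values (with everything else amber) are assumptions, since the page does not say where the amber band starts or how ties are handled.

```python
def rank_colors(values, lower_is_better=True):
    """Map each build's metric to 'green' (best), 'red' (worst), or 'amber'.

    Assumption: only the extreme values get green/red; all others are amber.
    """
    ordered = sorted(values.values())
    best = ordered[0] if lower_is_better else ordered[-1]
    worst = ordered[-1] if lower_is_better else ordered[0]

    colors = {}
    for build, value in values.items():
        if value == best:
            colors[build] = "green"
        elif value == worst:
            colors[build] = "red"
        else:
            colors[build] = "amber"
    return colors


# Cost in USD for three of the builds on this page (lower is better).
cost = {"#12": 530.26, "#11": 18.85, "#03": 8.79}
cost_colors = rank_colors(cost)

# Pass counts for three builds (higher is better).
passed = {"#12": 126, "#10": 51, "#07": 121}
pass_colors = rank_colors(passed, lower_is_better=False)
```

For metrics such as Pass, where a larger value is better, the same ranking runs with `lower_is_better=False`.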

#12 Codex GPT-5.5 Goal Mode
Codex CLI v0.128.0 | gpt-5.5 | xHigh Reasoning | Persistent Goal | Plan Mode | Sub-Agents

Nearly 24 hours of autonomous work driven by a single goal brief. Highest pass rate of any Codex run, but also the largest codebase and one of the longest tech-debt tails.

Efficiency (shorter = better)
Cost: $530.26
Time: 23h 50m

Feature Completeness (out of 143 tests)
Pass: 126
Partial: 11
Fail: 6

Code Quality
LOC: 13,645
Code smells: 82
Tech debt: 23.2 h
Duplication: 3.2%

Teammates (sub-agents)
Count: 19

#11 Codex GPT-5.5
Codex CLI v0.124.0 | gpt-5.5 | High Reasoning | Sub-Agents | Codex Pro

Smallest codebase on the page and the second-lowest cost; clean SonarCloud gate but skipped many admin features.

Efficiency (shorter = better)
Cost: $18.85
Time: 48m

Feature Completeness (out of 143 tests)
Pass: 108
Partial: 21
Fail: 14

Code Quality
LOC: 1,743
Code smells: 5
Tech debt: 2.0 h
Duplication: 1.3%

Teammates (sub-agents)
Count: 4

#10 Claude Code Opus 4.7 xHigh
Claude Code v2.1.114 | Opus 4.7 | xHigh Reasoning | Thinking On | 1M Context | Team Mode

Same setup as #09 with a hardened prompt that forced team mode; the team ran, but the buttons it shipped did not work.

Efficiency (shorter = better)
Cost: $157.31
Time: 1h 36m

Feature Completeness (out of 143 tests)
Pass: 51
Partial: 21
Fail: 71

Code Quality
LOC: 5,043
Code smells: 41
Tech debt: 3.6 h
Duplication: 1.4%

Teammates (teammates)
Count: 37

#09 Claude Code Opus 4.7
Claude Code v2.1.112 | Opus 4.7 | High Reasoning | Thinking On | 1M Context

Opus 4.7 ignored team mode and built everything as a single agent - the fastest run on the page and the third-cheapest.

Efficiency (shorter = better)
Cost: $22.63
Time: 32m

Feature Completeness (out of 143 tests)
Pass: 93
Partial: 24
Fail: 25

Code Quality
LOC: 2,667
Code smells: 13
Tech debt: 2.6 h
Duplication: 0.4%

Teammates (single)
Count: 1

#07 Claude Code Team v4
Claude Code v2.1.81 | Opus 4.6 | High Reasoning | Thinking On | 1M Context | Team Mode

Same prompt as #01 on 1M context; stable specialists and one of the highest feature pass rates.

Efficiency (shorter = better)
Cost: $132.06
Time: 3h 39m

Feature Completeness (out of 143 tests)
Pass: 121
Partial: 15
Fail: 7

Code Quality
LOC: 4,537
Code smells: 57
Tech debt: 8.4 h
Duplication: 1.4%

Teammates (teammates)
Count: 35

#06 Claude Code Team v3
Claude Code v2.1.80 | Opus 4.6 | High Reasoning | Thinking On | 1M Context | Team Mode

Advanced prompt with a controller and QA teammate - broadest coverage, but the second-longest and second-most expensive run on the page.

Efficiency (shorter = better)
Cost: $284.52
Time: 10h 59m

Feature Completeness (out of 143 tests)
Pass: 119
Partial: 13
Fail: 7

Code Quality
LOC: 5,708
Code smells: 91
Tech debt: 14.3 h
Duplication: 8.0%

Teammates (teammates)
Count: 159

#05 Codex Sub-Agents v2
Codex v0.101.0 | GPT-5.3-codex | Reasoning: xhigh | Sub-Agents

Codex rerun with a quality-focused prompt and xhigh reasoning; lots of classes, lots of tech debt.

Efficiency (shorter = better)
Cost: $28.40
Time: 3h 27m

Feature Completeness (out of 143 tests)
Pass: 70
Partial: 26
Fail: 39

Code Quality
LOC: 7,178
Code smells: 113
Tech debt: 25.1 h
Duplication: 3.0%

Teammates (sub-agents)
Count: 54

#04 Claude Code Team v2
Claude Code v2.1.41 | Opus 4.6 | Reasoning: Max | Thinking On | Sub-Agents

Tuned prompt with explicit review agents; cleanest code-smell profile of the early sub-agent runs (#02 to #05).

Efficiency (shorter = better)
Cost: $73.92
Time: 3h 0m

Feature Completeness (out of 143 tests)
Pass: 82
Partial: 27
Fail: 30

Code Quality
LOC: 6,033
Code smells: 38
Tech debt: 5.2 h
Duplication: 3.8%

Teammates (sub-agents)
Count: 30

#03 Codex Sub-Agents
Codex v0.99.0 | GPT-5.3-codex | Reasoning: xhigh | Sub-Agents

First Codex pass at the same challenge - by far the cheapest run, but missing many features.

Efficiency (shorter = better)
Cost: $8.79
Time: 1h 44m

Feature Completeness (out of 143 tests)
Pass: 89
Partial: 10
Fail: 39

Code Quality
LOC: 6,037
Code smells: 54
Tech debt: 12.7 h
Duplication: 2.8%

Teammates (sub-agents)
Count: 17

#02 Claude Code Sub-Agents
Claude Code v2.1.41 | Opus 4.6 | Reasoning: Max | Thinking On | Sub-Agents

Same prompt as #01 but with sub-agents instead of teammates; slower and less complete.

Efficiency (shorter = better)
Cost: $61.97
Time: 2h 13m

Feature Completeness (out of 143 tests)
Pass: 73
Partial: 27
Fail: 43

Code Quality
LOC: 6,033
Code smells: 60
Tech debt: 8.6 h
Duplication: 3.6%

Teammates (sub-agents)
Count: 13

#01 Claude Code Team
Claude Code v2.1.39 | Opus 4.6 | Reasoning: Max | Thinking On | Team Mode

Baseline run: Opus 4.6 in team mode, the reference point the other builds are compared against.

Efficiency (shorter = better)
Cost: $73.44
Time: 1h 6m

Feature Completeness (out of 143 tests)
Pass: 126
Partial: 7
Fail: 9

Code Quality
LOC: 6,108
Code smells: 168
Tech debt: 22.3 h
Duplication: 2.9%

Teammates (teammates)
Count: 32
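
Several card descriptions above refer to a "pass rate" and to how cheap a run was. As a rough sketch of how the two figures can be combined, the snippet below derives a pass rate and a cost per passing test from the Cost and Pass values of four cards. The helper names and the cost-per-pass metric are illustrative assumptions that do not appear on the page, and the pass rate uses the stated 143-test denominator even though Pass, Partial and Fail do not sum to 143 on every card.

```python
TOTAL_TESTS = 143  # "out of 143 tests", as stated on every card

# Cost (USD) and Pass counts copied from four of the cards above.
builds = {
    "#12": {"cost": 530.26, "passed": 126},
    "#11": {"cost": 18.85, "passed": 108},
    "#03": {"cost": 8.79, "passed": 89},
    "#01": {"cost": 73.44, "passed": 126},
}


def pass_rate(passed, total=TOTAL_TESTS):
    """Share of tests fully passed, as a percentage of the stated total."""
    return 100.0 * passed / total


def cost_per_pass(cost, passed):
    """Dollars spent per fully passing test (illustrative metric)."""
    return cost / passed


summary = {
    name: {
        "pass_rate": pass_rate(b["passed"]),
        "cost_per_pass": cost_per_pass(b["cost"], b["passed"]),
    }
    for name, b in builds.items()
}

for name, row in summary.items():
    print(f"{name}: {row['pass_rate']:.1f}% passed, "
          f"${row['cost_per_pass']:.2f} per passing test")
```

Read this way, #12 and #01 reach the same pass count, but #12 spends several times more per passing test.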