How the builds compare

A metric-by-metric comparison of every build produced from the same specification.

[Chart: each build scored out of 100 on Feature Completeness (weighted E2E score), Code Quality (maintainability index), and Efficiency (lower API cost)]
#01 Claude Code Team
Claude Code v2.1.39, Team Mode, Opus 4.6
#02 Claude Code Sub-Agents
Claude Code v2.1.41, Sub-Agents, Opus 4.6
#03 Codex Sub-Agents
Codex v0.99.0, Sub-Agents, GPT-5.3-codex
#04 Claude Code Team v2
Claude Code v2.1.41, Sub-Agents, Opus 4.6
#05 Codex Sub-Agents v2
Codex v0.101.0, Sub-Agents, GPT-5.3-codex, Quality
#06 Claude Code Team v3
Claude Code v2.1.80, Team Mode, Opus 4.6, 1M Context
#07 Claude Code Team v4
Claude Code v2.1.81, Team Mode, Opus 4.6, 1M Context

Feature Completeness (143 E2E tests, verified by two independent QA analyses)

Fully Working: all acceptance criteria met.
Not Working: feature missing or broken.
Partially Done: some criteria met, gaps remain.
Not Applicable: feature outside build scope.

Status            #01        #02        #03        #04        #05        #06        #07
Fully Working     126 (88%)  73 (51%)   89 (62%)   82 (57%)   70 (49%)   119 (83%)  121 (85%)
Not Working       9          43         39         30         39         7          7
Partially Done    7          27         10         27         26         13         15
Not Applicable    1          0          5          4          8          4          0
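As a quick consistency check, the four status counts for each build should add up to the 143 tests, and the Fully Working percentage is that count divided by all 143 tests (not only the applicable ones). The Python sketch below, with counts copied from the table, recomputes the rounded percentages; it does not attempt the chart's weighted E2E score, whose weighting is not stated here.

```python
# Status counts per build, copied from the Feature Completeness table.
# Tuple order: (fully working, not working, partially done, not applicable)
COUNTS = {
    "#01": (126, 9, 7, 1),
    "#02": (73, 43, 27, 0),
    "#03": (89, 39, 10, 5),
    "#04": (82, 30, 27, 4),
    "#05": (70, 39, 26, 8),
    "#06": (119, 7, 13, 4),
    "#07": (121, 7, 15, 0),
}
TOTAL_TESTS = 143


def fully_working_pct(build: str) -> int:
    """Share of all 143 tests that fully pass, rounded to a whole percent."""
    fully, not_working, partial, not_applicable = COUNTS[build]
    # Every test lands in exactly one status bucket.
    assert fully + not_working + partial + not_applicable == TOTAL_TESTS
    return round(100 * fully / TOTAL_TESTS)


for build in COUNTS:
    print(f"{build}: {fully_working_pct(build)}% fully working")
```

Every build sums to 143, and the recomputed values match the percentages shown in the table.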

Speed & Cost

Metric      #01      #02      #03      #04      #05      #06      #07
Duration    1h 6m    2h 13m   1h 44m   3h 0m    3h 27m   10h 59m  3h 39m
API Cost*   $73.44   $61.97   $8.79    $73.92   $28.40   $284.52  $132.06
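The duration and cost rows can be combined with the Fully Working counts into ratios that this page does not report itself. The sketch below assumes the "Xh Ym" duration format used above and divides the hypothetical list-price cost by the number of Fully Working tests; the helper names `to_minutes` and `cost_per_working_test` are illustrative, not part of any tool.

```python
import re

# Duration, hypothetical list-price API cost (USD), and Fully Working test
# count per build, copied from the tables above.
BUILDS = {
    "#01": ("1h 6m", 73.44, 126),
    "#02": ("2h 13m", 61.97, 73),
    "#03": ("1h 44m", 8.79, 89),
    "#04": ("3h 0m", 73.92, 82),
    "#05": ("3h 27m", 28.40, 70),
    "#06": ("10h 59m", 284.52, 119),
    "#07": ("3h 39m", 132.06, 121),
}


def to_minutes(duration: str) -> int:
    """Convert a duration written as 'Xh Ym' (e.g. '10h 59m') to minutes."""
    match = re.fullmatch(r"(\d+)h (\d+)m", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    hours, minutes = (int(g) for g in match.groups())
    return hours * 60 + minutes


def cost_per_working_test(build: str) -> float:
    """Hypothetical API cost divided by the number of Fully Working tests."""
    _, cost, fully_working = BUILDS[build]
    return cost / fully_working


for build, (duration, _, _) in BUILDS.items():
    print(f"{build}: {to_minutes(duration)} min, "
          f"${cost_per_working_test(build):.2f} per fully working test")
```

On these figures #03 is the cheapest per fully working test (about $0.10) and #06 the most expensive (about $2.39); the footnote's caveat about list-price costs applies to both.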

Code Output

Metric              #01    #02    #03    #04    #05    #06    #07
Files Created       388    358    357    376    286    482    389
Classes             170    150    117    150    141    163    153
Lines of Code       6.1K   6.0K   6.0K   6.0K   7.2K   5.7K   4.5K
NCLOC (SonarCloud)  18.6K  17.3K  13.2K  17.3K  16.2K  18.6K  16.4K

Code Quality (PHPMetrics)

Metric               #01    #02    #03    #04    #05    #06    #07
Avg Maintainability  79.7   89.7   82.5   89.7   79.6   72.2   72.9
Avg Complexity       4.6    4.5    8.6    4.5    9.2    4.5    4.2
Estimated Bugs       18.1   12.4   4.0    12.4   31.4   14.9   12.2
Violations           36     0      54     25     63     0      0
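For reading the Avg Maintainability row: PHPMetrics bases its maintainability index on the classic maintainability index formula. In its commonly cited form, with Halstead volume V, cyclomatic complexity G, lines of code L, and comment ratio C, it reads as follows. Treat this as a sketch; the exact PHPMetrics variant (which line count it uses, how the comment ratio is expressed) may differ.

```latex
\mathit{MI} = 171 - 5.2\,\ln V - 0.23\,G - 16.2\,\ln L + 50\,\sin\!\left(\sqrt{2.4\,C}\right)
```

Volume, complexity, and size all enter with negative signs, so larger or more branching classes lower the index, while the comment term adds at most 50 points.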

SonarCloud

Metric             #01     #02     #03     #04     #05     #06     #07
Quality Gate       Failed  Failed  Failed  Failed  Failed  Failed  Failed
Bugs               2       1       3       0       0       1       0
Code Smells        168     102     54      64      153     91      137
Security Hotspots  33      18      0       22      5       47      20
Duplication        2.9%    4.8%    2.8%    7.5%    4.2%    8%      5.9%
Reliability        C       C       C       A       A       C       A
Security           A       A       A       A       A       A       A
Maintainability    A       A       A       A       A       A       A
Tech Debt          22.3h   13.4h   12.7h   8.7h    33.3h   14.3h   24.8h
Critical Issues    10      29      6       31      69      47      84
Major Issues       23      27      44      8       53      23      29
Minor Issues       137     47      7       25      31      22      24
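The letter ratings above can be related to the raw numbers. The sketch below assumes SonarCloud's classic rules with default settings: the reliability rating is set by the worst bug's severity (no bugs gives A), and the maintainability rating comes from the technical debt ratio, i.e. remediation effort divided by an estimated development cost of 30 minutes per line. Newer rating modes or custom settings may differ, and NCLOC values here are the rounded figures from the table.

```python
# ASSUMPTION: classic SonarCloud rating rules with default settings.

# Worst bug severity -> reliability rating (no bugs at all -> "A").
SEVERITY_TO_RATING = {"MINOR": "B", "MAJOR": "C", "CRITICAL": "D", "BLOCKER": "E"}

MINUTES_PER_LINE = 30  # assumed default development cost per line of code


def reliability_rating(bug_severities: list[str]) -> str:
    """Return the rating implied by the single worst bug."""
    if not bug_severities:
        return "A"
    # Letters sort A < B < ... < E, so max() picks the worst rating.
    return max(SEVERITY_TO_RATING[s] for s in bug_severities)


def debt_ratio(debt_hours: float, ncloc: int) -> float:
    """Remediation effort divided by the estimated cost to write the code."""
    return (debt_hours * 60) / (ncloc * MINUTES_PER_LINE)


def maintainability_rating(debt_hours: float, ncloc: int) -> str:
    ratio = debt_ratio(debt_hours, ncloc)
    for letter, upper_bound in (("A", 0.05), ("B", 0.10), ("C", 0.20), ("D", 0.50)):
        if ratio <= upper_bound:
            return letter
    return "E"


# Tech debt (hours) and NCLOC per build, copied from the table above.
SONAR = {
    "#01": (22.3, 18_600),
    "#02": (13.4, 17_300),
    "#03": (12.7, 13_200),
    "#04": (8.7, 17_300),
    "#05": (33.3, 16_200),
    "#06": (14.3, 18_600),
    "#07": (24.8, 16_400),
}

for build, (hours, ncloc) in SONAR.items():
    print(f"{build}: debt ratio {debt_ratio(hours, ncloc):.2%}, "
          f"rating {maintainability_rating(hours, ncloc)}")
```

Every debt ratio comes out well below 1% on these figures, which is consistent with the uniform A maintainability rating even for #05's 33.3h of debt. Under the same assumed rules, a C reliability rating means the worst bug found was of major severity.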

Agent Orchestration

Metric         #01  #02  #03  #04  #05  #06  #07
Active Agents  31   12   16   29   53   158  34

Build: Model | Tool | Mode | Reasoning | Parallelism
#01: Opus 4.6 | Claude Code v2.1.39 | Team Mode | Thinking: On, Reasoning: Max | Lead + parallel specialists
#02: Opus 4.6 | Claude Code v2.1.41 | Sub-Agents | Thinking: On, Reasoning: Max | Lead + sub-agents
#03: GPT-5.3-codex | OpenAI Codex v0.99.0 | Sub-Agents (experimental) | Reasoning: xhigh | Lead + explorer/worker agents
#04: Opus 4.6 | Claude Code v2.1.41 | Sub-Agents (tuned) | Thinking: On, Reasoning: Max | Lead + sub-agents + review agents
#05: GPT-5.3-codex | OpenAI Codex v0.101.0 | Sub-Agents (quality) | Reasoning: xhigh | Lead + explorer/worker agents
#06: Opus 4.6 | Claude Code v2.1.80 | Team Mode (advanced prompt) | Thinking: On, High Reasoning | Lead + controller + QA teammate
#07: Opus 4.6 | Claude Code v2.1.81 | Team Mode (same prompt) | Thinking: On, High Reasoning | Lead + parallel specialists

* API costs are hypothetical, calculated from list prices. The sessions ran on fixed-price subscription plans, so no per-token charges actually applied.