How the builds compare
Metric-by-metric comparison of every build produced from the same specification.
Feature Completeness (143 E2E Tests, verified by 2 independent QA analyses)
Fully Working (all acceptance criteria met) | 126 (88%) | 73 (51%) | 89 (62%) | 82 (57%) | 70 (49%) | 119 (83%) | 121 (85%)
Not Working (feature missing or broken) | 9 | 43 | 39 | 30 | 39 | 7 | 7
Partially Done (some criteria met, gaps remain) | 7 | 27 | 10 | 27 | 26 | 13 | 15
Not Applicable (feature outside build scope) | 1 | 0 | 5 | 4 | 8 | 4 | 0
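The percentages in the Fully Working row are each build's count divided by the 143 E2E tests, and the four status counts for a build should add up to 143. A minimal Python sketch that re-derives both from the figures above; the variable and function names are illustrative, and the builds are simply taken in the order they are listed:

```python
TOTAL_TESTS = 143  # E2E tests per build, from the section heading

# Counts copied from the comparison, one entry per build in listed order.
fully_working = [126, 73, 89, 82, 70, 119, 121]
not_working = [9, 43, 39, 30, 39, 7, 7]
partially_done = [7, 27, 10, 27, 26, 13, 15]
not_applicable = [1, 0, 5, 4, 8, 4, 0]


def percent_fully_working(count, total=TOTAL_TESTS):
    """Share of tests that are fully working, rounded to a whole percent."""
    return round(100 * count / total)


percentages = [percent_fully_working(c) for c in fully_working]

# Each build's four status counts should account for every test.
column_totals = [
    sum(counts)
    for counts in zip(fully_working, not_working, partially_done, not_applicable)
]

if __name__ == "__main__":
    for build, (pct, total) in enumerate(zip(percentages, column_totals), start=1):
        print(f"build {build}: {pct}% fully working, {total} tests accounted for")
```

Running the check against the listed figures is a quick way to confirm that no test was dropped or double-counted in any column.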
Speed & Cost
Duration | 1h 6m | 2h 13m | 1h 44m | 3h 0m | 3h 27m | 10h 59m | 3h 39m
API Cost* | $73.44 | $61.97 | $8.79 | $73.92 | $28.40 | $284.52 | $132.06
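Duration and cost are easier to compare once they are normalised by how much each build actually delivered. The sketch below is one possible reading, not a metric reported by the source: it divides each build's duration and hypothetical list-price API cost by its Fully Working count. The helper `to_minutes` and the derived lists are assumed names introduced for illustration.

```python
import re

# Values copied from the comparison, one entry per build in listed order.
durations = ["1h 6m", "2h 13m", "1h 44m", "3h 0m", "3h 27m", "10h 59m", "3h 39m"]
api_costs = [73.44, 61.97, 8.79, 73.92, 28.40, 284.52, 132.06]  # USD, list price
fully_working = [126, 73, 89, 82, 70, 119, 121]  # out of 143 E2E tests


def to_minutes(text):
    """Convert a duration such as '1h 6m' into whole minutes."""
    match = re.fullmatch(r"(\d+)h (\d+)m", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    hours, minutes = map(int, match.groups())
    return 60 * hours + minutes


minutes = [to_minutes(d) for d in durations]

# Derived ratios (illustrative only): cost and time per fully working test.
cost_per_working = [cost / n for cost, n in zip(api_costs, fully_working)]
minutes_per_working = [m / n for m, n in zip(minutes, fully_working)]

if __name__ == "__main__":
    for build, (c, m) in enumerate(zip(cost_per_working, minutes_per_working), start=1):
        print(f"build {build}: ${c:.2f} and {m:.1f} min per fully working test")
```

Because the costs are list-price estimates rather than actual charges, these ratios are only useful for comparing builds against each other.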
Code Output
Files Created | 388 | 358 | 357 | 376 | 286 | 482 | 389
Classes | 170 | 150 | 117 | 150 | 141 | 163 | 153
Lines of Code | 6.1K | 6.0K | 6.0K | 6.0K | 7.2K | 5.7K | 4.5K
NCLOC (SonarCloud) | 18.6K | 17.3K | 13.2K | 17.3K | 16.2K | 18.6K | 16.4K
Code Quality (PHPMetrics)
Avg Maintainability | 79.7 | 89.7 | 82.5 | 89.7 | 79.6 | 72.2 | 72.9
Avg Complexity | 4.6 | 4.5 | 8.6 | 4.5 | 9.2 | 4.5 | 4.2
Estimated Bugs | 18.1 | 12.4 | 4.0 | 12.4 | 31.4 | 14.9 | 12.2
Violations | 36 | 0 | 54 | 25 | 63 | 0 | 0
SonarCloud
Quality Gate | Failed | Failed | Failed | Failed | Failed | Failed | Failed
Bugs | 2 | 1 | 3 | 0 | 0 | 1 | 0
Code Smells | 168 | 102 | 54 | 64 | 153 | 91 | 137
Security Hotspots | 33 | 18 | 0 | 22 | 5 | 47 | 20
Duplication | 2.9% | 4.8% | 2.8% | 7.5% | 4.2% | 8% | 5.9%
Reliability | C | C | C | A | A | C | A
Security | A | A | A | A | A | A | A
Maintainability | A | A | A | A | A | A | A
Tech Debt | 22.3h | 13.4h | 12.7h | 8.7h | 33.3h | 14.3h | 24.8h
Critical Issues | 10 | 29 | 6 | 31 | 69 | 47 | 84
Major Issues | 23 | 27 | 44 | 8 | 53 | 23 | 29
Minor Issues | 137 | 47 | 7 | 25 | 31 | 22 | 24
Agent Orchestration
Active Agents | 31 | 12 | 16 | 29 | 53 | 158 | 34
Model | Opus 4.6 | Opus 4.6 | GPT-5.3-codex | Opus 4.6 | GPT-5.3-codex | Opus 4.6 | Opus 4.6
Tool | Claude Code v2.1.39 | Claude Code v2.1.41 | OpenAI Codex v0.99.0 | Claude Code v2.1.41 | OpenAI Codex v0.101.0 | Claude Code v2.1.80 | Claude Code v2.1.81
Mode | Team Mode | Sub-Agents | Sub-Agents (experimental) | Sub-Agents (tuned) | Sub-Agents (quality) | Team Mode (advanced prompt) | Team Mode (same prompt)
Reasoning | Thinking: On, Reasoning: Max | Thinking: On, Reasoning: Max | Reasoning: xhigh | Thinking: On, Reasoning: Max | Reasoning: xhigh | Thinking: On, High Reasoning | Thinking: On, High Reasoning
Parallelism | Lead + parallel specialists | Lead + sub-agents | Lead + explorer/worker agents | Lead + sub-agents + review agents | Lead + explorer/worker agents | Lead + controller + QA teammate | Lead + parallel specialists
* These are hypothetical API costs calculated from list prices. Both sessions ran on fixed-price subscription plans, so no per-token charges actually applied.