Compare Builds
The bars match each build's homepage card. Colors are ranked across all builds: green = best, red = worst, amber = in between.
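The page does not say how ties or per-metric direction are handled. This is a minimal sketch of the green/amber/red rule under three assumptions: each bar (metric) is ranked on its own, lower values win for Efficiency (per its "shorter = better" label), and tied values share a color. The function name `rank_colors` and the sample values are hypothetical, not real build data.

```python
from typing import Dict


def rank_colors(values: Dict[str, float], higher_is_better: bool = True) -> Dict[str, str]:
    """Color one metric across builds: green = best, red = worst, amber = everything else.

    Assumption: ties share a color, and if every build has the same value,
    they are all treated as best (green).
    """
    if higher_is_better:
        best, worst = max(values.values()), min(values.values())
    else:
        # e.g. Efficiency, where a shorter bar is better
        best, worst = min(values.values()), max(values.values())

    colors: Dict[str, str] = {}
    for build, value in values.items():
        if value == best:
            colors[build] = "green"
        elif value == worst:
            colors[build] = "red"
        else:
            colors[build] = "amber"
    return colors


if __name__ == "__main__":
    # Hypothetical Feature Completeness scores (tests passed out of 143)
    print(rank_colors({"a": 120, "b": 95, "c": 143}))
    # Hypothetical Efficiency values, where lower is better
    print(rank_colors({"a": 23.0, "b": 4.0, "c": 9.5}, higher_is_better=False))
```

The first call prints `{'a': 'amber', 'b': 'red', 'c': 'green'}` and the second prints `{'a': 'red', 'b': 'green', 'c': 'amber'}`.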
23 hours of autonomous work driven by a single goal brief. Highest pass rate of any Codex run, but also the largest codebase and the longest tech-debt tail.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: sub-agents

Smallest codebase and lowest cost on the page; clean SonarCloud gate but skipped many admin features.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: sub-agents

Same setup as #09 with a hardened prompt that forced team mode; the team ran, but the buttons shipped dead.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: teammates

Opus 4.7 ignored team mode and built everything as a single agent: the fastest and cheapest run.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: single

Same prompt as #01 on 1M context; stable specialists and the highest feature pass rate.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: teammates

Advanced prompt with a controller and a QA teammate: the broadest coverage, but the longest and most expensive run.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: teammates

Codex rerun with a quality-focused prompt and xhigh reasoning; lots of classes, lots of tech debt.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: sub-agents

Tuned prompt with explicit review agents; cleanest code-smell profile of the sub-agent runs.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: sub-agents

First Codex pass at the same challenge; by far the cheapest run, but missing many features.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: sub-agents

Same prompt as #01 but with sub-agents instead of teammates; slower and less complete.
Efficiency (shorter = better) | Feature Completeness (out of 143 tests) | Code Quality | Teammates: sub-agents

Baseline run: Opus 4.6 in team mode, the reference point the other builds are compared against.