I Made Antigravity and Codex Build the Same App (RAW RESULTS)

The video pits Gemini (Google, running in Antigravity) against Codex (OpenAI) by having them build the same full-stack competitor analysis app. Gemini proves faster and more user-friendly but less thorough, while Codex is slower yet delivers a more complete, secure, and production-ready application, highlighting the trade-offs between speed, usability, and robustness in AI coding assistants.

In this video, the creator sets up a head-to-head challenge between two AI coding assistants: Gemini (Google) and Codex (OpenAI). Both are given the exact same prompt: build a full-stack competitor analysis app from scratch, using Supabase for the backend and Firecrawl for web scraping, then deploy it live on Vercel. The test is designed to reflect a realistic user scenario rather than a perfectly engineered context, to see how each model's structural biases and user experience play out. The Atlas framework, which structures the software development lifecycle into phases like architect, trace, link, assemble, and stress test, is used to guide both AIs through the build process.
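For context on what that stack looks like in practice, here is a minimal sketch of the scraping step, assuming the Firecrawl Node SDK (`@mendable/firecrawl-js`); the exact SDK surface varies by version, so treat the call shape as illustrative rather than what either agent actually wrote.

```typescript
import FirecrawlApp from '@mendable/firecrawl-js';

// Assumes FIRECRAWL_API_KEY is set in the environment (e.g. a Vercel env var).
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY! });

// Scrape a competitor's site as clean markdown, which is easy to
// hand off to an LLM for the analysis step later in the pipeline.
async function scrapeCompetitor(url: string): Promise<string> {
  const result = await firecrawl.scrapeUrl(url, { formats: ['markdown'] });
  if (!result.success || !result.markdown) {
    throw new Error(`Failed to scrape ${url}`);
  }
  return result.markdown;
}
```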

Gemini quickly moves through the planning and architecture phases, correctly identifying the problem and laying out a clear data schema and tech stack. It chooses OpenAI for the AI analysis step and asks for API keys before deploying to Vercel. Codex, on the other hand, is much slower, spending significant time on backend planning and extensive production-grade testing, even though the creator never asked for that level of thoroughness. This leaves Codex far behind Gemini on speed, but potentially with a more robust backend.
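To illustrate that choice, here is a hedged sketch of what the AI analysis step might look like with the official `openai` Node SDK; the model name and prompts are placeholders, not what either agent actually generated.

```typescript
import OpenAI from 'openai';

// Assumes OPENAI_API_KEY is set in the environment.
const openai = new OpenAI();

// Turn a scraped competitor page into a short teardown.
// The model name and prompts below are illustrative placeholders.
async function analyzeCompetitor(company: string, scrapedMarkdown: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: 'You are a competitive-analysis assistant. Summarize positioning, pricing, and weaknesses.',
      },
      { role: 'user', content: `Company: ${company}\n\nScraped content:\n${scrapedMarkdown}` },
    ],
  });
  return completion.choices[0].message.content ?? '';
}
```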

When it comes to user experience, Gemini is more conversational and user-friendly, while Codex is less intuitive, especially around deployment and server connections. Claude, a third AI, is used as a judge to objectively assess the backend and frontend codebases produced by both models. Gemini initially fails to deploy a functional frontend because of a missing dependency and misplaced files, but after some debugging it produces a visually appealing, animated dashboard. However, it misses critical functionality like user authentication, which was specified in the prompt.
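For reference, user authentication in this stack is typically only a few lines with `supabase-js`, which makes the omission notable. A minimal sketch, assuming email/password sign-in (the prompt's exact auth requirements aren't shown in the summary):

```typescript
import { createClient } from '@supabase/supabase-js';

// Assumes the standard Supabase URL and anon key environment variables.
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

// Email/password sign-in; the returned session is what ties
// later database queries to a specific user.
async function signIn(email: string, password: string) {
  const { data, error } = await supabase.auth.signInWithPassword({ email, password });
  if (error) throw error;
  return data.session;
}
```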

Codex, despite its slow pace, delivers a more complete and secure application. It includes proper authentication, a well-structured backend, and passes most of the security and database checks conducted by Claude. The backend review shows Codex outperforming Gemini in schema design, row-level security, query patterns, and data architecture. Gemini's backend, while more sophisticated in some ways, lacks adequate security controls and proper indexing, making it less reliable for production use.
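Row-level security is the check that most separates the two backends, so here is a brief sketch of why it matters, assuming a hypothetical `competitors` table with a `user_id` column: with an RLS policy in place, Postgres itself filters rows to the signed-in user, even if the client code forgets to.

```typescript
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

// Assumes a hypothetical `competitors` table protected by an RLS policy like:
//   create policy "own rows" on competitors
//     for select using (auth.uid() = user_id);
async function listMyCompetitors() {
  // No explicit user filter needed: with RLS enabled, the database only
  // returns rows whose user_id matches the signed-in user's auth.uid().
  const { data, error } = await supabase.from('competitors').select('*');
  if (error) throw error;
  return data;
}
```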

In the final test, Codex's app successfully runs the competitor analysis workflow end to end, producing a comprehensive AI-driven teardown of a target company and its competitors. Gemini's app, despite several retries and fixes, fails to deliver a fully functional workflow. The creator concludes that while Gemini is faster and more user-friendly, Codex ultimately produces a more reliable and production-ready app, at the cost of speed and some polish in the user experience. The experiment highlights the importance of context engineering and the variability in AI model outputs, even when the same frameworks and prompts are used.