Claude Opus 4.5 vs. Google Gemini 3: Design & build an app

The video compares Claude Opus 4.5 and Google Gemini 3 by having both AI models design and build the same invoicing app, highlighting their strengths and weaknesses in planning, front-end design, and backend implementation. It concludes that while neither model is definitively superior, success depends more on the user’s ability to guide and leverage these tools effectively through strong product instincts and iterative development.

In this video, the creator compares two recently released frontier AI models, Claude Opus 4.5 and Google Gemini 3, by putting them through a practical test: designing and building the same invoicing app with both models. The creator emphasizes that a single test cannot provide a definitive verdict on any model, as different models may excel at different phases of product development and individual workflows vary. The focus here is on planning, designing, and rapidly building a new product from scratch, which aligns with the creator’s personal priorities.

The process begins with setting up two separate codebases using Ruby on Rails, Inertia, React, Tailwind CSS, and shadcn/ui components. Both projects include a product overview and two main prompts: one for front-end design and another for backend implementation. The creator uses a specialized “front-end design” skill developed by Anthropic to enhance Claude Opus 4.5’s design capabilities and applies the same skill to Gemini 3 to ensure a fair comparison. Both models are tested on their ability to ask clarifying questions during the planning phase, which is crucial for successful project outcomes.
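The video does not show the actual project files, but the stack described above maps to a Gemfile roughly like the following (a hypothetical sketch; gem versions are omitted, and Tailwind plus the shadcn/ui React components live on the JavaScript side of the build rather than in the Gemfile):

```ruby
# Gemfile sketch for the described Rails + Inertia + React stack
# (an assumed layout, not taken from the video).
source "https://rubygems.org"

gem "rails"
gem "inertia_rails"   # Rails adapter for Inertia.js
gem "vite_rails"      # bundles the React + Tailwind CSS front end
```

With this layout, Rails controllers render Inertia responses, and the React pages (including any shadcn/ui components) are compiled by the JavaScript bundler.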

Claude Opus 4.5 demonstrates strong planning skills by asking relevant clarifying questions and producing a detailed implementation plan. Its front-end design is visually appealing, mobile-responsive, and mostly functional, though it omits dark mode in this run and has a few minor UI rough edges. Gemini 3 also asks thoughtful planning questions and produces a consistent, simpler design that is mobile-responsive but has some spacing and layout issues. Notably, Gemini 3 initially produces a non-working front end due to hallucinated components but recovers after debugging, showing transparency about its mistakes.

For the backend implementation, both models are tasked with wiring up full functionality. Claude Opus 4.5 delivers a mostly functional app with working navigation, client and invoice management, and search features, though it has some bugs like not saving invoice line items correctly. Gemini 3’s backend initially has more issues, such as non-functional search and errors when creating clients or invoices, but after debugging, it manages to fix some errors and get basic functionality working. The creator notes that these issues highlight the importance of thorough upfront planning and iterative refinement in spec-driven development.
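The line-item bug is a familiar failure mode when invoice rows are modeled as child records of a parent invoice: the rows get built but never attached or persisted. The video does not show the actual models, so the following is a hypothetical plain-Ruby sketch (class and method names are assumptions) of the core relationship:

```ruby
# Minimal invoice/line-item sketch in plain Ruby (no Rails dependency).
# In a Rails app these would be ActiveRecord models, where the classic
# pitfall is forgetting accepts_nested_attributes_for (or the matching
# permitted params), so child rows silently never persist.
LineItem = Struct.new(:description, :quantity, :unit_price) do
  def total
    quantity * unit_price
  end
end

class Invoice
  attr_reader :client, :line_items

  def initialize(client)
    @client = client
    @line_items = []
  end

  # Line items must be explicitly attached to the parent invoice;
  # building them without appending here reproduces the
  # "items not saved" class of bug.
  def add_line_item(description, quantity, unit_price)
    item = LineItem.new(description, quantity, unit_price)
    @line_items << item
    item
  end

  def total
    line_items.sum(&:total)
  end
end

invoice = Invoice.new("Acme Corp")
invoice.add_line_item("Design work", 10, 95.0)
invoice.add_line_item("Hosting", 1, 25.0)
invoice.total  # => 975.0
```

The point of the sketch is only that the parent-child wiring is an easy step for a model to miss when generating backend code from a spec.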

In conclusion, the creator finds that while neither model is drastically superior in this quick test, both are highly capable and continue to improve, with their differences becoming less significant over time. The real advantage lies in the user’s ability to effectively direct and leverage these AI tools through strong product instincts, architectural decisions, and skillful prompting. The video ends with an encouragement to focus on developing essential skills for builders in this new AI-driven era, rather than fixating solely on which model to use.