Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research

The video reviews OpenAI’s Deep Research system, highlighting its strengths in retrieving obscure knowledge while noting significant limitations in common sense reasoning and the need for a subscription. The presenter compares its performance with DeepSeek R1 and Google’s Gemini Deep Research, finding Deep Research generally superior but prone to hallucinations, and reflects on the broader implications of AI advancements for the workforce.

In a recent video, the presenter discusses OpenAI’s newly released system, Deep Research, which is built on o3, the company’s most powerful language model. The presenter tested the system across 20 use cases and compared its performance with DeepSeek R1 and Google’s Gemini Deep Research. Notably, OpenAI gave its product the same name Google had already used, which raised eyebrows. The presenter is impressed with Deep Research but highlights significant caveats about its capabilities, as well as the subscription required to access it.

The video delves into the performance of Deep Research on two benchmarks, Humanity’s Last Exam and GAIA. The presenter notes that Deep Research excels at retrieving obscure knowledge and posts a notable improvement over previous models. On GAIA, however, humans still score well above the system: 92% compared to Deep Research’s 67-73%. This gap raises questions about how reliable the system is as an assistant for more nuanced tasks.

The presenter also shares their experience testing Deep Research on a benchmark they created, which assesses common-sense and spatial reasoning. The model struggled with these tasks, often asking multiple clarifying questions instead of providing direct answers. The presenter saw this behavior as both a potential sign of more advanced AI and a source of frustration. Despite its shortcomings in common-sense reasoning, Deep Research demonstrated impressive capabilities in retrieving specific information from a newsletter.

In comparing Deep Research with DeepSeek R1 and Gemini Deep Research, the presenter found that Deep Research outperformed DeepSeek R1 in most tests, although it frequently hallucinated information. DeepSeek R1, while free and open-source, did not reach the same level of performance, but it was less frustrating to use because it did not keep asking clarifying questions. The presenter also noted that Gemini Deep Research performed poorly in their tests, failing to retrieve relevant information effectively.

The video concludes with reflections on the rapid advancements in AI technology and the implications for various professions. The presenter acknowledges the potential for AI to replace certain jobs, particularly in white-collar sectors, while also expressing a sense of urgency about the need for humans to adapt to these changes. Despite the challenges posed by AI, the presenter remains optimistic about their own role and the importance of human insight in navigating the evolving landscape of technology.