State of AI 2025: GPT-5 can't beat o3, robots coming into your house and fake Veo 3.1 rumors

artesia · 10 October 2025 03:08

The video highlights the development of Figure 03, a fully autonomous humanoid robot designed for home use, and debunks false rumors about the release of the Veo 3.1 AI model while discussing the latest AI benchmark results and industry trends. It also covers key personnel moves, advancements in reinforcement learning, growing AI adoption in business, and AI’s expanding role in scientific discovery amid geopolitical and market dynamics.

artesia · 10 October 2025 03:32

The video begins with an introduction to Figure 03, a humanoid robot from Figure AI, the company founded by Brett Adcock. Adcock has two missions: building the robotic future and challenging Apple after a dispute involving engraving his name on an Apple product. Figure 03 is notable for being fully autonomous, unlike many robots that are teleoperated. Designed for home use, it has a fabric layer covering its hardware to make it safer and less intimidating in residential environments. The robot runs on an onboard vision-language model called Helix, eliminating the need for cloud connectivity. The company plans to place Figure 03 in select homes next year, a significant step forward for consumer robotics.

The video also addresses rumors about the upcoming release of Veo 3.1, clarifying that some circulating information is false. Logan Kilpatrick from Google DeepMind has denied claims that Veo 3.1 will launch exclusively on Vadu AI or that its release is imminent. A release is still expected eventually, but the exact timing remains uncertain, and viewers are advised to treat rumors with caution. This segment reflects the ongoing excitement and speculation surrounding new AI model releases.

Next, the video discusses the ARC-AGI benchmark, where GPT-5 Pro currently holds the highest score among commercially available large language models, outperforming Grok 4 by a small margin. However, GPT-5 Pro is significantly more expensive per task than other models. Interestingly, the o3 Preview model from late 2024 scored even higher, but at a much greater cost. The video also highlights a researcher named Jeremy Berman, who achieved an 80% score on the ARC-AGI leaderboard using an evolutionary approach to improve Grok 4’s performance. This method involves iteratively testing and refining candidate solutions, similar to techniques used by other AI research groups.
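To make "iterative testing and refinement of candidate solutions" concrete, here is a minimal toy sketch of an evolutionary loop in Python. It is not the researcher's actual method: the grid operations, the `mutate` step (random edits standing in for asking a language model to revise a candidate), and all names and parameters are assumptions made for illustration only.

```python
import random

# Toy stand-in for ARC-style puzzles: a candidate "program" is a short list
# of named grid operations applied in order. These ops are hypothetical.
OPS = {
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: g[::-1],
    "transpose": lambda g: [list(r) for r in zip(*g)],
    "identity": lambda g: [list(r) for r in g],
}

# Two training pairs whose hidden rule is "rotate 90 degrees clockwise".
EXAMPLES = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[0, 5], [7, 0]], [[7, 0], [0, 5]]),
]

def run(program, grid):
    """Apply each operation in the program to the grid, in order."""
    for name in program:
        grid = OPS[name](grid)
    return grid

def score(program, examples):
    """Fraction of training examples the program reproduces exactly."""
    hits = sum(run(program, x) == y for x, y in examples)
    return hits / len(examples)

def mutate(program, rng):
    """Assumed refinement step: replace one operation at random.
    A real system would instead ask a model to revise the candidate."""
    child = list(program)
    child[rng.randrange(len(child))] = rng.choice(list(OPS))
    return child

def evolve(examples, generations=30, population=20, keep=5, length=2, seed=0):
    """Test candidates, keep the best few, refine them, and repeat."""
    rng = random.Random(seed)
    names = list(OPS)
    pool = [[rng.choice(names) for _ in range(length)] for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=lambda p: score(p, examples), reverse=True)
        if score(pool[0], examples) == 1.0:
            break  # a candidate already solves every training example
        parents = pool[:keep]
        children = [mutate(rng.choice(parents), rng) for _ in range(population - keep)]
        pool = parents + children
    pool.sort(key=lambda p: score(p, examples), reverse=True)
    return pool[0]

if __name__ == "__main__":
    best = evolve(EXAMPLES)
    print(best, score(best, EXAMPLES))
```

The point of the sketch is the loop structure: candidates are scored against known examples, the strongest survive, and new candidates are derived from them. In this toy setup the search space is tiny, so the loop is only illustrative of the idea, not of the compute involved in a real leaderboard run.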

The video touches on some personnel changes in the AI field, noting that Chinese researcher Shunyu Yao left Anthropic to join DeepMind, partly due to disagreements with Anthropic’s anti-China statements. This move reflects broader geopolitical tensions influencing AI research and talent mobility. Additionally, the video summarizes key points from the State of AI Report 2025, emphasizing that OpenAI remains the leader but faces growing competition, especially from China. Reinforcement learning with verifiable rewards (RLVR) is highlighted as a crucial technique improving AI performance in math and coding tasks, contributing to AI systems increasingly challenging human experts in competitions.
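For readers unfamiliar with RLVR, the "verifiable" part means the reward comes from an automatic check rather than from human preference. The following is a minimal sketch of such a check for a math answer, under assumptions of my own: the "Answer:" output convention and the function names are hypothetical, not taken from the report or the video, and a real training setup would feed this reward into a reinforcement learning algorithm that is not shown here.

```python
from fractions import Fraction

def extract_final_answer(completion):
    """Assumed convention: the model ends with a line 'Answer: <value>'."""
    for line in reversed(completion.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return None

def math_reward(completion, reference):
    """Return 1.0 if the final answer equals the reference exactly, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parsable answer, so nothing to verify
    try:
        # Fraction compares values exactly, so "0.5" and "1/2" match.
        return 1.0 if Fraction(answer) == Fraction(reference) else 0.0
    except (ValueError, ZeroDivisionError):
        return 0.0  # malformed numbers earn no reward

if __name__ == "__main__":
    print(math_reward("Half of one is a half.\nAnswer: 0.5", "1/2"))
```

For coding tasks the same idea applies, with the check replaced by running the generated program against unit tests.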

Finally, the video covers broader AI industry trends, including rapid advancements in protein language models and increasing AI adoption by businesses, with 44% of U.S. companies now paying for AI services. Despite concerns about a potential AI bubble, some experts, including investor Steve Eisman, believe the current AI investment surge is backed by substantial cash reserves and infrastructure development rather than speculative hype. The video concludes by noting AI’s growing role in scientific discovery, such as helping grandmasters improve chess strategies and contributing to published research, signaling AI’s expanding impact across multiple domains.