OpenAI’s new GPT-5.4 Pro model has surpassed competing models on benchmarks for real-time web browsing, advanced mathematics, and professional tasks, showing strong reasoning and problem-solving abilities, though at a higher cost. It excels in many areas and matches or exceeds human professionals on most tasks in OpenAI’s own evaluation. However, its advanced capabilities also raise concerns about cybersecurity risks and about how accessible powerful AI systems will remain.
OpenAI’s new GPT-5.4 Pro model has arrived and is drawing attention in the AI community for its performance across a range of demanding benchmarks. GPT-5.4 Pro outperforms competitors such as Anthropic’s Claude Opus 4.6 and Google’s Gemini 3.1, even in areas where those models were expected to dominate, such as real-time web browsing and data retrieval. Notably, GPT-5.4 Pro scored 89.3% on BrowseComp, a benchmark that tests an AI’s ability to find and pull in information from the live web. Google was expected to lead in this domain. However, this leap in capability comes at a significant cost: GPT-5.4 Pro is much more expensive to use than its rivals, which raises concerns about the future of AI pricing and accessibility.
One of GPT-5.4 Pro’s most remarkable achievements is its lead on FrontierMath, a benchmark of research-level math problems designed to resist AI solutions. Earlier models scored as low as 2% on it. GPT-5.4 Pro now leads the field and has even solved a problem that had stumped a mathematician for 20 years, drawing comparisons to AlphaGo’s famous “move 37”. This breakthrough suggests that AI is not just improving incrementally but is crossing qualitative thresholds in reasoning and problem-solving, especially in domains once thought to be beyond the reach of machines.
Beyond math, GPT-5.4 Pro has set new records on benchmarks of real-world professional work. On the Apex Agents benchmark, which simulates complex professional tasks such as building financial models, performing legal analysis, and creating slide decks, GPT-5.4 Pro became the first model to score above 50%, doubling the previous best score in just six weeks. This rapid progress highlights the accelerating pace of AI development and its potential to disrupt white-collar jobs, because these benchmarks are specifically designed to measure how close AI is to replacing junior professionals in fields like banking, consulting, and law.
OpenAI’s internal GDPval benchmark further demonstrates GPT-5.4 Pro’s capabilities: the model matches or exceeds human professionals on 83% of tasks across 44 occupations. It completes these tasks roughly 100 times faster and 100 times more cheaply than humans, though it still lacks the iterative, back-and-forth context-building that real jobs require. GPT-5.4 Pro also shows improvements in creative writing and coding. It now ranks near the top of creative writing benchmarks and can autonomously build and test complex software projects, including games and simulations.
However, the video also highlights some limitations and emerging risks. GPT-5.4 Pro underperforms on certain novel engineering problems, scoring lower than previous models on OpenAI’s internal OPQA benchmark. More significantly, the model’s advanced cybersecurity capabilities have led OpenAI to classify it as “high risk”, since it can autonomously carry out complex cyberattacks in simulated environments. This raises concerns that future models such as GPT-6 could reach a “critical” risk level, which might require stricter access controls such as ID verification. The rapid advance of AI capabilities, especially in sensitive domains, suggests that society will soon face new challenges in balancing innovation, safety, and accessibility.