OpenAI has launched GPT-4.1, a multimodal model with a remarkable context length of up to 1 million tokens, specifically targeting developers and excelling in coding benchmarks. While it offers competitive pricing and unique capabilities, its performance in certain long-context and visual benchmarks raises questions about its standing compared to other models in the rapidly evolving AI landscape.
OpenAI has recently released GPT-4.1, a multimodal model with a context length of up to 1 million tokens. The model is distinct from the previously released GPT-4.5 and is aimed squarely at developers, with strong results on coding benchmarks: it scores 55% on SWE-bench Verified, the best among OpenAI's models, and 53% on Aider's Polyglot benchmark, placing third among non-reasoning models. Shipping a 4.1 after a 4.5 reverses the usual versioning order, possibly a consequence of GPT-5 having been overhyped, and has fueled speculation about the future of OpenAI's model lineup.
GPT-4.1 is currently available only through the API, so even ChatGPT Plus subscribers cannot use it from the ChatGPT interface. This exclusivity underscores the model's developer focus and its intended use in coding and software development. The video also highlights a free resource from HubSpot, created in partnership with data scientist Sundas Khaled, which serves as a guide for using AI chatbots like ChatGPT as personal programming mentors. The resource aims to help aspiring developers sharpen their coding skills and fold AI into their learning process, a workflow the sketch below illustrates in API form.
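Since access is API-only, trying the model means writing a short script. Here is a minimal sketch using the official `openai` Python SDK, assuming an `OPENAI_API_KEY` environment variable is set and that `gpt-4.1` is the model identifier; the mentor-style prompt is purely illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4.1 to act as the kind of programming mentor the HubSpot guide describes
response = client.chat.completions.create(
    model="gpt-4.1",  # smaller variants: "gpt-4.1-mini", "gpt-4.1-nano"
    messages=[
        {"role": "system", "content": "You are a patient programming mentor."},
        {"role": "user", "content": "Explain Python list comprehensions with one short example."},
    ],
)
print(response.choices[0].message.content)
```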
The pricing structure for GPT-4.1 is competitive, starting at $2 per million input tokens and $8 per million output tokens, with lower rates for its smaller variants, GPT-4.1 Mini and Nano. That is notably cheaper than some alternatives, though competitors such as DeepSeek V3 (0324) deliver comparable or better coding performance at a significantly lower cost. What sets GPT-4.1 apart is its multimodal support, including image and video inputs, and its 1-million-token context window; Gemini 2.5 Pro offers a similarly large context window, but at a higher price point.
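To make those rates concrete, here is a small back-of-the-envelope cost calculator using the per-token prices quoted above. It is a sketch only: it ignores prompt caching, batch discounts, and the cheaper Mini/Nano tiers.

```python
# GPT-4.1 rates quoted in the section above (USD per 1M tokens)
INPUT_RATE = 2.00   # $2 per million input tokens
OUTPUT_RATE = 8.00  # $8 per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of a single API call at the quoted rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE + (output_tokens / 1_000_000) * OUTPUT_RATE

# Example: a long-context call stuffing 800k tokens of code into the prompt
# and getting back a 2k-token answer costs about $1.62.
print(f"${request_cost(800_000, 2_000):.2f}")
```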
In terms of performance, GPT-4.1 shows mixed results across benchmarks. While it does well on coding tasks, its accuracy on long-context benchmarks such as MRCR and Fiction.LiveBench trails Gemini 2.5 Pro. Likewise, its scores on visual benchmarks such as MathVista and MMMU show that while it is competitive, it does not outperform some of its contemporaries, particularly on math-related tasks. This raises questions about its overall standing in the rapidly evolving AI landscape.
Lastly, the video discusses the deprecation of GPT-4.5, which OpenAI says is intended to free up GPU capacity. GPT-4.5's API pricing, roughly 37 times that of GPT-4.1, suggests it may never have seen wide adoption. The video closes with a call for viewer feedback and an invitation to follow the creator's newsletter for updates on AI research, underscoring the pace of developments in the field and the value of community engagement in exploring them.