O1 Pro Mode – ChatGPT Pro Full Analysis (plus o1 paper highlights)

artesia · 5 December 2024 23:27

The video analyzes OpenAI’s recently launched o1 and o1 Pro mode. Both show improvements in areas such as mathematics and coding, but Pro mode does not significantly outperform standard o1, and the models have limitations in creative writing and basic reasoning tasks. The presenter questions the value of the $200-per-month subscription required for Pro mode and suggests users manage their expectations about what these new models can do.

artesia · 5 December 2024 23:47

OpenAI has recently launched o1 and o1 Pro mode, with CEO Sam Altman claiming these are the smartest AI models available. Accessing Pro mode requires a subscription of $200 per month, while existing ChatGPT Plus subscribers at $20 per month get access to o1 but not Pro mode. The video discusses the implications of this pricing and the limitations of the models, emphasizing that users who stay on the lower tier won’t benefit from the latest advancements.

The video presents benchmark results for both o1 and o1 Pro mode. They show improvements in mathematics, coding, and PhD-level science questions, but still do not match the capabilities of professional mathematicians or PhD students. Pro mode is not a fundamentally different model; rather, it aggregates answers from o1 to improve reliability. This aggregation may not improve performance by much, as Pro mode underperformed standard o1 in some tests.
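To make the "aggregates answers" idea concrete, here is a minimal sketch of one common aggregation scheme, majority voting over several samples. This is an assumption for illustration only: the summary does not say how Pro mode actually combines answers, and `sample` is a hypothetical stand-in for a call to the model.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    return Counter(answers).most_common(1)[0][0]


def aggregate(sample: Callable[[], str], n: int = 5) -> str:
    """Draw n answers from `sample` (hypothetical model call) and vote."""
    return majority_vote([sample() for _ in range(n)])
```

Voting of this kind mainly makes answers more consistent. If the underlying model is wrong most of the time on a question, the vote will usually be wrong too, which fits the observation that aggregation does not guarantee a higher score.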

The presenter analyzed the 49-page system card for o1, noting that while the model performed well in some persuasion tasks, it struggled in creative writing compared to GPT-4. In a Reddit-based evaluation, o1 was found to be more persuasive than human participants, but its performance declined in other areas, such as writing tweets. The video also mentions that Pro mode was not specifically evaluated in the system card, suggesting it may not offer substantial improvements over the standard model.

In independent testing, o1 Pro mode scored lower than expected on basic human reasoning tasks, raising questions about its effectiveness. The video highlights specific examples where Pro mode failed to give accurate answers and contrasts its performance with that of other models such as Claude. The presenter emphasizes that while Pro mode may help with tasks that require reliability, it does not significantly outperform standard o1 in many areas.

Finally, the video discusses the potential for future updates, hinting at the possibility of a GPT-4.5 release. The presenter expresses skepticism about the value of the $200 subscription for the Pro mode, given the current performance levels of the models. Overall, while acknowledging some advancements, the video suggests that users should temper their expectations regarding the capabilities of the new models and remain vigilant about their limitations.