OpenAI has launched a new model series called “o1,” which includes the o1-preview and o1-mini models, both featuring a 128k context window and demonstrating significant improvements in logical reasoning tasks compared to GPT-4. While excelling at reasoning, the o1 models show limited advancement in other areas, indicating that the pursuit of artificial general intelligence (AGI) continues.
OpenAI has introduced a new model series called “o1,” moving away from the GPT naming convention. The series includes the o1-preview model and a more affordable o1-mini model, both featuring a 128k context window. o1-preview is significantly more expensive than GPT-4, while o1-mini is slightly cheaper. o1-preview is also slower, taking 20 to 30 seconds to generate a response, but it delivers remarkable performance improvements, particularly on logical reasoning tasks, outperforming previous models such as GPT-4 across a range of benchmarks.
The performance of the o1 models is particularly impressive on challenging reasoning tasks. For instance, while GPT-4 solved only 13% of problems on a qualifying exam for the International Mathematics Olympiad, the unreleased full o1 model achieved an 83% success rate, an increase of 70 percentage points. The o1-preview model also showed a substantial improvement, scoring around 56%. On other benchmarks, such as college mathematics and formal logic, the o1 models exhibited similarly large jumps in accuracy, underscoring their focus on reasoning.
Despite these advances, the o1 models are not all-purpose upgrades. They excel at reasoning and logical tasks but show minimal improvement in areas such as English literature. This suggests that while OpenAI has made strides in specific domains, the quest for artificial general intelligence (AGI) remains ongoing. The models are designed primarily for reasoning and problem-solving rather than for versatility across all tasks.
The breakthrough behind the o1 models lies in their “chain of thought” approach combined with reinforcement learning. This method lets the model work through intermediate reasoning steps and check its own conclusions before presenting an answer. The private chain-of-thought process is not fully disclosed, but it reportedly generates a vast number of tokens for each query, which contributes to the model's enhanced performance. Access is currently limited, with a cap of 30 messages per week.
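To make the hidden-token point concrete, here is a minimal sketch (not OpenAI's published internals) of querying o1-preview through the official Python SDK and checking how many hidden reasoning tokens a response consumed; the exact usage field (`completion_tokens_details.reasoning_tokens`) is assumed and may differ across SDK versions.

```python
# Minimal sketch: query o1-preview and inspect hidden reasoning-token usage.
# Assumes the official `openai` Python SDK and an account with o1 access;
# the usage fields shown here may vary between SDK versions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "How many prime numbers lie between 100 and 150?"}
    ],
)

print(response.choices[0].message.content)

# The chain of thought itself is not returned, but the number of tokens it
# consumed is reported alongside the visible completion tokens.
details = response.usage.completion_tokens_details
print("visible completion tokens:", response.usage.completion_tokens)
print("hidden reasoning tokens:", details.reasoning_tokens)
```

The hidden reasoning tokens are billed like output tokens, which is one reason the per-query cost of the o1 models is so much higher than earlier models.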
Looking ahead, OpenAI researchers are exploring whether extending a model's thinking time can improve performance further. This shift toward “inference-time scaling” suggests that letting models think longer could yield better results, challenging previous assumptions about the relative importance of training-time and inference-time compute. While there are concerns about the validity of the benchmarks and the risk of over-optimizing for evaluations (“eval maxing”), the initial results from the o1 models point to a promising direction for future AI development.
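As a rough illustration of the inference-time-scaling idea (and not a description of how o1 works internally), the sketch below buys accuracy with extra inference compute via self-consistency: sample several independent answers and take a majority vote. `solve_once` is a hypothetical placeholder for a single model call.

```python
# Toy illustration of inference-time scaling via self-consistency (majority voting).
# This is NOT how o1 works internally; it only shows the general idea that spending
# more compute at inference time can improve answer quality.
from collections import Counter


def solve_once(question: str) -> str:
    """Placeholder for one (possibly noisy) model call returning an answer string."""
    raise NotImplementedError("wire this to a model of your choice")


def solve_with_budget(question: str, n_samples: int) -> str:
    """Sample the model n_samples times and return the most common answer.

    A larger n_samples means more inference-time compute; on many reasoning
    tasks, accuracy tends to rise (with diminishing returns) as the budget grows.
    """
    answers = [solve_once(question) for _ in range(n_samples)]
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer
```

The trade-off is the same one visible in o1-preview's 20- to 30-second response times: more compute per query in exchange for better answers.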