ChatGPT O1: preliminary test comparison with previous model test videos

artesia · 21 September 2024 15:00

In a recent video, the creator shares preliminary results from testing the new ChatGPT O1 model, noting improvements over previous versions but highlighting ongoing problems with understanding context and requirements. The O1 model performed better in certain tasks, such as setting up an HTTP server, but still made critical errors in other areas. The creator expresses frustration with its usability and commits to further testing.

artesia · 21 September 2024 15:20

In a recent video, the creator shares preliminary results from testing the new ChatGPT O1 model from OpenAI, comparing it to previous models. While the O1 model improves on its predecessors, it still exhibits significant issues, particularly in understanding requirements and context. The creator says they will test further to explore the model’s capabilities more thoroughly.

One of the tests involved comparing the costs of a photo-browsing app and TikTok, where the O1 model performed better than earlier versions. However, it still failed to grasp the importance of TikTok’s algorithm, misestimating its value significantly. This highlights a persistent problem with the model’s ability to understand the nuances of different applications and their functionalities.

In another test focused on prime number calculations using the elliptic curve method, the O1 model demonstrated some progress by recognizing an existing Python library. However, it still made critical errors in the import statement and misinterpreted the output, leading to incorrect conclusions about prime numbers. Despite these mistakes, the creator notes that the O1 model’s awareness of the library is a step forward compared to previous models.

The creator also tested the O1 model’s performance in setting up an HTTP server, where it produced better results than any other AI model previously tested. Although there was one regression in how files were returned, the model quickly corrected itself once it was given the error feedback. The creator plans to challenge the O1 model with more complex tasks and different programming languages to explore its limits.
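
For readers who have not seen the video, the kind of task involved can be sketched roughly as follows. This is only a minimal illustration, not the code from the video: the summary does not say which language or framework was used, so Python's standard library is assumed here, and `start_file_server` is a hypothetical helper name.

```python
import functools
import http.server
import threading


def start_file_server(directory, port=0):
    """Serve the files in `directory` over HTTP on localhost.

    Hypothetical helper for illustration only. Passing port=0 asks the
    operating system for any free port; the chosen port is available
    afterwards as server.server_address[1].
    """
    # SimpleHTTPRequestHandler answers GET requests with the matching file
    # from `directory`, or with a 404 response if the file does not exist.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run the request loop in a background thread so the caller is not blocked.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `start_file_server(".")` would serve the current directory until `shutdown()` is called on the returned server. Returning file contents correctly is exactly the part where, according to the video, the O1 model regressed before fixing itself.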

Despite the improvements, the creator expresses frustration with the O1 model’s usability, particularly the cumbersome process of copying and pasting code. They critique OpenAI’s marketing claims, suggesting that the model does not live up to the assertion that it performs at the level of a “very smart PhD student.” The creator concludes by acknowledging the need for further exploration of the O1 model’s capabilities and limitations, promising more updates in the future.