In this video, the creator re-evaluates OpenAI's GPT-OSS 120B model using updated tools and finds significant improvements in performance, accuracy, and token generation speed over the previously reviewed 20B model. While the 120B excels at tasks like game coding, cipher decoding, and complex reasoning, it still shows limitations on ethical questions due to alignment constraints; overall, the creator is impressed and optimistic about the potential of open-source AI.
This re-review follows an earlier video in which the creator mistakenly tested the 20B model, a mix-up attributed to lighting and the on-screen display. This time they run the correct 120B model using Unsloth's dynamic quantization and updated chat templates, along with a fresh build of llama.cpp. The model runs smoothly on a quad-GPU setup with a 32,768-token context window, showing decent GPU utilization and a generation speed of around 35 tokens per second.
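The video does not show the exact launch command, so the following is only a rough sketch of a comparable setup: loading a quantized GGUF through llama-cpp-python (the Python bindings for llama.cpp) with a 32,768-token context, the layers split across four GPUs, and a quick throughput measurement. The model filename, tensor split, and prompt are placeholders, not the creator's actual settings.

```python
# Hypothetical sketch, not the creator's actual configuration:
# load a GGUF quant with llama-cpp-python and time generation speed.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b-Q4_K_M.gguf",   # placeholder quant filename
    n_ctx=32768,                             # 32,768-token context, as in the video
    n_gpu_layers=-1,                         # offload all layers to GPU
    tensor_split=[0.25, 0.25, 0.25, 0.25],   # spread weights over a quad-GPU rig
    verbose=False,
)

start = time.time()
out = llm("Q: What is the capital of France? A:", max_tokens=128)
elapsed = time.time() - start

tokens = out["usage"]["completion_tokens"]
print(f"{tokens / elapsed:.1f} tokens/sec")  # the video reports roughly 35 tok/s
```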
The creator tests the model on a variety of tasks, including coding a Flappy Bird clone called Flippy Block Extreme, where the 120B model performs significantly better than the 20B version, producing well-spaced pipes and more enjoyable gameplay. Other tests include cipher decoding, numeric comparisons, and generating SVG images of a cat walking on a fence. The 120B model shows marked improvements in accuracy and output quality, passing most tests that the 20B struggled with.
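The exact prompts aren't reproduced in the video summary; as a hypothetical illustration, tests of this kind could be scripted against the `llm` object from the previous snippet using llama-cpp-python's chat interface. The prompts below are stand-ins for the categories the creator covers, not their actual wording.

```python
# Hypothetical sketch of scripting the kinds of tests described above.
# The prompts are illustrative stand-ins, not the creator's exact tests.
prompts = [
    "Write a single-file Flappy Bird clone in Python using pygame.",
    "Decode this Caesar cipher (shift 3): Khoor zruog",
    "Which number is larger: 9.11 or 9.9? Answer with one number.",
    "Produce an SVG image of a cat walking on a fence.",
]

for prompt in prompts:
    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1024,
    )
    print(reply["choices"][0]["message"]["content"][:200], "\n---")
```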
One notable improvement is the model’s ability to produce the first 100 decimals of pi correctly, a task the 20B previously refused or failed to complete. The creator also probes the model’s reasoning with questions such as the two-driver problem, where the 120B model gives a better estimate and the correct answer. However, the model still refuses many ethical or controversial questions due to alignment constraints, which the creator finds disappointing and limiting.
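For anyone wanting to check such output themselves, one simple approach (not shown in the video) is to compare the model's answer against a high-precision reference, for example with the mpmath library:

```python
# Hedged verification sketch (not from the video): check a model's
# pi output against a high-precision reference using mpmath.
from mpmath import mp

mp.dps = 110                     # extra precision headroom
reference = mp.nstr(mp.pi, 101)  # 101 significant digits = "3." + 100 decimals

model_answer = "3.14159..."      # placeholder for the model's actual output
print("correct" if model_answer == reference else "incorrect")
```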
Overall, the 120B model demonstrates a substantial leap in performance and reliability over the 20B version, with faster token throughput and better conversational flow in most cases. Despite some alignment-related refusals, the creator is impressed by the model’s capabilities and sees potential for open-source AI to gain more traction. They also mention plans to explore and compare other models such as Kimi and Qwen in upcoming videos.
The video concludes with the creator expressing enthusiasm for the improvements in the 120B model and encouraging viewers to share their thoughts in the comments. They thank their channel members and audience for their support and invite viewers to subscribe and ring the notification bell for future updates. The creator emphasizes that this re-review provides a clearer and more accurate assessment of the GPT-OSS 120B model’s strengths and weaknesses.