In the video, the host tests a new model called “sus model R,” which many speculate is OpenAI’s upcoming Strawberry Q* model, by evaluating its programming and reasoning skills across a range of tasks. The model shows advanced abilities, such as writing a working Snake game and engaging in complex ethical reasoning, but its slow response times remain a clear drawback.
The host starts with simple programming tasks, such as asking for a Python script that prints the numbers from 1 to 100. The model works through the problem step by step, which matches the rumored features of the Strawberry model. Although its responses are slow, it completes the task correctly, a positive start.
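The video does not show the model’s exact output, so the following is only a minimal sketch of what a correct answer to the 1-to-100 task might look like. The function name `numbers_1_to_100` is hypothetical and chosen for illustration.

```python
def numbers_1_to_100():
    """Return the integers 1 through 100 inclusive (hypothetical helper)."""
    # range() excludes its upper bound, so 101 is needed to include 100
    return list(range(1, 101))


if __name__ == "__main__":
    # Print each number on its own line, as the task asks
    for n in numbers_1_to_100():
        print(n)
```

Returning a list from a separate function keeps the logic easy to check independently of the printing loop.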
Next, the host asks the model to create a Snake game in Python. The first attempt fails with an API request error, likely due to high demand. The model eventually produces complete code for the game, which the host runs and confirms works as expected. Generating a functioning game further suggests that the model has strong programming capabilities, although the host again points to its slow output speed as a drawback.
The video continues with a series of logic and reasoning questions. The host asks the model how long shirts laid out in the sun take to dry, and it gives a thorough and accurate answer. The model also tackles a classic riddle about killers in a room and correctly explains the reasoning behind its answer. These tasks highlight the model’s ability to reason through multi-step problems and present well-structured explanations.
The host then tests the model’s ethical reasoning with a moral dilemma: would it push a random person to save humanity? The model responds with a detailed analysis of different ethical frameworks but declines to give a direct yes-or-no answer. This exchange shows the model’s tendency to foreground ethical considerations, which may reflect a design focused on responsible behavior.
Finally, the host presents a spatial reasoning problem about moving from a starting point. The model correctly concludes that the path described would not return to the original point, demonstrating an understanding of the underlying geometry. For comparison, the host poses the same question to GPT-4, which initially gives an incorrect answer before correcting itself. Overall, the video highlights both the strengths and weaknesses of sus model R: it shows real promise, but there is room for improvement, particularly in response speed.