Better Than GPT4? Smaug 70b Dominates Benchmarks

The video presents Abacus AI's Smaug 70b, a 70-billion-parameter fine-tune of Llama 3 that is claimed to be better than GPT-4 Turbo. The full model is tested alongside a smaller, quantized 7-billion-parameter version that runs locally. According to the benchmarks shown, Smaug 70b outperforms Llama 3 and GPT-4 Turbo on several tests, including MT-Bench and Arena-Hard. The larger model runs successfully, and the quantized version is tested locally in LM Studio.

The larger and smaller versions of the model are given the same set of relatively simple tasks: outputting numbers, writing a game of Snake, solving math problems, and answering logic puzzles. The larger model struggles with some of them, such as a tricky logic puzzle about killers in a room, while the smaller model does better in certain cases, such as calculating the drying time for shirts and solving word problems.

The video compares the two models' performance task by task. The larger model fails tasks such as the word count and some reasoning puzzles, while the smaller model excels at tasks such as predicting where a ball is placed and generating sentences that end with a specific word. The results suggest that the smaller, locally run quantized model performs comparably to, or better than, the larger model in some scenarios.
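Two of the tasks mentioned above, the word count and the sentences ending with a specific word, have answers that can be checked mechanically. The following Python sketch shows one way such responses might be scored. It is an illustration only and is not taken from the video: the function names and the target word "apple" are hypothetical, and the video's exact prompts and grading are not given in this summary.

```python
import re


def ends_with_word(sentence: str, target: str) -> bool:
    """Return True if the last word of `sentence` is `target` (case-insensitive).

    Trailing punctuation is ignored, so "I ate an apple." ends with "apple".
    """
    words = re.findall(r"[A-Za-z']+", sentence)
    return bool(words) and words[-1].lower() == target.lower()


def score_endings(sentences: list[str], target: str) -> int:
    """Count how many of the model's sentences end with the target word."""
    return sum(ends_with_word(s, target) for s in sentences)


def word_count_matches(response: str, claimed: int) -> bool:
    """Check a model's claim about how many words its own response contains.

    Assumption: words are whitespace-separated tokens.
    """
    return len(response.split()) == claimed


if __name__ == "__main__":
    # Hypothetical model output; "apple" is an assumed target word.
    outputs = ["I ate an apple.", "Apples are red.", "She picked a ripe apple"]
    print(score_endings(outputs, "apple"), "of", len(outputs), "sentences pass")
```

A checker like this only covers tasks with a single verifiable answer. The logic puzzles and the Snake game described in the video would still need to be judged by hand or by running the generated code.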

The video concludes with a summary of each model's performance across the tasks and highlights the potential of running high-quality models locally. The overall assessment is that the smaller quantized model shows promising performance and offers a viable way to run capable models efficiently on local hardware.

