M3 Ultra vs M4 Max LLM speed

The video compares a Mac Studio with the M3 Ultra chip against a Mac Studio with the M4 Max chip, both running the same language model. In these tests the M4 Max consistently processes tokens slightly faster than the M3 Ultra. Although the M4 Max is faster and, according to the presenter, able to handle larger models, the presenter notes that the gap may not justify its higher price for every user.

The presenter sets up two machines side by side: a Mac Studio with the M3 Ultra chip and a Mac Studio with the M4 Max chip. Both run the same language model in LM Studio, which makes the comparison direct. The presenter notes that the machines are clones in terms of setup, so the chip is the variable that may affect processing speed.

As the models load, the presenter notes the initial loading times, then tests generation speed by entering simple phrases. The M4 Max produces marginally more tokens per second than the M3 Ultra. The presenter says this difference aligns with the Geekbench scores, which indicate a higher multi-core score for the M4 Max than for the M3 Ultra.

The presenter continues with a variety of prompts, recording the tokens per second for each. The results fluctuate from run to run, but the M4 Max keeps a slight edge, reaching 53 tokens per second in one instance. The presenter emphasizes that the gap is small and that both machines are highly capable.
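Because the per-run figures fluctuate, averaging several runs gives a steadier comparison than any single reading. The Python sketch below shows one way to do that arithmetic. It is an illustration only: the function names are hypothetical, the sample token counts and timings are invented rather than taken from the video, and it assumes you have noted the token count and elapsed time for each run yourself.

```python
from statistics import mean


def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Return the generation throughput of a single run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def average_throughput(runs: list[tuple[int, float]]) -> float:
    """Average tokens per second over (token_count, elapsed_seconds) runs."""
    return mean(tokens_per_second(count, seconds) for count, seconds in runs)


def percent_faster(candidate: float, baseline: float) -> float:
    """Return how much faster candidate is than baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Invented sample runs for illustration; NOT measurements from the video.
    machine_a_runs = [(500, 10.0), (520, 10.0)]
    machine_b_runs = [(530, 10.0), (510, 10.0)]

    a = average_throughput(machine_a_runs)
    b = average_throughput(machine_b_runs)
    print(f"machine A: {a:.1f} tok/s, machine B: {b:.1f} tok/s")
    print(f"machine B is {percent_faster(b, a):.1f}% faster on average")
```

Expressing the gap as a percentage of the slower machine's throughput makes it easier to weigh a small speed advantage against a price difference.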

The presenter also highlights the price gap between the two machines, noting that the M4 Max is significantly more expensive than the M3 Ultra. The M4 Max, however, is said to be able to run larger models, which could matter to users whose workloads need them. The presenter hints that larger models will be tested in future videos.

Overall, the video shows how the M3 Ultra and M4 Max compare in a practical LLM workload. The M4 Max is faster in these tests, but the presenter suggests the difference may not justify the higher price for all users. Viewers are encouraged to stay tuned for more tests, particularly with larger models that could further showcase the capabilities of the M4 Max.