The video discusses the performance of large language models (LLMs) on Apple Silicon Macs, comparing the M1 Max and M3 Max. The presenter is skeptical about the value of upgrading, since the M3 Max's higher price buys only marginal performance gains. After testing an M1 Max with Llama 2, they conclude that it meets their needs for experimentation and development, and suggest that for those entering machine learning, investing in a powerful graphics card may be a better option.
In the video, the presenter discusses the performance of large language models (LLMs) on Apple Silicon Macs, particularly focusing on the transition from the M1 series to the newer M3 models. The presenter, who owns an M1 Pro, is contemplating whether to upgrade to an M3 Max and is seeking clarity on the performance differences. They reference benchmarks from the llama.cpp project, which provides insights into how various Apple Silicon models perform when running LLMs, specifically Llama 2.
The presenter shares their experience with their current M1 Pro, noting that it performs as expected with a token generation rate of around 36 tokens per second. They highlight the significant performance boost seen with the M1 Max, which roughly doubles throughput thanks to its greater GPU core count and memory bandwidth. However, when comparing the M1 Max to the M3 Max, the presenter expresses skepticism about the value of upgrading, given the high cost and only marginal performance improvements.
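The throughput comparison above can be sketched with simple arithmetic. This is a minimal illustration using the figures reported in the video (36 tokens/s on the M1 Pro, and the 58 tokens/s the presenter later measures on the M1 Max); the numbers are the presenter's, not official benchmarks:

```python
# Throughput figures reported in the video (tokens per second).
# These are the presenter's own numbers, not official Apple or
# llama.cpp benchmark results.
m1_pro_tps = 36.0
m1_max_tps = 58.0

def speedup(baseline_tps: float, upgraded_tps: float) -> float:
    """Relative throughput gain of one machine over another."""
    return upgraded_tps / baseline_tps

print(f"M1 Max vs M1 Pro: {speedup(m1_pro_tps, m1_max_tps):.2f}x")
# → M1 Max vs M1 Pro: 1.61x
```

The measured gain (about 1.6x) is a useful sanity check against the benchmark table before spending money on a bigger upgrade.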
As they delve deeper into the benchmarks, the presenter notes that while the M3 Max offers some enhancements, such as improved text generation speeds, the cost-to-performance ratio does not seem favorable for their needs. They emphasize that the M1 series already provided a substantial leap in performance over previous Intel architectures, making the M3 upgrade less compelling for casual users or developers who do not require cutting-edge specifications.
After unboxing their new M1 Max, the presenter runs tests with Llama 2 to evaluate its performance. They report achieving a token generation rate of 58 tokens per second, which aligns closely with the benchmark expectations. The tests include various prompts, and the results indicate that the new machine performs well, providing satisfactory value for the price paid.
In conclusion, the presenter reflects on their decision to upgrade, feeling confident that the M1 Max meets their needs for experimentation and development without the necessity of investing in the more expensive M3 Max. They suggest that for those entering machine learning, investing in a powerful graphics card might be a more beneficial route than simply upgrading to the latest Mac model. The video wraps up with an invitation for viewers to like and subscribe, as well as to check out more content related to Llama.