I was WRONG about M5 MacBook Pro 🤯 (vs M4 Max & M3 Ultra) for Local AI

The presenter compares the new M5 MacBook Pro with the M4 Max and M3 Ultra for local AI tasks, finding that while the M4 Max excels in raw generation speed, the M5 pulls well ahead in prompt processing thanks to its neural accelerators. Despite initial doubts, the presenter praises the M5 as a balanced and efficient option for local AI workloads, especially when paired with other Apple Silicon machines, and looks forward to future hardware improvements.

In this video, the presenter reviews the new M5 MacBook Pro, comparing its performance against the M4 Max and M3 Ultra models with a focus on local AI tasks such as running large language models. The machine is fresh out of the box, and the presenter highlights its architectural improvements while speculating about upcoming MacBook Pro models rumored to feature OLED screens and a redesign. Despite initial skepticism about the M5's capabilities, the presenter is eager to test it on real-world AI workloads.

The presenter runs a series of benchmarks with a 27B-parameter Qwen model to compare generation speed and prompt processing time on the M5 and M4 Max. The M4 Max wins on raw generation speed, producing around 22 tokens per second versus the M5's 17.5. With batching enabled, where multiple sequences are decoded simultaneously, the M4 Max keeps a significant lead, nearly doubling the M5's aggregate throughput, and it stays ahead even with several concurrent inference requests.
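Tokens-per-second figures like these are just generated tokens divided by wall-clock decode time, and batching raises aggregate throughput by decoding several sequences per forward pass. A minimal sketch of the arithmetic (the single-stream speeds are the figures quoted above; the batch size and scaling efficiency are made-up illustrations, not measurements from the video):

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Single-stream decode throughput: generated tokens / wall-clock time."""
    return tokens / seconds

def batched_throughput(single_stream: float, batch_size: int, efficiency: float) -> float:
    """Aggregate tokens/s when `batch_size` sequences are decoded together.

    `efficiency` (0..1) models the fact that throughput scales sub-linearly
    once the GPU's compute or memory bandwidth saturates (hypothetical factor).
    """
    return single_stream * batch_size * efficiency

# Single-stream generation speeds quoted in the video (tokens/s).
m4_max = 22.0
m5 = 17.5

# e.g. 100 tokens generated in 4.55 s is ~22 tokens/s.
print(round(tokens_per_second(100, 4.55), 1))  # 22.0

# With 4 concurrent requests at a hypothetical 75% scaling efficiency,
# aggregate throughput sits well above the single-stream figure.
print(batched_throughput(m4_max, 4, 0.75))  # 66.0
print(batched_throughput(m5, 4, 0.75))      # 52.5
```

Single-stream speed is what an interactive chat user feels; the batched number is what matters when a machine serves several requests at once.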

Where the M5 truly shines is prompt processing, thanks to the neural accelerators built into its GPU cores. The presenter demonstrates the M5 processing prompts about twice as fast as the M4 Max, cutting prompt processing time from 17 seconds to 8 seconds on the same workload. This matters for workflows dominated by long prompts that must be ingested before generation starts. The M5 also beats the M3 Ultra at prompt processing, although the M3 Ultra still leads in generation speed. The pattern highlights the M5's strength in accelerating specific stages of inference rather than raw generation throughput.
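Because end-to-end latency is prompt-processing time plus generation time, a machine that ingests prompts twice as fast can finish a prompt-heavy request sooner even with a slower decode speed. A back-of-the-envelope sketch using the numbers above (the 500-token response length is an assumed workload, not a figure from the video):

```python
def total_latency(prompt_seconds: float, gen_tokens: int, gen_tps: float) -> float:
    """End-to-end request time: prompt processing + token generation."""
    return prompt_seconds + gen_tokens / gen_tps

# Prompt processing times quoted in the video; generation speeds from the
# earlier benchmark. Response length (500 tokens) is assumed for illustration.
m4_max = total_latency(17.0, 500, 22.0)  # 17 s prompt + ~22.7 s decode
m5 = total_latency(8.0, 500, 17.5)       # 8 s prompt + ~28.6 s decode

print(round(m4_max, 1))  # 39.7
print(round(m5, 1))      # 36.6 -> the M5 finishes first on this workload
```

Flip the mix toward short prompts and long responses and the M4 Max wins again, which is exactly the trade-off the presenter describes.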

The video also explores distributed inference by linking the M5 and M4 Max so they work on AI tasks together. With the model split across their combined memory, the two machines collaborate, the M4 Max taking most of the load because of its larger memory capacity. This setup yields a modest performance boost and demonstrates the potential for multi-device synergy in local AI workloads. The presenter sees it as a promising development for users who want to chain multiple Apple Silicon machines together.
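One common way tools implement this kind of pairing is pipeline-style layer splitting: each machine is assigned a contiguous slice of the model's layers in proportion to its memory, so the bigger machine naturally does most of the heavy lifting. A conceptual sketch (device names, layer count, and memory sizes are hypothetical; real tools also handle the tensor transfers between machines):

```python
def partition_layers(num_layers: int, mem_gb: dict[str, float]) -> dict[str, int]:
    """Assign layer counts proportional to each device's available memory.

    The last device takes the remainder so every layer is assigned exactly once.
    """
    total_mem = sum(mem_gb.values())
    devices = list(mem_gb)
    shares: dict[str, int] = {}
    assigned = 0
    for dev in devices[:-1]:
        shares[dev] = round(num_layers * mem_gb[dev] / total_mem)
        assigned += shares[dev]
    shares[devices[-1]] = num_layers - assigned
    return shares

# Hypothetical split: a 64-layer model across a 128 GB M4 Max and a 32 GB M5.
print(partition_layers(64, {"m4_max": 128.0, "m5": 32.0}))
# {'m4_max': 51, 'm5': 13}
```

The speedup is modest rather than dramatic because the machines run their slices sequentially per token and must ship activations over the network between stages.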

In conclusion, the presenter admits to having been wrong about the M5's capabilities and credits Apple for the significant jump in prompt processing speed. While the M4 Max remains faster at generation, the M5 offers a balanced, efficient system for local AI tasks thanks to its neural accelerators. The video ends on an optimistic note, calling the M5 the best current MacBook option for many AI workloads and anticipating future Apple hardware updates that may bring further gains.