OCuLink vs PCIe for LLMs… The Result I Didn’t Expect

The video compares GMKtec’s Evo T1 mini PC, which uses OCuLink for external GPU expansion, against the Beelink GTi 15 paired with a PCIe Gen 5 dock, with both running large language models. Despite PCIe Gen 5’s higher bandwidth, real-world LLM inference speeds turn out to be nearly identical. The conclusion: the Evo T1 offers a more portable and tidy AI setup with decent performance, while the Beelink GTi 15 provides greater power and flexibility at the cost of bulk and complexity.

The video explores the performance and expandability of mini PCs, focusing on GMKtec’s Evo T1 and comparing it with other machines, including the Beelink GTi 15 docked system. Mini PCs like the Evo T1 strike a balance between portability and desktop-level I/O, making them well suited to AI workloads, especially local generative AI inference. A standout feature of GMKtec’s mini PCs is an OCuLink port for external GPU expansion, something rarely found on consumer desktops or laptops, which lets users add discrete GPU horsepower for AI without giving up the small form factor.
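
One practical check the video does not walk through, but which is useful when reproducing this kind of setup, is confirming what link an OCuLink-attached Nvidia card actually negotiated. Here is a minimal Python sketch using nvidia-smi’s standard query fields (it assumes an Nvidia GPU and nvidia-smi on the PATH):

```python
import subprocess

# Query the PCIe generation and lane width the GPU currently negotiated.
# These are standard nvidia-smi query fields; an OCuLink-attached card
# should report generation 4 and a width of 4 (i.e., PCIe 4.0 x4).
result = subprocess.run(
    [
        "nvidia-smi",
        "--query-gpu=name,pcie.link.gen.current,pcie.link.width.current",
        "--format=csv,noheader",
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # e.g. "NVIDIA GeForce RTX 5060, 4, 4"
```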

The video dives into GPU performance for large language models (LLMs) using LM Studio, testing models such as Gemma 3 4B and GPT-OSS 20B. The Evo T1’s integrated Intel Arc 140T iGPU shows decent performance but generally lags behind Apple’s M1, M2, and M3 MacBook Airs on shorter prompts, though it pulls ahead in longer-prompt scenarios. The AMD-based Evo X2 and the Framework Desktop, both built on Strix Halo chips, outperform the Evo T1’s iGPU, especially on larger models. Sparse (mixture-of-experts) models like GPT-OSS 20B behave differently: the iGPU sometimes beats discrete GPUs on shorter prompts because less data has to move per request.
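
If you want to reproduce this kind of measurement yourself, LM Studio exposes an OpenAI-compatible local server (by default at localhost:1234). A rough sketch, assuming that server is running, that the model identifier below matches whatever model you have loaded, and that the response includes OpenAI-style usage counts; note it measures end-to-end time, so prefill and generation are lumped together:

```python
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"  # LM Studio default

payload = {
    "model": "gpt-oss-20b",  # illustrative name; use your loaded model's id
    "messages": [{"role": "user", "content": "Explain OCuLink in one paragraph."}],
    "max_tokens": 256,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=300).json()
elapsed = time.perf_counter() - start

# End-to-end throughput: completion tokens divided by total wall time.
completion_tokens = resp["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"-> {completion_tokens / elapsed:.1f} tok/s")
```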

Expandability is a key focus: the video compares the OCuLink dock used with the GMKtec machine to the PCIe Gen 5 dock used by the Beelink GTi 15. While the PCIe Gen 5 dock offers roughly four times the bandwidth (31.5 GB/s, versus 7.9 GB/s for OCuLink’s PCIe 4.0 x4 link), the real-world impact on LLM inference speed is minimal. The Beelink system supports larger GPUs and more flexible power-supply options but results in a bulkier, messier setup; the GMKtec system is cleaner and more portable, with some limits on power delivery and GPU size.
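
The back-of-envelope arithmetic explains why the gap barely shows up in practice. The link is saturated only while weights are being copied into VRAM; during steady-state generation, each token moves only a tiny amount of data across it. A sketch with assumed numbers (the model size and per-token figure below are illustrative, not measurements from the video):

```python
# Assumed: a ~12 GB quantized model and the link speeds quoted above.
MODEL_GB = 12.0
OCULINK_GBPS = 7.9   # PCIe 4.0 x4
PCIE5_GBPS = 31.5    # PCIe Gen 5 dock

# One-time cost: loading weights into VRAM is bandwidth-bound.
print(f"Load over OCuLink: {MODEL_GB / OCULINK_GBPS:.1f} s")
print(f"Load over PCIe 5 : {MODEL_GB / PCIE5_GBPS:.1f} s")

# Steady state: with weights resident, each token moves only kilobytes of
# prompt/activation data, so even at 100+ tok/s the link is nearly idle
# and the GPU's own compute and memory bandwidth dominate.
PER_TOKEN_KB = 64.0  # rough assumed upper bound per generated token
per_token_ms = PER_TOKEN_KB / (OCULINK_GBPS * 1e6) * 1e3
print(f"Per-token transfer over OCuLink: ~{per_token_ms:.4f} ms")
```

The difference, then, is a second or two of extra model-load time over OCuLink, which is invisible once generation is underway.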

Performance tests with an Nvidia RTX 5060 connected via OCuLink show a substantial improvement over the iGPU, with token generation climbing from around 17 tokens per second to over 100. The video also compares the Vulkan and CUDA backends, finding CUDA slightly faster, though the two deliver broadly similar results. As noted above, sparse models can let the iGPU edge out the discrete GPU on short prompts, where per-request transfer overhead matters most, but the discrete GPU dominates on longer prompts thanks to its much higher parallel throughput.
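
The crossover behavior is easy to model with a toy calculation. Taking the generation speeds from the video and an assumed fixed per-request overhead for the discrete GPU (the overhead figure is a made-up illustration, not a measurement), the break-even point falls somewhere in the tens of tokens:

```python
# Toy model: fixed dGPU setup/transfer overhead plus per-token cost.
IGPU_TOKS = 17.0       # iGPU generation speed (tok/s), from the video
DGPU_TOKS = 100.0      # RTX 5060 over OCuLink (tok/s), from the video
DGPU_OVERHEAD_S = 1.5  # assumed fixed per-request overhead (illustrative)

for n_tokens in (16, 64, 256, 1024):
    igpu_time = n_tokens / IGPU_TOKS
    dgpu_time = DGPU_OVERHEAD_S + n_tokens / DGPU_TOKS
    winner = "iGPU" if igpu_time < dgpu_time else "dGPU"
    print(f"{n_tokens:5d} tokens: iGPU {igpu_time:6.2f}s  "
          f"dGPU {dgpu_time:6.2f}s  -> {winner}")
```

With these numbers the iGPU wins only the very shortest requests; anything longer amortizes the overhead and the discrete GPU pulls far ahead, matching what the video observes.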

In conclusion, the video finds that while the PCIe Gen 5 dock offers higher theoretical bandwidth and slightly better performance, the difference in LLM inference speed between the PCIe and OCuLink setups is small. The GMKtec Evo T1 with OCuLink is a neat, portable, and versatile mini PC solution for AI workloads, whereas the Beelink GTi 15 with its PCIe Gen 5 dock offers more power and flexibility at the cost of portability and tidiness. Users should weigh portability, expandability, and raw performance when choosing between the two.