The video reviews the Raspberry Pi AI HAT+ 2, an add-on for the Raspberry Pi 5 that uses the Hailo 10H AI accelerator and dedicated RAM to enable efficient, offline AI tasks like running large language and vision models. While slightly slower than the Pi 5’s CPU for some tasks, the HAT+ 2 offloads AI workloads, keeps the system responsive, and offers user-friendly tools for interacting with AI models, making it a valuable upgrade for edge AI projects.
The video introduces the Raspberry Pi AI HAT+ 2, a hardware extension designed to bring advanced AI capabilities, such as large language models (LLMs) and vision models, to the Raspberry Pi 5. The presenter, Gary Sims, discusses the growing potential for edge devices—systems that can operate independently of the internet—to process vision, audio, and language tasks using onboard AI accelerators. The AI HAT+ 2 is positioned as a significant step toward making the Raspberry Pi a powerful, self-contained AI device, capable of tasks like speech-to-text, image recognition, and running LLMs without relying on cloud services.
At the core of the AI HAT+ 2 is the Hailo 10H AI accelerator, which is a notable upgrade from the previous Hailo 8 chip. The Hailo 10H features a direct DDR interface and comes with 8GB of LPDDR4 RAM, the same type used in the Raspberry Pi 5. This dedicated RAM allows large models to be loaded directly onto the HAT, reducing the load on the Pi’s main CPU and memory. The accelerator offers up to 40 TOPS (trillions of operations per second) at INT4 precision, making it particularly well-suited for running quantized LLMs and vision models efficiently.
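The practical effect of INT4 quantization on the HAT's 8GB of dedicated RAM can be seen with some quick arithmetic. This sketch (the helper function and figures are illustrative, not from the video) estimates the weight-storage footprint of a model at different precisions:

```python
def model_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage for a model at a given precision,
    in decimal gigabytes (ignores activations and KV-cache overhead)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# A 1.5B-parameter model, like the one benchmarked later in the video:
fp16 = model_memory_gb(1.5, 16)  # 16-bit floats
int4 = model_memory_gb(1.5, 4)   # INT4 quantized
print(f"FP16: {fp16:.2f} GB, INT4: {int4:.2f} GB")  # → FP16: 3.00 GB, INT4: 0.75 GB
```

At INT4, a 1.5B-parameter model's weights shrink to roughly a quarter of their FP16 size, which is why several quantized models can fit comfortably in the HAT's 8GB without touching the Pi's own memory.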
Installation of the AI HAT+ 2 is straightforward, involving mounting the board onto the Raspberry Pi 5 and connecting it via a PCIe ribbon cable. The HAT also includes a heatsink for cooling. Once installed, users need to set up the appropriate software, including the HailoRT command-line tool for basic interaction and the Hailo Model Zoo GenAI, which provides a collection of optimized pre-trained models and example applications. The HAT exposes a REST API for querying models, and the presenter demonstrates using both command-line tools and a custom Python script called Pyama to interact with the LLMs in a user-friendly way.
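The video doesn't spell out the REST API's exact routes or field names, so the endpoint and payload below are assumptions for illustration only. A minimal sketch of querying a locally hosted model over HTTP, using only the standard library, might look like this:

```python
import json
import urllib.request

# Hypothetical endpoint: the actual route exposed by the HAT's REST API
# (and the payload schema) may differ.
API_URL = "http://localhost:8000/v1/completions"

def build_request(prompt: str, model: str, max_tokens: int = 128) -> urllib.request.Request:
    """Assemble a JSON POST request for the (assumed) completion endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Explain edge AI in one sentence.", "example-1.5b-model")

# To actually send the query (with the HAT's server running):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Tools like Pyama presumably wrap requests of this shape in a friendlier command-line loop.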
For those preferring a graphical interface, the video also showcases Open Web UI, a self-hosted web-based chat interface that operates entirely offline. This interface allows users to select models and interact with them in a familiar chat format, further simplifying the process of leveraging AI capabilities on the Raspberry Pi. Both Pyama and Open Web UI make it easy to run queries and receive responses from the LLMs loaded onto the AI HAT+ 2, highlighting the flexibility and accessibility of the setup.
Performance testing reveals that while the AI HAT+ 2 may be slightly slower than running models directly on the Pi 5’s CPU for some tasks, its main advantage is offloading compute and memory usage from the main board. This allows the Raspberry Pi to remain responsive and available for other tasks while the AI HAT handles intensive AI workloads. For example, running a 1.5 billion parameter model yields around 8 tokens per second on the HAT versus 10 on the CPU, but with the benefit of freeing up system resources. The AI HAT+ 2 is priced at $130 and represents a significant upgrade for those looking to add advanced, offline AI capabilities to their Raspberry Pi projects.
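To put the throughput gap in perspective, a back-of-the-envelope calculation using the video's figures (8 vs. 10 tokens per second; the 200-token response length is an illustrative assumption):

```python
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream a response at a given throughput."""
    return num_tokens / tokens_per_second

response_len = 200  # tokens; an illustrative response length
hat_time = generation_time_s(response_len, 8)   # on the Hailo 10H
cpu_time = generation_time_s(response_len, 10)  # on the Pi 5's CPU
print(f"HAT: {hat_time:.0f}s, CPU: {cpu_time:.0f}s")  # → HAT: 25s, CPU: 20s
```

A difference of about five seconds on a 200-token reply is a modest price for keeping the Pi 5's CPU and RAM free for other work while the response is generated.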