Actual Tech - IBM Granite LLM Model on Raspberry Pi 5

In the video, Eli the Computer Guy demonstrates running IBM’s Granite 3.3 2B large language model locally on a Raspberry Pi 5 using the Ollama framework. The result is a voice-activated AI device that transcribes speech via Google’s API and displays AI-generated responses in a web browser. He highlights how well the Granite model performs on limited hardware, discusses architectural choices such as cloud-based speech recognition, and encourages viewers to explore similar edge AI projects.

In this episode, Eli demonstrates a project where he creates a voice-activated IoT device powered by artificial intelligence running locally on a Raspberry Pi 5. The goal is to allow users to speak to the device, have their speech transcribed, and then processed by an AI model, with the results displayed in a web browser. The browser auto-refreshes every two seconds to show new queries and AI responses, making the interaction nearly real-time.
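The two-second auto-refresh can be achieved in several ways; the video summary does not say which one Eli uses. A minimal sketch, assuming a plain HTML page with a meta refresh tag, might look like this. The function name `render_page` and the page layout are hypothetical, chosen only for illustration.

```python
import html


def render_page(exchanges, refresh_seconds=2):
    """Return an HTML page listing (query, response) pairs that reloads itself.

    `exchanges` is a list of (query, response) string tuples. The layout is
    an assumption for illustration, not the page shown in the video.
    """
    rows = "\n".join(
        # Escape both strings so model output cannot inject markup.
        f"<p><b>You:</b> {html.escape(query)}<br>"
        f"<b>AI:</b> {html.escape(response)}</p>"
        for query, response in exchanges
    )
    return (
        "<!DOCTYPE html>\n<html><head>\n"
        # The browser reloads the page every `refresh_seconds` seconds.
        f'<meta http-equiv="refresh" content="{refresh_seconds}">\n'
        "<title>Voice assistant</title>\n</head><body>\n"
        f"{rows}\n</body></html>"
    )
```

Any small web server could serve this string; a full reload every two seconds is crude but keeps the client free of JavaScript.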

For the AI backend, Eli uses the Ollama LLM framework, which he praises for its effectiveness, and specifically employs IBM’s Granite 3.3 2B model. He notes that this model performs impressively well on the Raspberry Pi 5, especially considering the device’s limited hardware resources. Eli compares Granite to other small models such as Microsoft’s Phi-3 and TinyLlama, stating that Granite provides better and more reliable results in this context.

The setup involves a small microphone connected to the Raspberry Pi, which surprisingly outperforms a larger, more professional-looking microphone. Speech recognition is handled by Google’s API, which converts spoken words into text. This text is then sent to the Ollama framework running the Granite model, which generates a response that is displayed in the web browser. Eli demonstrates the system by asking playful questions, showing that the AI responds quickly and accurately, albeit with short answers, because an instruction injected into the prompt limits responses to 20 words or fewer. This constraint helps keep system requirements low and responses fast.
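The step from transcribed text to a model response can be sketched against Ollama's local HTTP API. This is an illustrative sketch, not Eli's code: the model tag `granite3.3:2b`, the wording of the length instruction, and the helper names `build_payload` and `ask_ollama` are assumptions, and the transcript is taken as already produced by the speech-to-text step.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "granite3.3:2b"  # assumed tag; confirm with `ollama list` on the device


def build_payload(transcript, max_words=20, model=MODEL_TAG):
    """Prepend a length limit to the transcript and build the request body."""
    # The exact instruction wording is an assumption, not quoted from the video.
    prompt = f"Answer in {max_words} words or fewer. {transcript.strip()}"
    # stream=False asks Ollama for a single JSON object instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(transcript, url=OLLAMA_URL):
    """Send the prompt to a locally running Ollama server and return its text."""
    data = json.dumps(build_payload(transcript)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as reply:
        return json.loads(reply.read())["response"].strip()
```

With the Ollama server running and the model pulled, `ask_ollama("Why is the sky blue?")` would return the generated answer as a string. Capping the answer length in the prompt also caps how many tokens the Pi has to generate, which is why it keeps responses fast.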

Eli points out that while the AI processing is done locally, the speech recognition still relies on Google’s cloud services, as he has not yet found a suitable local speech-to-text solution. He emphasizes the importance of considering such architectural choices, especially for privacy and independence from external services. He also mentions that with further development, the system could be expanded to trigger function calls or events based on specific keywords, making it a versatile platform for IoT and automation projects.
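The keyword-triggered events mentioned above are described only as a future possibility, so the following is a hypothetical sketch of one way to do it: match whole words in the transcript against a table of handlers. The names `find_actions` and `dispatch` and the example keywords are invented for illustration.

```python
import re


def find_actions(transcript, handlers):
    """Return the handlers whose keyword appears as a whole word in the transcript."""
    # Lowercase and split into words so "LIGHT" matches but "flight" does not.
    words = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    return [handlers[keyword] for keyword in handlers if keyword in words]


def dispatch(transcript, handlers):
    """Call every matching handler and collect the return values."""
    return [action() for action in find_actions(transcript, handlers)]


if __name__ == "__main__":
    # Placeholder handlers; on a real device these might toggle GPIO pins.
    demo_handlers = {
        "light": lambda: "light toggled",
        "fan": lambda: "fan toggled",
    }
    print(dispatch("Please turn on the light", demo_handlers))
```

Running the dispatcher on the transcript before (or instead of) calling the model would let simple commands bypass the LLM entirely.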

Finally, Eli shares some challenges he faced, such as attempting to run similar projects on a Raspberry Pi Zero W (version 1), which proved impractical due to its outdated ARMv6 architecture and lack of software support. He concludes by inviting viewers to participate in upcoming Silicon Dojo classes, where they can experiment with pushing AI to the edge using Raspberry Pi devices. He also reminds viewers about the costs involved in running such projects and encourages donations to support his work.