Google's New FREE AI Model Can Run on Your PC!

Google’s new 12-billion-parameter Gemma 4 model is a capable option for running language tasks locally on mid-range hardware, including PCs with 16 GB of RAM and even Raspberry Pi devices. This dense model sits between the smaller and larger Gemma 4 versions, combining strong reasoning with manageable resource requirements, so it can be used without a high-end GPU.

The video discusses Google’s release of a new 12-billion-parameter variant of the Gemma 4 large language model, which can run locally on a PC with just 16 GB of RAM. The new model fills the gap between the smaller 2-billion and 4-billion-parameter versions and the much larger 26-billion and 31-billion-parameter models. It is a dense model, meaning it does not use mixture-of-experts techniques, and its file size is about 7.6 GB. That makes it accessible to users with mid-range hardware, including CPU-only PCs, GPUs with 16 GB of VRAM, and even Raspberry Pi devices.

In terms of hardware requirements, the smaller Gemma 4 models need between 6 and 8 GB of RAM or VRAM, while the largest models require up to 32 GB of VRAM, typically found only in high-end GPUs like the RTX 5090. The new 12-billion-parameter model requires about 10 GB of RAM, striking a balance between capability and accessibility. In terms of speed, the smallest model runs at 278 tokens per second, while the 12-billion-parameter model achieves around 110 tokens per second, a reasonable compromise between speed and capability. It can also run on devices without a dedicated GPU, such as the Jetson Thor, i7 mini PCs, and the Raspberry Pi 5, albeit at slower speeds.
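The figures quoted above can be cross-checked with some quick arithmetic. The sketch below is illustrative only: it assumes decimal gigabytes (1 GB = 10^9 bytes), treats the 12 billion parameter count as exact, and uses a hypothetical 500-token reply. The helper names are made up for this example and do not come from the video.

```python
def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average stored bits per parameter, assuming decimal GB (1e9 bytes)."""
    total_bits = file_size_gb * 1e9 * 8
    return total_bits / (params_billion * 1e9)


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


# Figures quoted in the summary for the 12B model.
FILE_SIZE_GB = 7.6
PARAMS_BILLION = 12
RAM_NEEDED_GB = 10

bpw = bits_per_weight(FILE_SIZE_GB, PARAMS_BILLION)
headroom_gb = RAM_NEEDED_GB - FILE_SIZE_GB  # memory beyond the weights file

REPLY_TOKENS = 500  # hypothetical reply length, not from the video
t_12b = generation_seconds(REPLY_TOKENS, 110)    # 12B model rate
t_small = generation_seconds(REPLY_TOKENS, 278)  # smallest model rate

print(f"Average bits per weight: {bpw:.2f}")
print(f"Memory beyond the weights file: {headroom_gb:.1f} GB")
print(f"{REPLY_TOKENS} tokens: {t_12b:.1f} s (12B) vs {t_small:.1f} s (smallest)")
```

Under these assumptions the file works out to roughly 5 bits per parameter, slightly above 4 bits. That is plausible for the 4-bit quantized files the summary mentions later, since such files usually also store scaling metadata, though the exact file layout is not described in the video.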

The video also evaluates the model’s capabilities using various reasoning and comprehension questions. The 12-billion-parameter model solves complex problems such as the hourglass timing question and correctly interprets nuanced language in word problems, outperforming the smaller models. However, because the tested models are 4-bit quantized versions, occasional errors occur, especially on edge cases. Overall, the model demonstrates strong reasoning and language understanding, making it suitable for a wide range of tasks without requiring massive computational resources.
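To give a feel for why 4-bit quantization can introduce small errors, here is a minimal sketch of symmetric round-to-nearest quantization on a handful of made-up weights. This is an assumption-laden illustration, not the scheme used for the Gemma 4 files in the video: real quantization formats typically work on blocks of weights with more elaborate scaling, and the function names and sample values here are hypothetical.

```python
def quantize_4bit(weights):
    """Map floats to integer codes in [-7, 7] using one shared scale.

    Simplified symmetric scheme for illustration only.
    """
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Made-up example weights, not taken from any real model.
weights = [0.70, -0.33, 0.12, 0.031, -0.52]

codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

for w, c, r in zip(weights, codes, restored):
    print(f"original {w:+.3f} -> code {c:+d} -> restored {r:+.3f}")
print(f"largest rounding error: {max(errors):.3f}")
```

Each weight can be off by up to half a quantization step, and small values such as 0.031 collapse to zero. Errors of this kind, accumulated across billions of weights, are one plausible reason a quantized model slips on edge cases that a full-precision version would handle.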

Additionally, the presenter tested the model’s knowledge by asking it to write about historical topics such as Paris in 1883 and New York in 1901. Despite having no internet access, the model generated detailed and coherent essays from its training data alone, though minor factual inaccuracies were noted. This shows how much knowledge the model stores and how well it can produce informative content from memory, which makes it useful for offline work.

In conclusion, the 12-billion-parameter Gemma 4 model offers a compelling middle ground between smaller, faster models and larger, more capable but resource-intensive ones. Its ability to run on modest hardware, such as a PC with 16 GB of RAM or even a Raspberry Pi, makes advanced AI accessible to a broader audience. The presenter closes by emphasizing the model’s practical utility and performance for local AI applications, and invites viewers to share their thoughts and subscribe for more content.