Gemma 4 Has Landed!

Google has released Gemma 4, a family of open-licensed multimodal AI models that integrate native audio, vision, reasoning, and function calling, offered in both powerful workstation and efficient edge versions. With long chain-of-thought reasoning, large context windows, and improved encoders, Gemma 4 delivers flexible, high-performance models that are easy to deploy and customize via Hugging Face and Google Cloud.

Google has just released Gemma 4, a new family of four multimodal AI models that integrate native audio, vision, reasoning, and function calling. What sets Gemma 4 apart is its Apache 2.0 license, which lets users freely modify, fine-tune, and deploy the models commercially, subject only to the license's minimal conditions such as attribution. This is a significant shift from previous models released under more restrictive terms, and it makes Gemma 4 a highly attractive option for developers and businesses seeking powerful, flexible AI models.

The Gemma 4 lineup is divided into two tiers: workstation models and edge models. The workstation tier includes a 31-billion-parameter dense model and a 26-billion-parameter mixture-of-experts (MoE) model, which activates only about 4 billion parameters at a time for efficiency. These models are designed for local coding assistance, multilingual tasks, and running on small servers. The edge tier consists of smaller, highly efficient models (E2B and E4B) optimized for running on devices like phones and Raspberry Pis, with native support for audio processing, including speech recognition and translation.
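To see why an MoE model can hold 26B parameters yet activate only a few billion per token, here is a toy sketch of top-k expert routing. This is a generic illustration of the technique, not Gemma 4's actual architecture; all sizes and names below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: several expert weight matrices, but a
# router sends each token to only the top-k experts, so only a small
# fraction of the layer's parameters is used per token.
NUM_EXPERTS = 8
TOP_K = 2
D_MODEL = 16

# One matrix per expert (a real expert would be a full MLP).
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_forward(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ router
    topk = np.argsort(logits)[-TOP_K:]        # indices of the chosen experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                  # softmax over the chosen experts only
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, topk))
    return out, topk

token = rng.standard_normal(D_MODEL)
out, used = moe_forward(token)
print(f"experts used per token: {len(used)} of {NUM_EXPERTS}")
```

Per token, only 2 of the 8 expert matrices are multiplied, which is the same trick that lets the 26B MoE model run with roughly the compute cost of a 4B dense model.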

A major advancement in Gemma 4 is its built-in long chain-of-thought reasoning across multiple modalities—text, images, and audio—enabling more sophisticated and accurate outputs. Unlike previous models, which bolted audio or function calling on after the fact, Gemma 4 integrates these features at the architecture level, improving performance and usability. The models also support function calling natively, facilitating multi-turn agentic workflows and tool use, which enhances their capability in complex interactive applications.
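The multi-turn agentic loop that native function calling enables looks roughly like the sketch below. The model is stubbed out with a fake (a real setup would call Gemma 4 through an inference API), and the tool schema and message format are illustrative assumptions, not Gemma 4's actual interface.

```python
import json

# Tools the model is allowed to call; the name and schema here are
# illustrative, not part of any real Gemma 4 API.
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "temp_c": 21})  # stubbed lookup

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Stand-in for the model. First turn: emit a tool call.
    After seeing a tool result: answer using it."""
    if messages[-1]["role"] == "tool":
        result = json.loads(messages[-1]["content"])
        return {"role": "assistant",
                "content": f"It is {result['temp_c']}°C in {result['city']}."}
    return {"role": "assistant",
            "tool_call": {"name": "get_weather", "arguments": {"city": "Paris"}}}

def agent_loop(user_msg):
    """Run model → tool → model until the model stops requesting tools."""
    messages = [{"role": "user", "content": user_msg}]
    while True:
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append(reply)
        messages.append({"role": "tool", "content": result})

print(agent_loop("What's the weather in Paris?"))
```

Native function calling means the model itself decides when to emit the tool call and when to produce a final answer; the surrounding loop only executes tools and relays results.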

The smaller edge models have seen significant improvements in audio and vision encoders, with reduced parameter sizes and better responsiveness, making them ideal for on-device AI assistants that prioritize privacy and low latency. The workstation models boast a massive 256K token context window and improved vision encoders that handle aspect ratios and multi-image inputs better, supporting advanced tasks like OCR and document understanding. These enhancements position Gemma 4 as a versatile solution across a wide range of AI applications.

Finally, the models are available on Hugging Face and Google Cloud, with support for serverless deployment on GPUs like the Nvidia RTX Pro 6000 via Cloud Run. This accessibility, combined with the open license and strong base models, opens up many possibilities for fine-tuning and customization. The presenter plans to explore fine-tuning and other use cases in future videos, encouraging viewers to engage and suggest topics, signaling a promising future for Gemma 4 in the AI community.
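A serverless GPU deployment on Cloud Run might look like the fragment below. The service name, image path, and GPU type are placeholders, and flag availability varies by gcloud version and region, so treat this as a rough sketch rather than a working recipe.

```shell
# Illustrative Cloud Run deployment of a containerized model server with
# an attached GPU. Replace the service name, image, region, and GPU type
# with values valid for your project; verify flags against your gcloud version.
gcloud run deploy gemma-service \
  --image us-docker.pkg.dev/MY_PROJECT/my-repo/gemma-server:latest \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --memory 16Gi \
  --cpu 4 \
  --no-cpu-throttling
```

Cloud Run scales such a service to zero when idle, which is what makes the "serverless GPU" pattern attractive for intermittent inference workloads.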