Ollama Gets a Turbo Update

Ollama’s new app update introduces a user-friendly, chat-style interface that supports multiple models, retrieval-augmented generation with file inputs, and a “turbo mode” for accessing powerful cloud-hosted models such as Kimi K2 without compromising privacy. With flexible usage plans and a hybrid local-cloud approach, the update significantly improves accessibility and functionality for users exploring advanced AI models.

The video covers the recent updates from Ollama, showcased at the ICML conference in Vancouver, where the company celebrated its second birthday. Ollama has evolved significantly in just two years, and to mark this milestone, they launched a new app that enhances how users interact with their models. Previously, users mainly accessed Ollama through a menu bar interface with limited functionality, but the new app offers a more comprehensive and user-friendly experience, allowing access to multiple models within a single interface.

One of the key features of the new Ollama app is its chat-like interface, similar to ChatGPT, where users can interact with various models. The app supports “thinking” tags, which display a model’s intermediate reasoning while it works through a query. Beyond simple chatting, the app also supports retrieval-augmented generation (RAG), letting users drag and drop PDFs, images, or other files to provide context for the model’s responses. This makes it easier to work with complex documents or multimedia inputs directly within the app.
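For readers who prefer scripting to the GUI, the same chat-with-file-context workflow is available through Ollama’s local REST API (`/api/chat` on port 11434). A minimal sketch, assuming a locally running server; the model name is illustrative, and the app’s real file handling (images, PDFs) is richer than the plain text injection shown here:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_chat_request(model, question, file_text=None):
    """Build the JSON payload for Ollama's /api/chat endpoint.

    When file_text is given, it is prepended as a system message,
    mimicking the app's drag-and-drop context in the simplest way.
    """
    messages = []
    if file_text:
        messages.append({
            "role": "system",
            "content": f"Answer using this document as context:\n\n{file_text}",
        })
    messages.append({"role": "user", "content": question})
    return {"model": model, "messages": messages, "stream": False}

def ask(model, question, file_text=None):
    """Send the request to a locally running Ollama server."""
    payload = json.dumps(build_chat_request(model, question, file_text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Build (but don't send) a request that passes a document as context:
payload = build_chat_request("llama3.2", "Summarize this file.", file_text="...")
print(payload["model"])          # llama3.2
print(len(payload["messages"]))  # 2
```

Calling `ask(...)` requires a running Ollama server; building the payload does not, which keeps experimentation cheap.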

The update also introduces a significant improvement in model accessibility through what Ollama calls “turbo mode.” This mode allows users to run larger, more powerful models hosted in the cloud, such as Kimi K2, with fast, streaming token output. This hybrid approach means users can benefit from both local and cloud-based models without needing to manage their own GPUs or API setups. Importantly, Ollama emphasizes privacy by not storing conversations in the cloud, and users can manage their usage through a credit system tied to their Ollama account.
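Conceptually, turbo mode can be thought of as routing the same chat request to a hosted endpoint instead of localhost. The sketch below rests on stated assumptions: the cloud URL, the model tag (`kimi-k2`), and Bearer-token authentication are illustrative guesses at the wiring, not Ollama’s documented turbo API:

```python
# Illustrative endpoints: the local URL is Ollama's default; the cloud URL
# and model tag below are assumptions about how turbo mode might be exposed.
LOCAL_URL = "http://localhost:11434/api/chat"
CLOUD_URL = "https://ollama.com/api/chat"

def chat_request(prompt, model, use_turbo=False, api_key=None):
    """Build a chat request, targeting the cloud endpoint when turbo is on."""
    url = CLOUD_URL if use_turbo else LOCAL_URL
    headers = {"Content-Type": "application/json"}
    if use_turbo and api_key:
        # Hypothetical: turbo requests authenticate with your Ollama account key.
        headers["Authorization"] = f"Bearer {api_key}"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # stream tokens back as they are generated
    }
    return url, headers, body

url, headers, body = chat_request("Hello", "kimi-k2", use_turbo=True, api_key="...")
print(url)  # https://ollama.com/api/chat
```

The appeal of this design is that switching between a local model and a cloud-hosted one is a one-flag change rather than a different client or SDK.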

To access turbo mode, users need to create an Ollama account and can start with a free plan that offers a rolling quota of 10,000 tokens per week, with the option to upgrade to a Pro plan for heavier use. The company’s goal with this offering is to fill gaps for users who want to experiment with larger open models, not to turn it into a major commercial venture. Alongside Kimi K2, users also have access to other advanced models such as the Qwen 3 mixture-of-experts model, and they can upload their own models for specialized tasks.
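To make the free plan’s rolling quota concrete, here is a small sketch of sliding-window token accounting. The window semantics are an assumption for illustration; Ollama’s actual metering may differ:

```python
from collections import deque

WEEK_SECONDS = 7 * 24 * 3600
FREE_WEEKLY_QUOTA = 10_000  # rolling weekly token allowance on the free plan

class RollingQuota:
    """Track token usage over a sliding 7-day window.

    The accounting here is a guess at what a "rolling quota" means;
    it is not Ollama's actual billing logic.
    """
    def __init__(self, limit=FREE_WEEKLY_QUOTA):
        self.limit = limit
        self.events = deque()  # (timestamp_seconds, tokens)

    def _trim(self, now):
        # Drop usage that has aged out of the 7-day window.
        while self.events and self.events[0][0] <= now - WEEK_SECONDS:
            self.events.popleft()

    def used(self, now):
        self._trim(now)
        return sum(tokens for _, tokens in self.events)

    def spend(self, tokens, now):
        """Record usage; return True only if it fits within the rolling limit."""
        if self.used(now) + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        return True

q = RollingQuota()
assert q.spend(9_000, now=0)                 # fits within 10,000
assert not q.spend(2_000, now=3600)          # would exceed the window total
assert q.spend(2_000, now=WEEK_SECONDS + 1)  # old usage has rolled off
```

Unlike a calendar-week reset, a rolling window means capacity frees up continuously as old usage ages out, which is friendlier for bursty experimentation.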

Overall, the Ollama update represents a major step forward in making advanced AI models more accessible and easier to use, especially for those who prefer not to work with command-line tools. The new app combines local and cloud-based models in a sleek interface, supports diverse input types, and offers flexible usage plans. This release not only celebrates Ollama’s growth but also sets the stage for future enhancements, making it a compelling option for users interested in exploring open-source AI models quickly and efficiently.