Ollama has launched a new app that provides a full chat interface for local AI models, accepts files as context, and adds a new turbo mode for fast, cloud-based access to large models while maintaining user privacy. The update moves AI model usage beyond command-line tools, offering multi-file support, customizable settings, and the option to upload custom models, marking a significant advancement for the Ollama community.
The video covers the recent updates from Ollama, showcased at the ICML conference in Vancouver, where the company celebrated its second birthday. Ollama, known for running AI models locally, has introduced a new app that significantly improves the user experience. Previously, Ollama's only graphical presence was a menu bar item, used mainly for checking for updates. The new app offers a comprehensive, user-friendly chat interface that lets users access various AI models directly within the app; models not already downloaded are fetched automatically, streamlining the process.
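The same workflow can also be scripted, since the app sits on top of Ollama's local server and its official Python client. A minimal sketch (`pip install ollama`), where the model name is illustrative:

```python
import ollama

model = "llama3.2"  # illustrative; any model from the Ollama library works

# Pull the model if it is not already downloaded, mirroring the
# app's automatic fetch behavior.
ollama.pull(model)

# Send a single chat message to the locally running server.
response = ollama.chat(
    model=model,
    messages=[{"role": "user", "content": "Summarize what Ollama is in one sentence."}],
)
print(response["message"]["content"])
```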
One of the standout features of the new Ollama app is its ability to function as a retrieval-augmented generation (RAG) system. Users can drag and drop PDFs, images, and other files into the app, and the AI uses them as context when answering questions. This capability was demonstrated with slides from a mechanistic interpretability presentation, showing the model's ability to process and understand complex documents. The app supports multiple files and offers settings to adjust the context length and where models are stored, adding flexibility for different use cases.
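The drag-and-drop handling is a feature of the app itself, but a rough approximation over the API is to inline a document's text into the prompt. A hedged sketch, with the file path and model name as placeholders (a full RAG pipeline would chunk and retrieve rather than inline everything):

```python
import ollama

# Placeholder path: in practice this would be text extracted from
# a dropped PDF or slide deck.
with open("slides.txt", encoding="utf-8") as f:
    document = f.read()

# Inline the whole document as context; the app presumably does
# something more sophisticated, and a real RAG setup would retrieve
# only the relevant chunks.
response = ollama.chat(
    model="llama3.2",  # illustrative model name
    messages=[
        {
            "role": "user",
            "content": (
                "Using only the document below, answer: what is "
                f"mechanistic interpretability?\n\n{document}"
            ),
        }
    ],
)
print(response["message"]["content"])
```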
A significant advancement with this release is the introduction of "turbo mode," which allows users to access more powerful models running in the cloud. This addresses a previous limitation where Ollama was confined to smaller, local models. Turbo mode enables fast, streaming responses from large models like Kimi K2, providing a cloud-based experience without users needing to manage GPUs or API endpoints themselves. Importantly, conversations remain private and are not stored in the cloud, balancing performance with user privacy.
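The turbo-mode API is not detailed here, but conceptually it amounts to pointing the same client at a hosted endpoint with an API key instead of localhost. A speculative sketch in which the host URL, auth header format, and model tag are all assumptions rather than documented Ollama details:

```python
import os
import ollama

# Assumed hosted endpoint and bearer-token auth; check Ollama's
# own documentation for the actual turbo-mode setup.
client = ollama.Client(
    host="https://ollama.com",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
)

# Streaming keeps responses feeling fast even for very large models.
for chunk in client.chat(
    model="kimi-k2",  # illustrative tag for a large cloud-hosted model
    messages=[{"role": "user", "content": "Explain mixture-of-experts briefly."}],
    stream=True,
):
    print(chunk["message"]["content"], end="", flush=True)
```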
To use turbo mode, users must create an account on Ollama's platform, where they can choose a free plan with limited credits or upgrade to a Pro plan. Pricing details are still being finalized, but the company aims to offer this as a helpful tool rather than a major revenue source. Alongside Kimi K2, users can access other advanced models such as the Qwen 3 mixture-of-experts model, and they can even upload their own models for specialized tasks involving images or documents.
Overall, the new Ollama app represents a significant step forward, making it easier for users who prefer not to use command-line tools to engage with local and cloud-based AI models. The app runs on Ollama's own engine rather than relying solely on llama.cpp, with ongoing updates and new features promised. This release offers a quick and versatile interface for experimenting with AI models, bridging the gap between local convenience and cloud power, and is a promising development for the Ollama community.