Michael, co-founder of Ollama, presents Ollama as an easy-to-use platform for running machine learning models locally and in the cloud, featuring a simple CLI, broad model support, seamless cloud integration, and extensive developer tools and APIs. He highlights Ollama’s rapid growth, advanced capabilities, strong hardware partnerships, and real-world applications, emphasizing its privacy-focused, scalable cloud service and vibrant open-source ecosystem.
In this talk, Michael, co-founder of Ollama, introduces Ollama as the easiest way to run machine learning models both locally and in the cloud. He shares his background, including his experience with Docker and AMD, and shows how simple it is to get started: download the tool and run models from the command line. Ollama supports a wide range of models, including recent releases like Gemma 3, and integrates seamlessly with cloud-hosted models as well.
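For readers who want a concrete starting point, a minimal sketch with the official `ollama` Python package might look like the following. The model name `gemma3` follows the talk's Gemma 3 example and assumes the model has already been pulled (e.g. `ollama pull gemma3`) and the server is running locally.

```python
# Minimal sketch: chat with a locally running Ollama server via the
# official Python SDK (pip install ollama). Assumes the server is up
# and the "gemma3" model has been pulled beforehand.
import ollama

response = ollama.chat(
    model="gemma3",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```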
Michael provides an overview of Ollama’s rapid growth since its launch on July 18, 2023, noting that it has become one of the fastest-growing open-source projects on GitHub, with over 100 million downloads. Ollama is designed to fit naturally into developers’ workflows, starting with a simple CLI for trying out models and extending to APIs and SDKs in Python and JavaScript. It also offers OpenAI API compatibility, so developers can build applications using familiar tools and interfaces.
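The OpenAI compatibility means the standard `openai` client can simply be pointed at a local Ollama server; a brief sketch (the API key is required by the client but ignored by Ollama, and `gemma3` is again an assumed model name):

```python
# Sketch of Ollama's OpenAI-compatible endpoint: the stock openai
# client pointed at a local Ollama server instead of the OpenAI API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
completion = client.chat.completions.create(
    model="gemma3",
    messages=[{"role": "user", "content": "Summarize what Ollama does."}],
)
print(completion.choices[0].message.content)
```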
The talk delves into how Ollama works behind the scenes: the team collaborates closely with model creators to implement models accurately and optimize them for different hardware and devices. Ollama manages memory and scheduling, and supports various model architectures and modalities using the GGML library. Michael emphasizes the importance of partnerships with hardware companies and model creators to ensure smooth deployment and performance.
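One visible piece of this memory management is the `keep_alive` parameter on Ollama's REST API, which controls how long a model stays loaded after a request completes. A small sketch using the `requests` package (the model name and duration are illustrative assumptions):

```python
# Sketch: the REST API exposes scheduling hints such as keep_alive,
# which tells the server how long to keep the model resident in
# memory after this request. Assumes a local server and "gemma3".
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3",
        "prompt": "Say hello in one sentence.",
        "stream": False,      # return one JSON object instead of a stream
        "keep_alive": "10m",  # keep the model loaded for 10 minutes
    },
)
print(resp.json()["response"])
```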
Ollama offers advanced capabilities such as tool calling so models can invoke external APIs, reasoning modes with optional “thinking” output, streaming responses, structured JSON outputs, and vision support for image-based models. The platform counts over 30,000 integrations on GitHub, including popular user interfaces, workflow-automation tools, and coding environments such as Visual Studio Code and Apple’s Xcode. These integrations make it easy for developers to incorporate Ollama models into diverse applications.
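To make one of these capabilities concrete, structured outputs work by passing a JSON schema via the SDK's `format` parameter; here is a minimal sketch where the pydantic model, its fields, and the `gemma3` model name are assumptions for illustration:

```python
# Sketch: structured JSON output via the `format` parameter, with the
# schema derived from a pydantic model. The reply is constrained to
# valid JSON matching the schema and then parsed back into the model.
from pydantic import BaseModel
import ollama

class CityFacts(BaseModel):
    name: str
    country: str
    population: int

response = ollama.chat(
    model="gemma3",
    messages=[{"role": "user", "content": "Tell me about Tokyo."}],
    format=CityFacts.model_json_schema(),  # constrain output to this schema
)
print(CityFacts.model_validate_json(response["message"]["content"]))
```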
Finally, Michael highlights Ollama’s cloud offering, designed for users who need more compute and faster inference on data-center GPUs. The cloud service is privacy-focused, scalable, and free to start. He also shares a case study of the North Dakota Legislative Council using Ollama locally to summarize legislative bills, saving significant time for its legal team. Michael concludes by mentioning recent improvements to AMD Vulkan support and invites attendees to ask questions and see live demos.
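As a rough sketch of what switching from local to cloud inference can look like, the same SDK can be pointed at a hosted endpoint; note that the host URL, the authorization header format, and the model name below are assumptions for illustration, not details confirmed in the talk:

```python
# Rough sketch: the same Python SDK aimed at a hosted endpoint instead
# of localhost. Host URL, "Bearer" auth format, and model name are
# assumptions; consult Ollama's cloud documentation for specifics.
import os
from ollama import Client

client = Client(
    host="https://ollama.com",  # assumed cloud endpoint
    headers={"Authorization": "Bearer " + os.environ["OLLAMA_API_KEY"]},
)
response = client.chat(
    model="gemma3",
    messages=[{"role": "user", "content": "Summarize this legislative bill: ..."}],
)
print(response["message"]["content"])
```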