Ollama's Newest Release and Model Breakdown

In the latest video, Matt Williams highlights Ollama’s new release, which introduces significant features like easier model unloading and improved performance for existing models, particularly benefiting users on Docker. He reviews several models, sharing his experiences and encouraging viewers to test them while inviting feedback on desired features.

The video opens with Matt discussing two significant features in Ollama’s newest release, both highly anticipated by users; one in particular addresses a long-standing request from the community. Matt, a former member of the Ollama team, emphasizes his enthusiasm for the project and encourages viewers to visit the Ollama website for downloads and updates. He walks through how to update Ollama on each operating system, noting that users can install the latest version without losing their existing models.
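As a rough sketch of the update paths he describes (assuming default install locations; on macOS and Windows the app can also simply be re-downloaded from the site):

```sh
# Linux: re-running the official install script upgrades in place
curl -fsSL https://ollama.com/install.sh | sh

# Verify the installed version after updating
ollama -v

# Models live under ~/.ollama by default, so upgrading leaves them intact
ls ~/.ollama/models
```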

The video delves into the specifics of the new release, particularly focusing on the changes and improvements made since the last version. While there are no entirely new models included in this release, the updates enhance the performance of existing models. Notably, users running Ollama in Docker on Windows or Linux will experience a significant speed increase, with models starting up approximately five seconds faster than before. This improvement is especially beneficial for those utilizing Docker for their AI applications.
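For reference, this is the standard way to run Ollama in Docker (CPU-only variant, per the project’s Docker instructions), which is the setup where the faster startup would be observed:

```sh
# Pull the latest image to pick up the new release
docker pull ollama/ollama

# Run Ollama, persisting models in a named volume and exposing the API port
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Start a model inside the container; model startup is where the ~5s gain shows up
docker exec -it ollama ollama run llama3.1
```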

One of the standout features introduced in this release is the ability to unload models from memory more easily. Previously, users had to make API calls to manage memory, which could be cumbersome, especially for non-developers. Now, users can simply execute the command “ollama stop” followed by the model name to unload it from memory. This simplification is a welcome change for many users who have been seeking a more straightforward method to manage their models and optimize memory usage.
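To make the difference concrete, here is a sketch of both approaches, assuming a model named llama3.1 is currently loaded: the old route used the REST API’s keep_alive parameter, while the new release reduces it to a single CLI command.

```sh
# Before: unload a model by setting keep_alive to 0 via the API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "keep_alive": 0
}'

# Now: the equivalent with the new CLI command
ollama stop llama3.1

# Check which models are currently loaded in memory
ollama ps
```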

Matt also reviews several models released since the last update, including Solar Pro Preview, which is touted as a highly intelligent model designed to fit on a single GPU. He cautions viewers to be skeptical of such claims and encourages them to test models themselves. Other models discussed include Qwen 2.5, which brings improved knowledge and performance, and purpose-built models like Bespoke Labs’ MiniCheck, which responds only with yes or no answers to fact-checking prompts.
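Trying any of these is a one-liner; the tags below are the likely Ollama library names (assumed, so check the library if a pull fails):

```sh
# Pull and chat with the models Matt covers
ollama run solar-pro            # Solar Pro Preview
ollama run qwen2.5              # Qwen 2.5, default size
ollama run bespoke-minicheck    # answers only yes/no for fact-checking
```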

Finally, Matt shares his experiences with several more models, including Mistral Small, which excels at translation and summarization tasks, and Reader LM, designed to convert HTML to markdown. While some models perform well, he also notes limitations, such as Reader LM’s disappointing output. Overall, Matt expresses excitement about the new features and improvements in Ollama, inviting viewers to share their thoughts and desired features in the comments.
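As a hedged example of the kind of HTML-to-markdown test he describes (reader-lm is the assumed library tag, and the exact prompt format the model expects may differ):

```sh
# Fetch a page and ask Reader LM to convert the raw HTML to markdown
curl -s https://example.com > page.html
ollama run reader-lm "$(cat page.html)"
```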