Ministral 8B: MistralAI just released NEW 3B and 8B Agentic Models!

MistralAI has released two new small language models, Ministral 3B and Ministral 8B, designed for edge computing and boasting impressive benchmarks, including support for a 128,000-token context length. While the models show potential across a range of applications, concerns remain about limited access: they ship under a commercial license, and the model weights are not available for local use.

In a recent video, the host discusses the release of two new small language models by MistralAI, named Ministral 3B and Ministral 8B. These models are part of the growing trend of small language models (SLMs) gaining traction in the AI community. The host highlights that the benchmarks for these models are impressive, showcasing capabilities previously thought to be limited to larger models. The release coincides with Zyphra's launch of their Zamba 2 model, suggesting a competitive landscape for edge-optimized models and agentic applications.

MistralAI positions these models as the best edge models available, timing the launch to the anniversary of their earlier landmark release, Mistral 7B. The new models are built for on-device computing and edge use cases, with a focus on local privacy and efficiency. The host notes that while MistralAI has not explicitly called out agentic use cases, the models have clear potential in such applications, especially given Mistral's internal work on agentic frameworks.

The Ministral 3B and 8B models offer impressive features, including support for a 128,000-token context length and memory-efficient inference techniques. They are intended for applications such as on-device translation, local analytics, and autonomous robotics. The host emphasizes the low latency and high efficiency of these models, which can be tuned for specific tasks, including orchestrating workflows and function calling; a sketch of the latter follows below.
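
To make the function-calling side concrete, here is a minimal sketch using Mistral's Python SDK (`mistralai`) against the hosted API. The model id `ministral-8b-latest` and the `get_weather` tool are illustrative assumptions, not details confirmed in the video:

```python
import os

from mistralai import Mistral

# Hypothetical tool definition: the model can choose to "call" this
# function by returning structured arguments instead of plain text.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool, not part of any real API
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="ministral-8b-latest",  # assumed hosted model id
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

message = response.choices[0].message
if message.tool_calls:
    # The model chose to call the tool; in a full agent loop you would
    # execute it and send the result back in a follow-up "tool" message.
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:
    print(message.content)
```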

While the benchmarks for Ministral 3B show it outperforming competitors like Llama 3.2 and Gemma 2 2B, the performance of Ministral 8B is more mixed. The host points out that Ministral 8B shows significant gains in multilingual interaction but does not quite match the knowledge and common-sense reasoning of its predecessors. Additionally, the models are designed to work in conjunction with larger models, handling efficient task routing and input parsing across multiple contexts; a routing sketch follows below.
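
That routing pattern can be sketched in a few lines: the small model triages each request, and only the hard ones are escalated to a larger model. The model ids and triage prompt below are illustrative assumptions, not an implementation shown in the video:

```python
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Triage prompt for the small model: assumed wording, for illustration only.
ROUTER_PROMPT = (
    "Classify the user's request as SIMPLE (chit-chat, lookups, rewording) "
    "or COMPLEX (multi-step reasoning, long-form analysis). Answer with "
    "exactly one word: SIMPLE or COMPLEX."
)

def answer(user_query: str) -> str:
    # Step 1: cheap, low-latency triage pass on the small edge model.
    triage = client.chat.complete(
        model="ministral-3b-latest",  # assumed hosted model id
        messages=[
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": user_query},
        ],
    )
    label = triage.choices[0].message.content.strip().upper()

    # Step 2: easy requests stay on the small model; hard ones escalate.
    model = "ministral-3b-latest" if "SIMPLE" in label else "mistral-large-latest"
    final = client.chat.complete(
        model=model,
        messages=[{"role": "user", "content": user_query}],
    )
    return final.choices[0].message.content

print(answer("Summarize the plot of Hamlet in two sentences."))
```

Because the triage call is cheap, this keeps most traffic on the small model while reserving the larger one for requests that genuinely need it.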

However, there are concerns about the release. The weights for Ministral 3B and 8B are not yet available for local use, and the models are currently accessible only through Mistral's APIs. Furthermore, the models are no longer released under the permissive Apache 2.0 license, instead adopting a commercial license, which may limit access for hobbyists and local AI enthusiasts. Despite these setbacks, the host remains optimistic about the future of MistralAI and its commitment to advancing small, capable models for agentic tasks.