Introducing the Qwen 3 Family

The video introduces the Qwen 3 family of open, versatile AI models ranging from 0.6 billion to 235 billion parameters, featuring hybrid reasoning modes, broad multi-language support, and enhanced tool use, all released under the Apache 2.0 license. It highlights their comprehensive training, diverse configurations, and potential for customization, with future larger models and capabilities expected to expand their impact in AI research and applications.

The video introduces the Qwen 3 family of models, highlighting the extensive and diverse range of models released by the Qwen team. Unlike typical releases that focus on a few models for specific purposes, Qwen has launched a large family simultaneously, spanning 0.6 billion to 235 billion parameters, with some using a mixture-of-experts architecture and others being dense models. Notably, they have released multiple variants, including quantized weights, base models, and specialized configurations, providing a broad toolkit for different use cases. This comprehensive release approach is unusual and demonstrates their ambition to cover a wide spectrum of AI applications in one go.
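As a concrete starting point, here is a minimal sketch of loading one of the smaller Qwen 3 checkpoints with Hugging Face transformers. The 0.6B model ID is used purely to keep the example lightweight; any of the dense or mixture-of-experts variants on the Hub could be substituted.

```python
# Minimal sketch: load a small Qwen 3 checkpoint and generate a reply.
# Requires transformers and accelerate; model IDs follow the Qwen3 naming
# on the Hugging Face Hub (MoE variants use names like "Qwen3-30B-A3B").
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"  # small dense variant, chosen to keep the demo light
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts models."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```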

The speaker emphasizes the open nature of these models, which are licensed under Apache 2.0, allowing users to freely use and fine-tune them. While base weights for some of the largest models, such as the 235B mixture-of-experts model (22 billion active parameters) and the 32B dense model, are missing from the initial release, the speaker speculates that these may be released later. The models are compared to other prominent models such as DeepSeek R1, OpenAI's o3-mini, and Gemini 2.5 Pro, with Qwen's models showing competitive performance, especially considering their open availability. This openness contrasts with proprietary models, making them particularly attractive for research and customization.

A key innovation in the Qwen 3 family is the addition of reasoning, or "thinking", modes: the models are hybrids that can either work through an extended chain of thought or answer directly, with the depth of reasoning adjustable to user preference. The models support 119 languages and dialects, including many Asian and otherwise under-represented languages, making them accessible and useful for a global audience. Additionally, tool use has been enhanced, allowing the models to call external tools for tasks like plotting, file editing, and computer operations, further expanding their practical utility.
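In the chat template Qwen ships for transformers, this switch is exposed as an enable_thinking flag on apply_chat_template. The snippet below sketches how the same conversation is rendered with and without reasoning tokens; exact behavior depends on the tokenizer version installed.

```python
# Sketch: toggling Qwen 3's thinking mode through the chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [{"role": "user", "content": "What is 17 * 24?"}]

# Extended chain of thought: the model emits <think>...</think> tokens first.
with_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Direct answer: reasoning tokens are suppressed and the model replies immediately.
without_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

print(with_thinking)
print(without_thinking)
```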

Training and pre-training strategies are also discussed: the models were pre-trained on approximately 36 trillion tokens, roughly double the amount used for the previous generation. Pre-training proceeds in multiple stages, moving from general web data to knowledge-intensive data and synthetic data for math and coding tasks, with context length extended to 32K tokens to handle longer inputs. Post-training likewise involves several stages, including chain-of-thought training, reinforcement learning with verifiable rewards, and alignment techniques, although specific details and the number of training examples remain undisclosed.
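To make the staged recipe easier to picture, here is an illustrative rendering of it as a plain configuration. The stage names, data descriptions, and early-stage context lengths are assumptions layered on the rough figures from the video, not the Qwen team's actual training code.

```python
# Illustrative only: a plain-Python sketch of the staged training pipeline
# described above. Names and context lengths below are assumptions.
pretraining_stages = [
    {"name": "general",      "data": "broad web text",                     "context_len": 4_096},
    {"name": "knowledge",    "data": "knowledge-intensive + synthetic math/code", "context_len": 4_096},
    {"name": "long_context", "data": "long documents for context extension", "context_len": 32_768},
]

post_training_stages = [
    "chain-of-thought training",
    "reinforcement learning with verifiable rewards",
    "alignment",
]

for stage in pretraining_stages:
    print(f"{stage['name']}: {stage['data']} @ {stage['context_len']} tokens")
```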

Finally, the speaker demonstrates how users can interact with the models via chat.qwen.ai, experimenting with different reasoning settings and tool integrations. They showcase how enabling or disabling thinking modes affects the model's responses, noting that even the smaller models produce meaningful reasoning tokens. The video concludes with anticipation of future releases, including larger models like the 70B version, and promises further coverage of the models' agentic capabilities. The overall tone is optimistic about the potential of the Qwen 3 family to advance open AI models and their practical applications.
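For readers who want to reproduce this locally rather than on chat.qwen.ai, one common pattern is to serve a Qwen 3 checkpoint behind an OpenAI-compatible endpoint (for example with vLLM) and toggle reasoning with the /think and /no_think soft switches inside the prompt. The base URL and model name below are assumptions for such a local setup.

```python
# Sketch: querying a locally served Qwen 3 model through an OpenAI-compatible
# API. Assumes a server (e.g. vLLM) is already running at localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Appending /no_think asks the model to skip its reasoning tokens for this turn;
# /think would request an extended chain of thought instead.
response = client.chat.completions.create(
    model="Qwen/Qwen3-0.6B",
    messages=[{"role": "user", "content": "Summarize the Qwen 3 release in two sentences. /no_think"}],
)
print(response.choices[0].message.content)
```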