The MiniMax M3 is an open weights AI model with advanced multimodal capabilities and a very large context window. It performs strongly on image understanding and complex reasoning tasks but leaves room for improvement on coding. The model runs efficiently and looks useful for research. However, the weights and full technical details have not yet been released, so a comprehensive evaluation is not yet possible.
The video offers a first look at the newly released MiniMax M3. It is an open weights AI model from the Chinese company MiniMax, whose open weights models are popular in the local language model community. The weights and detailed technical report are not yet out, but the model is already available for testing. MiniMax claims the M3 combines three frontier capabilities, among them coding and agentic functions. It also uses MiniMax's sparse attention technology, which supports a context window of up to one million tokens. The model is natively multimodal, which makes it an exciting one to explore.
The presenter starts with a challenging coding task. The prompt asks for a single HTML file for a "One Piece meets Star Wars" themed game. The file must include a rotating 3D sphere built with three.js and custom GLSL shaders, plus scroll-triggered animations. The model produces a working 3D sphere and some animation effects. However, it misses parts of the prompt, such as the headline animation, and the sphere lacks polish. Compared with models like Opus 4.7 and Qwen 3.7 Max, the M3's output is decent but not outstanding, and it needed several follow-up prompts to improve the result.
Next, the presenter tests the model's multimodal capabilities with a low-resolution PNG screenshot of an old NFT collection spreadsheet. Impressively, the M3 reads the names in the image accurately and recognizes that the spreadsheet concerns Solana NFTs. It also provides insightful financial analysis and context about the collections. This result beats some other models that struggled with the same image, which shows the M3's strength in multimodal processing and research.
The final test is a complex multi-step research and reasoning task. The model must find recent papers on JEPA (Joint Embedding Predictive Architecture) models and rank them by importance. It must then summarize the top three and recommend one that can be reproduced on an RTX 3060 GPU. The M3 performs very well. It retrieves relevant papers, summarizes them accurately, and makes a sound recommendation based on GPU requirements and educational value. The presenter describes the reasoning and writing as clear and engaging, with a bit of personality, which makes the output highly usable for research planning.
Overall, the MiniMax M3 shows promise, especially in multimodal understanding and complex reasoning, while its coding capabilities still need work. The model runs smoothly within the Hermes agent environment at reasonable speed and cost, though pricing may change. The presenter looks forward to the release of the weights and technical report for a deeper analysis of the model's architecture and performance. Viewers are encouraged to share their own experiences with the M3.