NVIDIA's New Free AI - A Gift To All of Us

NVIDIA’s Nemotron 3 Ultra is a free, open AI model notable for its speed, its openness (publicly available weights and a permissive Open MDW license), and an architecture that keeps its 550 billion parameters manageable. It struggles with complex coding tasks but excels at practical everyday work. The release marks a significant milestone in open AI development, and cloud platforms make it accessible to anyone who wants to experiment with it.

NVIDIA has released Nemotron 3 Ultra, its newest free and open AI model. The reviewer was impressed by its speed but also confused and disappointed by some of its results, especially on coding tasks. The model struggled with complex programming challenges such as writing a light simulation or a real-time strategy game, often producing excessive or incorrect code. It did much better at practical tasks such as fixing broken installations, organizing files, and quickly generating experiments. Over time, the reviewer found it increasingly useful for everyday work, despite its limitations in advanced coding.

One of the standout features of Nemotron 3 Ultra is its openness. NVIDIA has published the model weights, the research paper, and the training data (at least the parts it is allowed to redistribute), which may make it the most open AI model released to date. The license is especially noteworthy: it is the Open MDW license, a variant of Apache 2.0 tailored to machine learning weights. It permits almost unrestricted use, including commercial applications and derivative works, with one safeguard: anyone who brings a legal claim against the model loses the license. The reviewer praises this as a significant improvement over the proprietary licenses NVIDIA used previously.

Running Nemotron 3 Ultra locally is difficult because of its size: 550 billion parameters need hundreds of gigabytes of GPU memory. For most users, cloud platforms such as Lambda are therefore the practical way to use it. The model supports a context window of one million tokens, which helps when working with large codebases or long documents. It is text-only for now, however: it cannot process images or video. The reviewer hopes this will be addressed in future versions, or worked around by pairing it with another model such as Gemma 4.
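To see why "hundreds of gigabytes" is the right order of magnitude, here is a back-of-the-envelope sketch of how much memory the weights alone occupy. It assumes the 550-billion-parameter figure quoted above and counts only weight storage; the context cache, activations, and runtime overhead come on top. The NVFP4 figure assumes 4-bit values plus one 8-bit scale per block of 16 values (4.5 bits per parameter on average), and the 80 GB per GPU figure is an illustrative assumption, not a statement about any particular deployment.

```python
import math

# Assumption: 550e9 parameters, the figure quoted in the text.
PARAMS = 550e9

# Average bits stored per parameter in a few common formats.
# NVFP4 assumption: 4-bit values plus one 8-bit scale per 16 values,
# i.e. 4 + 8 / 16 = 4.5 bits per parameter (per-tensor scales ignored).
BITS_PER_PARAM = {
    "BF16": 16.0,
    "FP8": 8.0,
    "NVFP4": 4.0 + 8.0 / 16,
}


def weight_gigabytes(params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def gpus_needed(gigabytes, per_gpu_gb=80.0):
    """Minimum number of GPUs whose combined memory can hold the weights."""
    return math.ceil(gigabytes / per_gpu_gb)


if __name__ == "__main__":
    for fmt, bits in BITS_PER_PARAM.items():
        gb = weight_gigabytes(PARAMS, bits)
        print(f"{fmt:>5}: {gb:7.1f} GB of weights, "
              f"at least {gpus_needed(gb)} x 80 GB GPUs")
```

Under these assumptions the script reports 1,100 GB for BF16, 550 GB for FP8 and about 309 GB for NVFP4, so even the 4-bit weights need at least four 80 GB GPUs before any memory is set aside for the context.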

Under the hood, Nemotron 3 Ultra is a mixture-of-experts model: each token activates only about 10% of its parameters, so each token needs only a fraction of the computation a dense model of the same size would. It also uses Mamba layers, which act as a compressed memory: they retain the important parts of the conversation and discard irrelevant information, which makes long inputs cheaper to process. Finally, it uses a low-precision 4-bit number format (NVFP4) and predicts several tokens in parallel with multiple prediction heads. Together, these choices contribute to its speed.
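The "about 10% active" idea can be illustrated with a toy top-k mixture-of-experts layer. This is a minimal, pure-Python sketch of generic top-k gating, not NVIDIA's actual router: the dimensions, the 20 experts, the top-2 choice (2 of 20 experts, i.e. 10% of the expert weights per token), and the single-matrix experts are all illustrative assumptions. Production MoE layers add details omitted here, such as load-balancing losses and batched expert dispatch.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class ToyMoELayer:
    """Top-k mixture-of-experts layer with random weights (illustration only)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one score vector per expert.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each hypothetical expert is a single dim x dim linear map.
        self.experts = [
            [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
             for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def active_fraction(self):
        """Fraction of expert weights used for each token."""
        return self.top_k / len(self.experts)

    def forward(self, x):
        """Route one token vector x to its top-k experts and mix their outputs."""
        scores = matvec(self.router, x)
        # Keep only the k highest-scoring experts; the others are never evaluated.
        chosen = sorted(range(len(scores)), key=lambda i: scores[i],
                        reverse=True)[: self.top_k]
        gates = softmax([scores[i] for i in chosen])
        out = [0.0] * len(x)
        for gate, idx in zip(gates, chosen):
            y = matvec(self.experts[idx], x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, chosen


if __name__ == "__main__":
    layer = ToyMoELayer(dim=8, num_experts=20, top_k=2)
    out, chosen = layer.forward([0.1 * i for i in range(8)])
    print("experts used:", chosen, "active fraction:", layer.active_fraction())
```

The design point is that the router is cheap (one score per expert), while the expensive expert computations run only for the chosen few, so total parameter count and per-token compute are decoupled.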
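The Mamba idea of a compressed memory that keeps some information and drops the rest can be sketched as a selective state-space recurrence. The sketch below is a heavily simplified, single-channel version written for illustration, not NVIDIA's implementation: the state size, the decay rates `A`, the fixed `B` and `C` vectors, and the step-size parameters `w_delta` and `b_delta` are all assumptions (in Mamba proper, `B` and `C` are also computed from the input, and many channels run in parallel). The key property is visible even here: the memory `h` has a fixed size no matter how long the input is, and the input-dependent step size decides how much of the old state survives each step.

```python
import math


def softplus(z):
    """Smooth, non-negative function used to produce the step size."""
    return z if z > 30.0 else math.log1p(math.exp(z))


def selective_scan(xs, A, B, C, w_delta, b_delta):
    """Run a single-channel selective state-space recurrence over inputs xs.

    A: per-state decay rates (must be negative).
    B, C: input and output projections of the state (fixed here for simplicity).
    w_delta, b_delta: parameters of the input-dependent step size.
    Returns the outputs and the final fixed-size state.
    """
    h = [0.0] * len(A)
    ys = []
    for x in xs:
        # Step size depends on the current input: this is the "selective" part.
        delta = softplus(w_delta * x + b_delta)
        for i, (a, b) in enumerate(zip(A, B)):
            # exp(delta * a) lies between 0 and 1 because a < 0:
            # a large delta forgets the old state, a small one preserves it.
            decay = math.exp(delta * a)
            h[i] = decay * h[i] + delta * b * x
        ys.append(sum(c * hi for c, hi in zip(C, h)))
    return ys, h


if __name__ == "__main__":
    A = [-0.5, -1.0, -2.0, -4.0]
    B = [1.0, 0.5, 0.25, 0.125]
    C = [1.0, 1.0, 1.0, 1.0]
    ys, h = selective_scan([1.0, 0.0, 0.0, 2.0, -1.0] * 200, A, B, C,
                           w_delta=1.0, b_delta=-1.0)
    print(len(ys), len(h))  # 1000 4: long input, but the state holds 4 numbers
```

Unlike attention, which looks back over every earlier token, this layer only ever carries `h` forward, which is why such layers make very long inputs cheaper.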
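The NVFP4 idea, storing each number in 4 bits while preserving accuracy through shared scale factors, can be sketched as "fake quantization": round values onto the 4-bit grid and map them back to floats. This simplified illustration follows the publicly described layout (blocks of 16 values, each stored as a 4-bit E2M1 float whose magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6, with one scale per block). Several simplifications are assumptions of this sketch: the block scale is kept as a full-precision float rather than encoded as FP8 (E4M3), the additional per-tensor scale used in practice is omitted, and ties round toward the smaller magnitude, which may differ from hardware rounding.

```python
import math

# Non-negative magnitudes representable by a 4-bit E2M1 float.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # values that share one scale factor
FP4_MAX = 6.0


def quantize_block(block):
    """Return (scale, codes) with codes on the E2M1 grid and value ~= scale * code."""
    amax = max(abs(v) for v in block)
    # Choose the scale so the largest magnitude in the block maps to 6.0.
    # Simplification: the scale stays a Python float instead of FP8 (E4M3).
    scale = amax / FP4_MAX if amax > 0 else 1.0
    codes = []
    for v in block:
        target = abs(v) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda q: abs(q - target))
        codes.append(math.copysign(nearest, v))
    return scale, codes


def fake_quantize(values):
    """Quantize values block by block, then dequantize them back to floats."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(scale * c for c in codes)
    return out


if __name__ == "__main__":
    weights = [math.sin(0.37 * i) for i in range(64)]
    approx = fake_quantize(weights)
    print("max abs error:", max(abs(w - a) for w, a in zip(weights, approx)))
```

Each value now costs 4 bits plus its share of one scale per 16 values. Because the widest gap on the grid is between 4 and 6, the rounding error in a block never exceeds the block's scale (one sixth of its largest magnitude).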
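Predicting several tokens at once only speeds things up if the guesses can be checked cheaply. A common way to use extra prediction heads is speculative decoding: the heads draft the next few tokens, the main model checks all of them in a single forward pass, and the longest correct prefix is kept. The text does not say exactly how Nemotron 3 Ultra uses its heads at inference time, so this sketch shows only the generic greedy acceptance rule, with made-up token IDs; the function name `accept_draft` is hypothetical.

```python
def accept_draft(draft, verified):
    """Greedy speculative-decoding acceptance rule.

    draft: tokens proposed by the extra heads for the next k positions.
    verified: the main model's greedy token for each of those k positions plus
        one more, all computed in a single forward pass over context + draft,
        so len(verified) == len(draft) + 1.
    Returns the tokens committed in this step (always at least one).
    """
    if len(verified) != len(draft) + 1:
        raise ValueError("verified must have exactly one more token than draft")
    accepted = []
    for proposed, correct in zip(draft, verified):
        if proposed != correct:
            # First wrong guess: keep the main model's token and stop.
            accepted.append(correct)
            return accepted
        accepted.append(proposed)
    # Every draft token was right, so the extra prediction comes for free.
    accepted.append(verified[-1])
    return accepted


if __name__ == "__main__":
    print(accept_draft([11, 42, 7], [11, 42, 7, 99]))  # [11, 42, 7, 99]
    print(accept_draft([11, 42, 7], [11, 40, 8, 5]))   # [11, 40]
```

In the best case one verification pass commits k + 1 tokens instead of one; in the worst case it still commits one, so the output is identical to what the main model alone would produce with greedy decoding.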

Overall, the reviewer celebrates Nemotron 3 Ultra as a major milestone in open AI development and stresses how much open science and open models matter for humanity's progress. Despite its imperfections and limitations, the model’s openness, speed, and licensing represent a significant step forward. The reviewer encourages scholars and developers to explore and experiment with it, pointing to cloud resources such as Lambda for running models this large. The reviewer sees the release as a gift to the AI community and a promising sign for the future of open AI research.