NVIDIA has launched Nemotron 3 Super, a powerful and transparent AI assistant with 120 billion parameters, trained on 25 trillion tokens, free for everyone to use indefinitely and accompanied by a detailed 51-page research paper. Featuring innovations like multi-token prediction, stochastic rounding for speed, and memory-efficient “mamba layers,” it delivers fast, efficient performance while challenging the dominance of proprietary AI models, marking a significant step toward accessible and open AI technology.
NVIDIA has released an extraordinary new AI assistant called Nemotron 3 Super, which is free for everyone to use forever. Unlike most AI systems, which are proprietary and require subscriptions, NVIDIA has shared not only the model but also a comprehensive 51-page research paper detailing every step of its creation and the dataset it was trained on. This level of transparency is rare in the AI field, where such details are usually kept secret. The model was trained on an enormous dataset of 25 trillion tokens and has 120 billion parameters, making it roughly as capable as some of the best closed-source models from about a year and a half ago, which cost billions to develop.
One of the standout features of Nemotron 3 Super is its speed. NVIDIA released two versions of the model: BF16 and NVFP4. While both reach similar accuracy, the NVFP4 version runs about 3.5 times faster than BF16 and up to 7 times faster than other open models of comparable capability. Part of this speed boost comes from storing numbers in a compressed, lower-precision format and rounding them probabilistically: each value is rounded up or down at random, with probabilities chosen so that the expected result equals the original number. This method, called stochastic rounding, keeps rounding errors unbiased on average, allowing the AI to run much faster without a significant loss in output quality.
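The core idea of stochastic rounding can be sketched in a few lines. This is a generic, minimal illustration, not NVIDIA's NVFP4 implementation (whose details are in the paper); the function name `stochastic_round`, the grid `step`, and the sample values are all assumptions for the example.

```python
import numpy as np

def stochastic_round(x, step=1.0, rng=None):
    """Round each value to a multiple of `step`, picking the upper grid
    point with probability equal to the fractional distance, so the
    rounding is unbiased in expectation: E[round(x)] == x."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(x, dtype=np.float64) / step
    floor = np.floor(scaled)
    frac = scaled - floor                       # distance above the lower grid point
    round_up = rng.random(scaled.shape) < frac  # round up with probability `frac`
    return (floor + round_up) * step

# Averaged over many trials, the rounded values match the originals:
rng = np.random.default_rng(0)
x = np.array([0.3, 1.7, -0.25])
samples = np.stack([stochastic_round(x, 1.0, rng) for _ in range(100_000)])
print(samples.mean(axis=0))  # ≈ [0.3, 1.7, -0.25]
```

Ordinary round-to-nearest would map 0.3 to 0 every time, introducing a systematic bias; here the rounding errors cancel out on average, which is why accuracy survives the compression.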
Another key innovation is multi-token prediction. Traditional AI models generate responses one token (a word or word fragment) at a time, but Nemotron 3 Super predicts multiple tokens, specifically seven, at once and verifies them together, which significantly accelerates response generation. Additionally, NVIDIA introduced “mamba layers” to improve memory efficiency: instead of repeatedly reprocessing all prior information, the system keeps compressed notes that retain important details while discarding filler content, enabling it to handle large amounts of data more effectively.
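The draft-and-verify flow behind multi-token prediction can be sketched as a toy. Everything here is a stand-in assumption, not Nemotron's actual predictor: the "target model" and "drafter" are simple mod-10 functions, and the per-token loop in `verify_drafts` stands in for what a real system does in a single batched forward pass.

```python
def verify_drafts(prefix, drafts, target_next):
    """Accept drafted tokens one by one while they match what the
    target model would have produced; stop at the first mismatch.
    (A real system checks all drafts in one parallel pass.)"""
    accepted = []
    for t in drafts:
        expected = target_next(prefix + accepted)
        if t != expected:
            break
        accepted.append(t)
    # Always emit at least one token from the target model itself,
    # so generation advances even when every draft is rejected.
    accepted.append(target_next(prefix + accepted))
    return accepted

# Toy "target model": the next token is (sum of the context) mod 10.
target = lambda ctx: sum(ctx) % 10

def draft(ctx, k=7):
    """Toy "drafter" that guesses k tokens ahead, deliberately
    making a mistake at position 5."""
    out, cur = [], list(ctx)
    for i in range(k):
        guess = sum(cur) % 10
        if i == 5:
            guess = (guess + 1) % 10  # injected mistake
        out.append(guess)
        cur.append(guess)
    return out

prefix = [3, 1, 4]
print(verify_drafts(prefix, draft(prefix), target))  # → [8, 6, 2, 4, 8, 6]
```

Five drafted tokens are accepted, the sixth is rejected, and the target model supplies one token itself; the speedup comes from verifying a whole batch of guesses for roughly the cost of generating one token.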
Despite these advancements, there are still some limitations. For example, complex tasks involving extensive calculation, such as the video's example of assembling robotic cows with lots of math, can take the AI nearly an hour to complete. This suggests that while the model is powerful and fast for many applications, extremely demanding workloads may require more specialized or faster hardware. Nonetheless, the release of Nemotron 3 Super marks a significant shift in the AI landscape, challenging the dominance of closed systems and opening the door to more accessible, transparent, and efficient AI technologies.
Overall, NVIDIA’s commitment to open AI systems, backed by substantial investment, signals a new era in which powerful AI tools are freely available to the public. This development excites many in the AI community and beyond, as it democratizes access to advanced AI capabilities and fosters innovation. The detailed research paper accompanying the model offers valuable insights for developers and researchers, making Nemotron 3 Super not just a tool but a milestone in AI transparency and performance. The video’s presenter expresses enthusiasm for future updates and encourages viewers to engage with the topic, highlighting the transformative potential of this release.
You can find the NVIDIA Nemotron 3 Super research paper here: NVIDIA Nemotron 3 Super - Technical Report.