Nvidia’s GEAR initiative aims to develop versatile embodied agents capable of operating in both virtual and real-world environments, with a key breakthrough being the “Hover” neural whole-body controller that unifies control systems for various tasks. This innovative system allows robots to perform complex actions fluidly, leveraging advanced AI and simulation technology to accelerate training and enable zero-shot transfer of skills from simulation to real-world applications.
Nvidia has made significant strides in robotics research, showing that its capabilities extend beyond GPU manufacturing. Its recent initiative, GEAR (Generalist Embodied Agent Research), aims to develop versatile agents that can operate in both virtual and real-world environments. Led by Dr. Jim Fan and Professor Yuke Zhu, the GEAR team focuses on building foundation models for embodied agents, including multimodal models for planning, reasoning, and manipulation. Its research spans a wide range of applications, from general-purpose robotic systems to large action models that can autonomously explore various simulations.
A key breakthrough from Nvidia’s research is the introduction of “Hover,” a versatile neural whole-body controller designed for humanoid robots. This innovative system addresses a major challenge in robotics: the need for different control systems for each specific task. Traditionally, robots require separate programming for walking, object manipulation, and maintaining balance, which is inefficient and limits their adaptability. Hover aims to unify these control systems, allowing robots to perform multiple tasks fluidly, similar to how humans intuitively switch between different activities.
Hover is trained on human motion-capture data, learning to reproduce the way people move rather than following hand-written routines for each action. Instead of requiring distinct control systems for each action, Hover functions as a single model that coordinates various movements simultaneously while maintaining balance and precision. This capability enables robots to perform complex tasks, such as walking while carrying objects or engaging in conversations, without the need for constant reprogramming.
The research team has achieved these results with a relatively small neural network of 1.5 million parameters. Using Nvidia's simulation technology, they can condense a year's worth of real-world training into roughly 50 minutes of wall-clock time. The simulator, Nvidia Isaac, lets robots practice movements far faster than real time, effectively acting as a time machine for training in which they can learn from mistakes and refine their skills rapidly.
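To put the "one year in 50 minutes" figure in perspective, the implied speedup over real time can be worked out directly. The short sketch below is only back-of-the-envelope arithmetic, not anything from Nvidia's tooling: it assumes the year means continuous, around-the-clock practice in a 365-day year and that the 50 minutes is elapsed real time. The function name `speedup` is made up for illustration.

```python
# Back-of-the-envelope estimate of the simulation speedup.
# Assumptions (not from the source): a 365-day year of continuous practice,
# and 50 minutes measured as elapsed real time.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def speedup(experience_minutes: float, elapsed_minutes: float) -> float:
    """Return how many minutes of experience are gained per elapsed minute."""
    return experience_minutes / elapsed_minutes


factor = speedup(MINUTES_PER_YEAR, 50)
print(f"Roughly {factor:,.0f}x faster than real time")
```

Under these assumptions the simulator delivers on the order of ten thousand times more practice than a physical robot could accumulate in the same period. The article does not say how that speedup is achieved.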
One of the most impressive aspects of Hover is its ability to transfer learned skills from simulation to the real world without any fine-tuning. This zero-shot transfer capability means that robots can operate effectively in physical environments immediately after training in simulation. Furthermore, Hover has been shown to outperform specialized systems designed for specific tasks, leveraging shared knowledge across different movements. This breakthrough suggests that generalist systems can learn fundamental principles applicable to various actions, enhancing the overall efficiency and adaptability of humanoid robots.