Meta’s New AI Just Leveled Up Virtual Humans

Meta has developed a new AI technique that uses “Gaussian splatting” and advanced lighting models to create highly realistic virtual humans, capturing fine details like hair and lifelike skin effects far better than traditional methods. While the process currently requires a complex and expensive camera setup, future advancements may make this technology accessible to everyday users.

The video discusses a new AI technique from Meta that significantly advances the realism of virtual humans. Traditionally, digital humans in video games and media have looked artificial, with plastic-like skin and unrealistic hair. The new method promises to capture a person’s appearance and render them as a lifelike virtual avatar. It simulates complex effects such as subsurface scattering, in which light penetrates the skin and diffuses inside it before exiting, and this makes the digital representation far more convincing.

A key innovation in this technique is the use of “Gaussian splatting” instead of traditional 3D meshes. Rather than building scenes from flat triangles, the system uses millions of tiny, overlapping 3D ellipsoids (Gaussians), each of which fades out softly toward its edges and can therefore capture fine, fuzzy details like hair and subtle skin textures. This approach allows for much more realistic rendering of features that are difficult to model with conventional methods, though it comes at the cost of increased memory usage and more difficult editing.
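To make the idea of soft, overlapping primitives concrete, here is a minimal sketch of the two basic operations behind Gaussian splatting in general: evaluating how strongly a Gaussian contributes at a point, and blending sorted splats front to back. It is not Meta's implementation. The function names are hypothetical, and the Gaussians are assumed to be axis-aligned, whereas real systems also store a rotation per Gaussian.

```python
import math

def gaussian_weight(point, center, scales):
    """Unnormalized falloff of an axis-aligned 3D Gaussian at `point`.

    Assumption for illustration: no rotation, only per-axis scales.
    """
    d2 = sum(((p - c) / s) ** 2 for p, c, s in zip(point, center, scales))
    return math.exp(-0.5 * d2)

def composite(splats):
    """Front-to-back alpha blending of (rgb, alpha) pairs sorted near to far."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for rgb, alpha in splats:
        for i in range(3):
            color[i] += transmittance * alpha * rgb[i]
        transmittance *= 1.0 - alpha
    return color, transmittance

# Two half-transparent splats: red in front of green.
pixel, remaining = composite([((1.0, 0.0, 0.0), 0.5), ((0.0, 1.0, 0.0), 0.5)])
print(pixel, remaining)  # [0.5, 0.25, 0.0] 0.25
```

Because every splat is partly transparent, thin structures such as hair strands blend smoothly into what lies behind them instead of ending at a hard triangle edge.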

For skin rendering, the technique moves beyond treating skin as a simple painted surface. Real human skin is translucent: light enters, scatters inside, and exits at a different point. The new method equips each Gaussian with what the video describes as a built-in light sensor, which lets it determine how much light to emit in every direction and so closely mimics the way real skin interacts with light. This is a significant leap over previous methods, which struggled to simulate such effects efficiently.
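One way to read the "light sensor" metaphor is that each Gaussian stores a set of weights describing how strongly it responds to light arriving from each direction. This is an interpretation for illustration, not a detail confirmed by the video. The sketch below uses four coarse incoming directions and made-up weights, and it ignores the dependence on viewing direction.

```python
def outgoing_radiance(transfer, light):
    """Weighted sum of incoming light samples.

    `transfer` holds hypothetical per-Gaussian weights, one per light direction.
    """
    return sum(t * l for t, l in zip(transfer, light))

# Hypothetical light directions: front, left, right, back.
light = [0.0, 0.0, 0.0, 1.0]      # the point is lit only from behind

opaque = [0.7, 0.15, 0.15, 0.0]   # painted surface: no response to back light
skin = [0.6, 0.15, 0.15, 0.1]     # translucent: some back light leaks through

print(outgoing_radiance(opaque, light))  # 0.0
print(outgoing_radiance(skin, light))    # 0.1
```

The opaque surface stays black when lit from behind, while the translucent one still glows faintly, which is the kind of effect seen when light shines through an ear or a fingertip.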

To make the lighting calculations practical, the researchers replaced the computationally expensive “spherical harmonics” approach, which required tracking 81 values per point (the number of coefficients in a spherical-harmonic expansion through degree 8, since (8 + 1)^2 = 81), with a much faster “zonal harmonics” method that uses just three directions per point. The video describes this as reducing the complexity from cubic to linear. A lightweight convolutional neural network is also used to predict shadows from body pose, further enhancing realism without excessive computational cost.
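The following sketch shows where the number 81 comes from and why zonal harmonics are cheap to reorient. A zonal lobe depends only on the angle between a direction and the lobe's axis, so rotating the lobe means rotating one axis vector and leaving the coefficients untouched. The function names are hypothetical, normalization constants are folded into the coefficients, and none of this is taken from Meta's code.

```python
def sh_coefficient_count(max_degree):
    """Number of spherical-harmonic coefficients for degrees 0..max_degree."""
    return (max_degree + 1) ** 2

def legendre(n, x):
    """Legendre polynomial P_n(x), computed with Bonnet's recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def zonal_lobe(direction, axis, coeffs):
    """Evaluate a zonal lobe at a unit `direction`.

    The value depends only on the angle between `direction` and the unit `axis`.
    Normalization factors are assumed to be folded into `coeffs`.
    """
    cos_theta = sum(d * a for d, a in zip(direction, axis))
    return sum(c * legendre(n, cos_theta) for n, c in enumerate(coeffs))

print(sh_coefficient_count(8))  # 81

coeffs = [1.0, 0.5]
# Pointing the lobe along z or along x only changes the axis, not the coefficients.
print(zonal_lobe((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), coeffs))  # 1.5
print(zonal_lobe((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), coeffs))  # 1.5
```

A general spherical-harmonic expansion, by contrast, has to mix its coefficients with each other whenever the body part it is attached to rotates, which is the expensive step that the zonal representation avoids.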

However, the video notes a major limitation: capturing the data for these lifelike avatars currently requires a massive, expensive setup with hundreds of cameras and lights, far beyond the reach of consumers. Despite this, the presenter is optimistic, explaining that research typically starts with expensive proof-of-concept systems, which are then refined and made more accessible over time. The hope is that, in the near future, such high-quality virtual humans could be created with just a smartphone, making this technology available to everyone.