The physics behind diffusion models

The video explains how diffusion models in machine learning are grounded in the physics of non-equilibrium thermodynamics: stochastic differential equations model the forward diffusion of data into noise, and the model learns to reverse this process to generate structured outputs. It highlights the connection between physical diffusion and AI, showing how these models navigate a probability landscape to produce diverse, high-quality samples efficiently through both stochastic and deterministic sampling methods.

The video explores the surprising connection between diffusion models in machine learning and the physics of non-equilibrium thermodynamics, which studies phenomena like ink dispersing in water or smoke spreading in air. Diffusion models stand out among generative approaches because they are grounded in the same mathematical principles that govern physical diffusion processes. The core idea is to view data as a probability landscape, where meaningful images correspond to high-probability peaks and noise corresponds to low-probability valleys. Since the true shape of this landscape is unknown, diffusion models learn a local navigational strategy that guides sampling from noise back to structured data, revealing the landscape gradually rather than requiring it all at once.
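
In the standard notation of score-based diffusion models (terminology the video does not necessarily use), that local navigational strategy is the score: the gradient of the log of the probability density, which points uphill toward more probable, more image-like configurations. A minimal statement of the idea:

```latex
% The local "compass" direction: steepest ascent on the log-probability landscape.
% A neural network s_theta is trained to approximate this quantity.
s_\theta(x) \;\approx\; \nabla_x \log p(x)
```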

To achieve this, diffusion models use a time-dependent probability distribution, p_t(x), which evolves from structured data to noise in a process called forward diffusion. The forward process simulates particles diffusing away from real data points, spreading the probability hills out over more of the space and avoiding regions where the data is too sparse to learn from. It is modeled with stochastic differential equations (SDEs) that combine a deterministic drift term with a stochastic diffusion term: the drift pulls particles toward a central point, while the diffusion term adds random jitter in the style of Brownian motion, so the training data reflects full paths from real images to noise.
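
As a concrete illustration, here is a minimal sketch of one forward trajectory simulated with the Euler-Maruyama method on a simple variance-preserving-style SDE. The linear beta schedule, its endpoints, and the step count are illustrative assumptions rather than values taken from the video or from any particular model.

```python
import numpy as np

def beta(t, beta_min=0.1, beta_max=20.0):
    """Illustrative linear noise schedule on t in [0, 1]."""
    return beta_min + t * (beta_max - beta_min)

def forward_diffuse(x0, n_steps=1000, rng=None):
    """Euler-Maruyama simulation of dx = -0.5*beta(t)*x dt + sqrt(beta(t)) dW."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        drift = -0.5 * beta(t) * x              # deterministic pull toward zero
        noise = rng.standard_normal(x.shape)    # Brownian increment, scaled below
        x = x + drift * dt + np.sqrt(beta(t) * dt) * noise
    return x

# Example: a 2D "data point" gradually loses its structure and ends up near pure noise.
x_T = forward_diffuse([2.0, -1.0])
```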

Training the model involves learning to reverse this diffusion process. The model acts like a local compass, predicting the gradient of the log-probability landscape (the score) to guide particles from noise back toward structured data. Unlike simple gradient ascent, which converges to a single optimum, the reverse diffusion process is stochastic, allowing diverse samples of possible outputs. The reversibility of diffusion was formalized by Brian Anderson, who derived a reverse-time SDE that depends on the model's predicted score, providing a principled way to generate new data by simulating the reverse motion from noise to structure.
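
Concretely, Anderson's result takes the standard form below (as written in the score-based SDE literature, not quoted from the video): if the forward process follows the first equation, then running time backward with the drift corrected by the score yields a process with the same time-dependent distributions. The score is the only unknown quantity, and it is exactly what the trained model supplies.

```latex
% Forward SDE: drift f plus noise of scale g, driven by Brownian motion W_t.
dx = f(x, t)\,dt + g(t)\,dW_t
% Reverse-time SDE (Anderson, 1982): the drift is corrected by the score
% \nabla_x \log p_t(x), and \bar{W}_t is a reverse-time Brownian motion.
dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\,d\bar{W}_t
```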

In practice, generating samples requires discretizing the reverse diffusion path into steps, which can be computationally expensive because the injected noise forces many small steps. However, researchers discovered an equivalent deterministic formulation, the probability flow ordinary differential equation (ODE), which traces out the same sequence of probability distributions without the stochastic term. This ODE-based approach enables much faster sampling with far fewer steps, and modern diffusion models often combine stochastic and deterministic methods to balance diversity and efficiency. Tools like Stable Diffusion leverage these advances to generate high-quality images rapidly.
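
A corresponding deterministic sampler can be sketched as follows: it integrates the probability flow ODE backward in time with plain Euler steps. Here `score_model` is a hypothetical stand-in for a trained network approximating the gradient of log p_t(x), and the beta schedule mirrors the forward sketch above; this is an assumption-laden illustration, not the sampler used by Stable Diffusion or any specific library.

```python
import numpy as np

def sample_ode(score_model, x_T, n_steps=50, beta_min=0.1, beta_max=20.0):
    """Euler integration of dx/dt = -0.5*beta(t)*x - 0.5*beta(t)*score(x, t), from t=1 to t=0."""
    x = np.asarray(x_T, dtype=float).copy()
    dt = -1.0 / n_steps                           # negative step: integrate backward in time
    for i in range(n_steps):
        t = 1.0 - i / n_steps
        b = beta_min + t * (beta_max - beta_min)  # same illustrative schedule as before
        drift = -0.5 * b * x                      # forward-SDE drift term
        dx_dt = drift - 0.5 * b * score_model(x, t)
        x = x + dx_dt * dt                        # deterministic step: no injected noise
    return x

# Usage sketch: start from Gaussian noise and follow the ODE back to a sample.
# x_0 = sample_ode(trained_score_model, np.random.default_rng().standard_normal(2))
```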

Overall, the video highlights how diffusion models bridge physics and machine learning by using the mathematical framework of diffusion to model time-varying probability distributions. While the video focuses on the core diffusion process and its physical intuition, it acknowledges that many practical aspects, such as model architectures and language conditioning, are separate topics. The field is still evolving, with ongoing research refining noise schedules and sampling methods and extending diffusion concepts to discrete domains like language. This foundational understanding opens the door to further innovations in generative AI grounded in physical principles.