Text diffusion: A new paradigm for LLMs

The video presents diffusion-based large language models as an alternative to traditional autoregressive models: instead of emitting one token at a time, they generate an entire draft at once and refine it iteratively, promising faster inference, higher-quality outputs, and more flexible prompting. Despite the challenges of applying diffusion to discrete language data, these models already show promising results, especially on coding tasks, and represent a potential paradigm shift in AI text generation.

The video introduces diffusion-based large language models (LLMs) as a novel paradigm distinct from mainstream autoregressive models like GPT, which generate text one token at a time from left to right. Diffusion models, inspired by the physical process of diffusion (such as ink dispersing in water), generate an entire draft of text at once, starting from gibberish and progressively refining it over multiple steps. This approach allows the model to revise and improve the whole response iteratively, akin to a student revising an essay, rather than committing to each token immediately. While diffusion models are still emerging in language tasks, they have already demonstrated significant success in image and video generation.

One of the key advantages of diffusion LLMs is faster inference, which is particularly valuable for latency-sensitive applications like coding. Whereas autoregressive models generate tokens one at a time, diffusion models predict many tokens in parallel and need far fewer refinement steps than there are tokens in the output, leading to potential speedups of up to 10x. This speed improvement also translates into cost savings for providers. Diffusion models also promise higher-quality outputs by avoiding the left-to-right bias inherent in autoregressive generation, which can cause problems such as exposure bias and sampling drift. Finally, they allow more flexible prompting: the prompt can appear anywhere in the sequence, which is useful for tasks like filling in missing text or refactoring code.
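
To make the flexible-prompting point concrete: in a mask-based diffusion LLM, the "prompt" is simply whichever positions start out unmasked, so it can surround a gap rather than only precede the output. The snippet below is a minimal sketch of that idea, not any particular model's API; the `MASK` placeholder and the token layout are made up for illustration.

```python
# Minimal sketch of "prompt anywhere" with mask-based text diffusion.
# Nothing here is a real model API; MASK and the token layout are illustrative.

MASK = "[MASK]"

# Fill-in-the-middle: known tokens surround the gap instead of only preceding it.
sequence = ["def", "add", "(", "a", ",", "b", ")", ":",
            MASK, MASK, MASK,                # only these positions get denoised
            "return", "result"]

frozen = [tok != MASK for tok in sequence]   # positions the sampler never touches

# A left-to-right autoregressive model could condition only on the prefix here;
# a diffusion sampler sees the suffix as well, so every refinement step is
# constrained by both sides of the gap.
```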

The video then delves into the conceptual and technical challenges of applying diffusion to language, whose tokens are discrete symbols with no natural ordering, unlike continuous pixel values. One response is to apply diffusion in a latent embedding space, where both images and text are represented as continuous vectors capturing semantic information. However, decoding these embeddings back into discrete tokens is non-trivial, so many state-of-the-art models diffuse text tokens directly rather than embeddings. This discrete diffusion process is modeled with Markov chains: each token is corrupted independently over time, typically by being replaced with a mask token, and the model learns to reverse this corruption to generate coherent text.
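
To make the forward corruption concrete, here is a small, self-contained sketch of an absorbing-state ("mask") Markov corruption process: each token is independently replaced by a mask token with a probability that grows with the timestep. The linear schedule, `MASK_ID`, and the toy token ids are assumptions chosen for illustration, not taken from any specific model.

```python
import random

MASK_ID = 0  # hypothetical id reserved for the [MASK] token

def corrupt(tokens, t, num_steps):
    """Forward (noising) step of absorbing-state discrete diffusion: each token is
    independently replaced by MASK_ID with probability t / num_steps, so t = 0
    leaves the text intact and t = num_steps masks everything. Real models use a
    tuned noise schedule; a linear one keeps the sketch simple."""
    p_mask = t / num_steps
    return [MASK_ID if random.random() < p_mask else tok for tok in tokens]

# Toy example: the same sequence corrupted at increasing noise levels.
random.seed(0)
clean = [17, 42, 7, 99, 3, 58, 21, 64]
for t in (2, 5, 8):
    print(f"t={t}:", corrupt(clean, t, num_steps=8))
```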

Training diffusion LLMs involves simulating the forward diffusion process by masking parts of the text and teaching the model to predict the original tokens, similar to masked language modeling in BERT but with masking applied across many different noise levels. During inference, the model starts from a fully masked sequence and iteratively predicts and remasks tokens, gradually reducing the noise until a complete response emerges. Although diffusion models require more training compute than autoregressive models, they offer faster inference and competitive performance, especially on coding tasks. Examples include Mercury Coder from Inception Labs and Seed Diffusion from ByteDance, which demonstrate promising speed-quality trade-offs.
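
The inference loop described above can be sketched in a few lines. The sketch assumes one common remasking strategy, in which the model fills every masked position in parallel and then remasks its least confident predictions so later steps can revise them; `toy_model`, `MASK_ID`, and the linear unmasking budget are illustrative stand-ins, not the recipe of any specific system.

```python
import random

MASK_ID = 0  # hypothetical id for the [MASK] token

def sample(model, length, num_steps):
    """Iterative denoising for a mask-based text diffusion model (sketch).
    `model(tokens)` stands in for a Transformer forward pass and returns a
    (token_id, confidence) pair for every position."""
    tokens = [MASK_ID] * length                  # start from pure noise: all masks
    for step in range(num_steps, 0, -1):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK_ID]
        preds = model(tokens)                    # predict all positions in parallel
        for i in masked:                         # commit predictions at masked slots
            tokens[i] = preds[i][0]
        # Remask the least confident of this step's predictions so later steps
        # can revise them; the budget shrinks to zero by the final step.
        num_remask = len(masked) * (step - 1) // step
        for i in sorted(masked, key=lambda i: preds[i][1])[:num_remask]:
            tokens[i] = MASK_ID
    return tokens

# Toy stand-in model: random tokens with random confidences, just to run the loop.
def toy_model(tokens):
    return [(random.randint(1, 50), random.random()) for _ in tokens]

random.seed(0)
print(sample(toy_model, length=12, num_steps=4))
```

Each loop iteration is a single forward pass over the whole sequence, which is where the speedup over token-by-token decoding comes from: the number of steps can be much smaller than the number of tokens.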

In conclusion, text diffusion models represent an exciting new direction for LLMs, carrying ideas from continuous diffusion on images over to the discrete nature of language via mathematical frameworks such as Markov chains over tokens. Although still at an early stage, these models show potential for faster, higher-quality text generation and more flexible prompting. The video also hints at ongoing research exploring discrete gradients in language diffusion and invites viewers to express interest in further explorations of the topic. Overall, diffusion-based LLMs could reshape how we think about generating and refining language with AI.