LLMs Are Databases - So Query Them

The video reveals that large language models’ feedforward networks function as graph databases, where entities and relations correspond to nodes and edges, enabling direct querying and editing of knowledge using a specialized language called Larql without retraining. This paradigm shift allows dynamic, training-free updates to LLMs, decouples knowledge storage from attention mechanisms, and opens new possibilities for efficient model deployment, local execution, and building models from scratch.

The video presents the perspective that large language models (LLMs), specifically their feedforward networks (FFNs), can be understood and treated as graph databases. By probing the internal weights of models like Google’s Gemma 3 4B, the presenter demonstrates that entities, features, and relations within the model correspond to nodes, edges, and labels in a graph database. This insight allows for querying the model using a specialized query language called Larql, which supports SQL-like operations such as selecting, describing, and filtering knowledge stored directly in the model’s weights.

The internal structure of the model is explained as a three-stage process: early layers handle syntax and understanding the query, middle layers store the knowledge as graph edges, and final layers generate the output tokens. Features in the FFN represent edges connecting entities, but because of polysemanticity, where a single feature encodes several distinct concepts, the knowledge is compressed and noisy. Attention mechanisms play a crucial role by routing and weighting these features across the high-dimensional residual stream, allowing the model to disambiguate and produce accurate predictions despite the compressed representation.
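
The "feature as edge" idea above can be made concrete with a toy feedforward layer. The following is a minimal sketch, not the presenter's code or Larql's internals: it assumes each feature has a gate vector that decides when it fires and a down vector that says what it writes, and it uses a plain ReLU where real models such as Gemma use a gated activation with an extra projection. All vectors, axis meanings, and names are invented for illustration.

```python
# Toy feedforward layer viewed as a key-value memory (illustrative only).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ffn(x, gates, downs):
    """Return the layer output for residual vector x.

    Feature i fires with strength relu(gate_i . x) and adds that
    strength times down_i to the output.
    """
    out = [0.0] * len(downs[0])
    for gate, down in zip(gates, downs):
        act = max(0.0, dot(gate, x))  # how well the input matches this feature
        for j, d in enumerate(down):
            out[j] += act * d
    return out

# Hypothetical input axes: ["France", "Germany", "capital-of"]
france_capital = [1.0, 0.0, 1.0]
germany_capital = [0.0, 1.0, 1.0]

# Hypothetical output axes: ["Paris", "Berlin"]
gates = [[1.0, -1.0, 1.0],   # feature 0 fires on France + capital-of
         [-1.0, 1.0, 1.0]]   # feature 1 fires on Germany + capital-of
downs = [[1.0, 0.0],         # feature 0 writes toward "Paris"
         [0.0, 1.0]]         # feature 1 writes toward "Berlin"

paris_query = ffn(france_capital, gates, downs)
berlin_query = ffn(germany_capital, gates, downs)
```

In this toy, each feature behaves like one labeled edge: the gate matches a (subject, relation) pair and the down vector points at the object. In a real model a single feature usually mixes several such edges, which is the noise the paragraph above describes.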

Using Larql, the presenter showcases how to query specific relations such as borders or nationality for entities like France, revealing clusters of related countries and concepts. The video also explores how features vary across layers, with the same feature index representing different concepts at different depths. This layered reuse of features highlights the model’s efficiency but also its complexity. The presenter further demonstrates querying entities like Einstein, showing how the model associates him with physics, awards, and related scientific terms, reinforcing the graph database analogy.
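
A describe-style lookup like the Einstein query above can be approximated by ranking features by how strongly an entity activates them, then reading off which token each feature's down vector promotes most. The sketch below is an assumption about how such a probe could work, not Larql's actual syntax or implementation; the function name `describe`, the two-dimensional vectors, and the two-token vocabulary are all hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def describe(entity_vec, gates, downs, unembed, top_k=2):
    """Return (feature index, activation, most-promoted token) for the strongest features."""
    rows = []
    for i, (gate, down) in enumerate(zip(gates, downs)):
        act = max(0.0, dot(gate, entity_vec))
        if act == 0.0:
            continue  # this feature does not fire for the entity
        # Pick the vocabulary token whose output direction best matches the down vector.
        token = max(unembed, key=lambda t: dot(unembed[t], down))
        rows.append((i, act, token))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top_k]

# Hypothetical toy model: axis 0 ~ "scientist", axis 1 ~ "prize winner".
einstein = [1.0, 0.5]
gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
downs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
unembed = {"physics": [1.0, 0.0], "award": [0.0, 1.0]}

edges = describe(einstein, gates, downs, unembed)
for index, activation, token in edges:
    print(f"feature {index}: activation {activation}, promotes '{token}'")
```

Running the same probe with the weights of a different layer would return different tokens for the same feature index, which matches the observation that an index means different things at different depths.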

A particularly novel aspect is the ability to insert new knowledge directly into the model without retraining. By adding facts such as “Poseidon is the capital of Atlantis,” the presenter shows how Larql can synthesize new gate and down vectors to embed this information into free feature slots. This patch overlay can be compiled into the model weights, making the new knowledge permanent and queryable in subsequent sessions. This capability opens the door to dynamic, training-free updates to LLMs, fundamentally changing how models can be maintained and expanded.
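
The insertion step can be illustrated on the same kind of toy layer. This is a hedged sketch of the general idea only: it assumes a "free" slot is one whose gate vector is all zeros, builds the new gate directly from the query vector, and returns a patched copy rather than editing the weights in place. How Larql actually synthesizes gate and down vectors is not specified in the summary, and the helper `insert_fact` is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ffn(x, gates, downs):
    out = [0.0] * len(downs[0])
    for gate, down in zip(gates, downs):
        act = max(0.0, dot(gate, x))
        for j, d in enumerate(down):
            out[j] += act * d
    return out

def insert_fact(gates, downs, key_vec, value_vec):
    """Write one fact into the first unused slot and return patched copies.

    Raises StopIteration if no slot is free (no all-zero gate).
    """
    slot = next(i for i, g in enumerate(gates) if not any(g))
    norm = dot(key_vec, key_vec)
    new_gates = [list(g) for g in gates]
    new_downs = [list(d) for d in downs]
    new_gates[slot] = [k / norm for k in key_vec]  # activation is 1 on key_vec itself
    new_downs[slot] = list(value_vec)
    return new_gates, new_downs, slot

# Hypothetical input axes: ["France", "Atlantis", "capital-of"]
france_q = [1.0, 0.0, 1.0]
atlantis_q = [0.0, 1.0, 1.0]

# Hypothetical output axes: ["Paris", "Poseidon"]
gates = [[1.0, -1.0, 1.0], [0.0, 0.0, 0.0]]   # slot 1 is unused
downs = [[1.0, 0.0], [0.0, 0.0]]

before = ffn(atlantis_q, gates, downs)
patched_gates, patched_downs, slot = insert_fact(gates, downs, atlantis_q, [0.0, 1.0])
after = ffn(atlantis_q, patched_gates, patched_downs)
france_after = ffn(france_q, patched_gates, patched_downs)
```

Before the patch the Atlantis query produces nothing; afterwards it points at "Poseidon". In this toy the new slot also fires weakly on the France query because both keys share the capital-of axis, although "Paris" still scores highest. Keeping the patch as a separate copy mirrors the overlay described above, and replacing the original lists with the patched ones corresponds to compiling it in.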

Finally, the video discusses the broader implications of this approach. Decoupling attention (the routing mechanism) from the knowledge store (the FFN graph) suggests that the knowledge base could reside remotely, enabling more efficient model deployment and scaling. The presenter hints at future possibilities, including running large models locally on modest hardware and building models from scratch without traditional training. Overall, the video argues that LLMs are not just statistical models but graph databases stored in their weights, which can be queried, edited, and extended programmatically.