The Google DeepMind genomics team discusses AlphaGenome, their new unified model that predicts the functional impact of genetic variants by analyzing long DNA sequences at single-base resolution across multiple biological modalities. They highlight the model’s technical innovations, broad applicability to disease research, and plans for future expansion and community engagement.
The video features a roundtable discussion with the Google DeepMind genomics team about their new model, AlphaGenome, recently published in Nature. AlphaGenome is a unified DNA sequence-to-function model designed to predict the functional impact of genetic variants. The team, including Dhavi Hariharan, Ziga Avsec, Natasha Latysheva, Tom Ward, and Jun Cheng, discusses why they built AlphaGenome. They emphasize that deciphering the genome, the “source code of life,” is key to advancing health and biological understanding, particularly for genetic diseases and rare disorders.
The team explains that previous models in the field were limited in scope, often focusing on specific tasks or requiring trade-offs between sequence length and resolution. AlphaGenome addresses these limitations by integrating multiple modalities into a single model, allowing for long-range DNA sequence analysis at single-base resolution across many output types. This comprehensive approach enables researchers to assess the effects of genetic variants from multiple biological perspectives without needing to use several specialized models.
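To make the "one model, many output types" idea concrete, here is a minimal sketch of a shared trunk feeding several per-modality heads. Every head emits one value per input base, so all outputs share single-base resolution. Everything here is hypothetical: the class, the modality names (`rna_seq`, `accessibility`, `splice_sites`), and the weights are illustrative and do not come from AlphaGenome. The toy trunk is also a per-base linear layer with no long-range context, unlike a real sequence model.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot matrix; N or unknown bases become all-zero rows."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in idx:
            out[pos, idx[base]] = 1.0
    return out


class ToyMultiModalModel:
    """Hypothetical shared trunk with one linear head per output modality.

    All heads read the same shared per-base representation, and each head
    returns one prediction per input base (single-base resolution).
    """

    def __init__(self, modalities, hidden: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.trunk = rng.normal(size=(4, hidden))  # shared by every modality
        self.heads = {m: rng.normal(size=(hidden,)) for m in modalities}

    def predict(self, seq: str) -> dict:
        h = np.tanh(one_hot(seq) @ self.trunk)  # (length, hidden) shared representation
        return {m: h @ w for m, w in self.heads.items()}  # each track has shape (length,)


model = ToyMultiModalModel(["rna_seq", "accessibility", "splice_sites"])
tracks = model.predict("ACGTNACGTA")
print({m: t.shape for m, t in tracks.items()})
```

The design point is that adding a modality means adding a head, not training a separate specialized model.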
A significant technical challenge was processing long DNA sequences efficiently at high resolution and across multiple modalities. The team addressed this by splitting each sequence into subsequences that are processed in parallel on multiple TPUs, with the devices communicating so that each subsequence can still draw on information from the rest of the sequence. They also sped up data loading by compressing sparse data, and they curated the training data carefully for quality and diversity. These optimizations let them train the model efficiently and extend it to complex tasks such as splicing and contact map prediction, which are crucial for understanding gene regulation and expression.
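The general idea of splitting a sequence across devices while letting them exchange information can be illustrated with a "halo exchange." Each shard borrows a few boundary positions from its neighbours, so a local operation computed per shard matches the same operation computed on the whole sequence. This sketch is an assumption for illustration only: the source does not describe AlphaGenome's actual partitioning or communication scheme. Here a Python loop simulates the devices, and a 1D convolution stands in for the model's computation. On real hardware each shard would sit on its own accelerator, and the halos would be sent between devices.

```python
import numpy as np


def sharded_conv(x: np.ndarray, kernel: np.ndarray, num_devices: int) -> np.ndarray:
    """Hypothetical sequence-parallel 1D convolution using halo exchange.

    Splits x into contiguous shards (one per simulated device). Each shard is
    extended by r = len(kernel) // 2 positions from its neighbours (the halo),
    so the result equals np.convolve(x, kernel, mode="same").
    """
    assert len(kernel) % 2 == 1, "use an odd-length kernel so the halo is symmetric"
    assert 1 <= num_devices <= len(x), "every simulated device needs at least one position"
    r = len(kernel) // 2
    padded = np.pad(x, r)  # zero padding at both ends, matching mode="same"
    outputs = []
    for idx in np.array_split(np.arange(len(x)), num_devices):
        s, e = idx[0], idx[-1] + 1  # this device owns positions [s, e)
        # In padded coordinates, the shard plus its left and right halos is padded[s : e + 2r].
        local = padded[s : e + 2 * r]
        outputs.append(np.convolve(local, kernel, mode="valid"))  # exactly e - s outputs
    return np.concatenate(outputs)


rng = np.random.default_rng(0)
signal = rng.normal(size=1_000)
kernel = rng.normal(size=9)
assert np.allclose(sharded_conv(signal, kernel, 4), np.convolve(signal, kernel, mode="same"))
```

A fixed-width halo only suffices for local operations. Layers that mix information across the whole sequence need broader communication between devices, which is part of why long-range, high-resolution modelling is costly.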
Evaluation of AlphaGenome involved both molecular-level and organism-level assessments, benchmarking its predictions against experimental data and known disease-associated mutations. The team built a fast, parallelized variant scoring API to handle the large volume of model outputs and to make the evaluation comprehensive and rigorous. They also divided the evaluation work by assigning team members to specific modalities, which enabled thorough benchmarking and comparison with existing models.
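A common way to score a variant with a sequence-to-function model is to predict on the reference and alternate sequences and then summarize the difference near the variant. The sketch below assumes that approach. It uses a hypothetical stand-in predictor (a sliding-window GC fraction) and a thread pool for parallelism. None of these names or choices reflect the actual AlphaGenome API, and the source does not specify the scoring formula.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def toy_predict(seq: str, width: int = 5) -> np.ndarray:
    """Hypothetical stand-in for a per-base model track: local GC fraction in a sliding window."""
    gc = np.array([b in "GC" for b in seq.upper()], dtype=float)
    return np.convolve(gc, np.ones(width) / width, mode="same")


def score_variant(ref_seq: str, pos: int, alt: str, window: int = 10) -> float:
    """Score a single-base substitution as the summed |alt - ref| prediction change within +/- window of pos."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1 :]
    ref_pred, alt_pred = toy_predict(ref_seq), toy_predict(alt_seq)
    lo, hi = max(0, pos - window), min(len(ref_seq), pos + window + 1)
    return float(np.abs(alt_pred[lo:hi] - ref_pred[lo:hi]).sum())


def score_variants(ref_seq: str, variants, workers: int = 4) -> list:
    """Score many (pos, alt) variants concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda v: score_variant(ref_seq, *v), variants))


print(score_variants("A" * 20, [(10, "A"), (10, "C"), (5, "T")]))
```

A thread pool keeps the sketch dependency-free. A production scorer would more likely batch many sequences onto accelerators, since the model call dominates the cost.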
Looking ahead, the team is excited to see how the community will use AlphaGenome, particularly for identifying harmful mutations and advancing basic biological research. They plan to expand the model’s capabilities, including support for more species, modalities, and large-scale analyses, as well as releasing model weights for community use. The team values user feedback and intends to leverage advances in single-cell data to further improve the model’s predictive power and applicability to disease research.