This AI discovers unknown molecules

The video highlights a groundbreaking AI called Dreams that applies self-supervised learning to millions of unannotated mass spectra in order to discover and characterize unknown natural molecules, revealing a vast, unexplored chemical universe. The technology predicts molecular properties, visualizes relationships between molecules, and accelerates the discovery of new compounds with potential applications in medicine, materials, and industry.

The video discusses a groundbreaking AI development that aims to uncover unknown natural molecules, which are fundamental building blocks of life and the world around us. Scientists have so far identified fewer than 10% of all existing natural molecules, leaving a vast universe of undiscovered compounds that could revolutionize fields like medicine, materials science, and electronics. The challenge lies in interpreting the data generated by tandem mass spectrometry (MS/MS, often coupled with liquid chromatography as LC-MS/MS), a technique that fragments molecules and records the fragments as a characteristic spectral fingerprint. While a spectrum can be produced for virtually any molecule, most of these spectra remain unannotated and difficult to interpret, which is a significant barrier to discovery.

To address this, researchers developed a neural network called Dreams, which employs self-supervised learning to analyze millions of unlabeled spectra. The AI is trained to understand the “language” of molecular spectra by processing a massive dataset of over 200 million spectra from natural materials. Similar to how language models learn grammar and meaning by reading extensive texts, Dreams learns the patterns and properties of molecules based solely on spectral data. This enables the AI to predict the chemical and structural properties of molecules from their spectra, even without knowing their exact structures beforehand.
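As a rough illustration of what self-supervised learning on spectra can look like, the sketch below hides the mass-to-charge (m/z) values of a random subset of peaks and keeps the hidden values as training targets, much as a language model is trained to fill in masked words. The summary does not state which pretraining objective Dreams uses, so the masking scheme, the `mask_peaks` function, and the example peak values are all assumptions made for illustration.

```python
import random

MASK = None  # placeholder standing in for a hidden m/z value (hypothetical convention)


def mask_peaks(spectrum, mask_fraction=0.4, seed=0):
    """Hide the m/z value of a random subset of peaks.

    spectrum: list of (mz, intensity) tuples.
    Returns (masked_spectrum, targets), where targets maps the index of each
    hidden peak to its true m/z value. A model would be trained to recover
    these targets from the remaining, visible peaks.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(spectrum) * mask_fraction))
    hidden = set(rng.sample(range(len(spectrum)), n_masked))
    masked = [
        (MASK, intensity) if i in hidden else (mz, intensity)
        for i, (mz, intensity) in enumerate(spectrum)
    ]
    targets = {i: spectrum[i][0] for i in hidden}
    return masked, targets


# Made-up peaks, for illustration only: (m/z, relative intensity).
example_spectrum = [
    (85.03, 0.12),
    (127.04, 0.45),
    (145.05, 1.00),
    (163.06, 0.30),
    (181.07, 0.08),
]

masked, targets = mask_peaks(example_spectrum, mask_fraction=0.4, seed=0)
print(masked)
print(targets)
```

No labels are needed at any point: the training signal comes entirely from the spectrum itself, which is what makes it possible to learn from millions of unannotated spectra.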

Once trained, Dreams is used to map these spectra onto the “Dreams Atlas,” a large graph in which each spectrum, and the molecule behind it, is a point placed according to its similarity to others. This map reveals meaningful relationships between molecules, clustering similar compounds together and showing how unknown molecules relate to known ones. Importantly, the atlas shows that many molecules lie far from any known compound, indicating a vast potential for discovering entirely new chemicals with unique properties. The visualization acts as a discovery engine, guiding scientists toward promising areas for further investigation.
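A similarity map of this kind can be sketched as a nearest-neighbor graph over embedding vectors. The example below is a minimal illustration under stated assumptions: the three-dimensional vectors are invented stand-ins for the much larger embeddings a trained model would produce, cosine similarity is one plausible choice of similarity measure, and `knn_graph` is a hypothetical helper, not part of the published code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_graph(embeddings, k=1):
    """Link every spectrum to its k most similar neighbors.

    embeddings: dict mapping a spectrum name to its embedding vector.
    Returns a dict mapping each name to a list of (neighbor, similarity),
    sorted from most to least similar.
    """
    graph = {}
    for name, vec in embeddings.items():
        sims = [
            (other, cosine_similarity(vec, other_vec))
            for other, other_vec in embeddings.items()
            if other != name
        ]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        graph[name] = sims[:k]
    return graph


# Invented toy embeddings, for illustration only.
embeddings = {
    "known_A": [1.0, 0.1, 0.0],
    "known_B": [0.9, 0.2, 0.0],
    "unknown_1": [0.8, 0.3, 0.1],
    "unknown_2": [0.0, 0.1, 1.0],
}

atlas = knn_graph(embeddings, k=1)
for name, neighbors in atlas.items():
    print(name, neighbors)
```

In this toy data, `unknown_2` has a low similarity even to its closest neighbor, which is the situation the video describes for molecules that sit far from anything already known.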

The AI’s capabilities are demonstrated through practical examples. When applied to the molecular spectra of diverse food items, it accurately grouped them into plant-based foods, meats, and beverages. It also uncovered potential links between certain chemicals and health conditions, such as a possible association between agricultural fungicides and psoriasis, suggesting new avenues for research. Additionally, Dreams was fine-tuned to predict specific molecular properties, such as the presence of fluorine, a key element in many industrial and pharmaceutical applications. Its high accuracy in these predictions shows how the model can be used to identify molecules with desirable traits, accelerating the search for new drugs, stable compounds, and materials.
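To make the fine-tuning idea concrete, the sketch below trains a small logistic-regression classifier on top of fixed embedding vectors to predict a yes/no property such as "contains fluorine." This is a deliberate simplification and an assumption: real fine-tuning would typically also update the underlying network, and the two-dimensional embeddings, labels, and function names here are invented purely for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(embeddings, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights with batch gradient descent.

    embeddings: list of equal-length vectors; labels: list of 0/1 values.
    Returns (weights, bias).
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            # Prediction error for this example under the current weights.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Estimated probability that the property is present."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Invented toy data: 1 = property present, 0 = absent.
train_x = [[1.0, 0.0], [0.9, 0.2], [0.1, 0.9], [0.0, 1.0]]
train_y = [1, 1, 0, 0]

w, b = train_head(train_x, train_y)
print(predict(w, b, [0.95, 0.1]))   # resembles the positive examples
print(predict(w, b, [0.05, 0.95]))  # resembles the negative examples
```

The design point is that the expensive step, learning a general representation from unlabeled spectra, happens once; each new property then needs only a comparatively small amount of labeled data.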

Finally, the video emphasizes that this technology is just the beginning. The Dreams model can be further refined to predict full molecular structures from spectra, which would be a major breakthrough in chemistry. The researchers have made the code publicly available, enabling others to use and build upon this work. Overall, the development of Dreams represents a significant step toward unlocking the vast, unexplored chemical universe, with the potential to lead to new medicines, materials, and scientific discoveries that could transform multiple industries.