Supercomputer AI deciphers the intricate language of biomolecules

Researchers at the University of Glasgow have developed a sophisticated artificial intelligence model that can interpret the complex interactions between proteins, the fundamental building blocks of life. By leveraging a supercomputer typically reserved for astrophysics and particle physics research, the team created a protein language model that significantly outperforms existing technologies in predicting how these essential biomolecules communicate and function together. This breakthrough offers a powerful new lens through which to view the mechanisms of disease and opens new avenues for developing targeted therapies.

The new model, named PLM-Interact, functions like a translation tool for the language of proteins, whose interactions govern nearly every process within a cell. Published in Nature Communications, the research details how this AI can not only map these intricate connections but also predict the consequences of mutations, such as those leading to cancer or other genetic disorders. By providing a deeper understanding of cellular conversations, PLM-Interact has the potential to accelerate research into a wide range of diseases, from viral infections to chronic illnesses, and could aid in predicting the pandemic potential of new viruses by analyzing how they interact with human cells.

A New Grammar for Protein Interactions

At the heart of this development is a cross-disciplinary team of scientists from the University of Glasgow’s School of Cancer Sciences, School of Computing Science, and the MRC-University of Glasgow Centre for Virus Research. The group is led by Dr. Ke Yuan, Prof. Craig Macdonald, and Prof. David L. Robertson. It designed PLM-Interact as a protein language model, built on the same principles as the large language models (LLMs) that have shown remarkable capabilities in understanding and generating human language. Instead of processing words and sentences, the model treats the amino acid sequences of proteins as a language and learns the grammatical rules that dictate how proteins form partnerships.

Proteins are the primary structural and functional components of all cells and viruses, and their ability to interact with other proteins is fundamental to life. These interactions are involved in everything from DNA replication to immune responses. However, mapping these connections has traditionally been slow and laborious because it relies on time-consuming experimental methods. PLM-Interact addresses this bottleneck by using computational power to predict interactions with high accuracy, offering a scalable approach to a complex biological problem. The AI provides much-needed detail on how diseases emerge at the molecular level and gives a more complete picture of the cellular machinery.

Harnessing Astronomical Computing Power

The development of PLM-Interact was made possible by the DiRAC (Distributed Research utilising Advanced Computing) supercomputer, a high-performance computing facility in the U.K. funded to support research in particle physics, astronomy, and cosmology. The research team adapted this powerful tool, originally designed to probe the mysteries of the universe, to explore the inner space of the cell. Dr. Ke Yuan noted the irony in this application, stating that a machine built to understand the universe’s fundamental laws is now helping to decipher the language of life itself.

This immense computational resource was necessary to train the AI model on a vast dataset of known protein interactions. The initial training involved over 421,000 pairs of human proteins, allowing the model to learn the subtle patterns and rules governing their engagement. The supercomputer’s ability to process this massive amount of data was crucial for the model to achieve its high level of predictive accuracy. This repurposing of computational infrastructure highlights the versatility of modern technology and its potential to bridge disparate scientific fields, applying tools from “big science” to solve intricate biological puzzles.

Superior Predictive Performance

In validation tests, PLM-Interact outperformed competing AI models by a significant margin, improving accuracy by 16% to 28% over other AI-based tools for predicting protein interactions. It successfully predicted key protein interactions that are essential for cellular functions such as RNA polymerization and the transport of molecules within the cell. This level of accuracy suggests that the model has learned some of the fundamental principles that guide protein behavior.

Notably, the model has even outperformed Google DeepMind’s highly acclaimed AlphaFold3 on certain tasks. In one test involving five known protein-protein interactions, PLM-Interact correctly predicted the interactions, whereas AlphaFold3 predicted only one of the five. This demonstrates PLM-Interact’s specialized strength in interaction prediction. The model can predict not just whether an interaction occurs but also how changes might affect it, and this sets it apart as a uniquely powerful tool for biomedical research.

From Cancer to Viral Threats

The practical applications of this technology are broad and could have a significant impact on medical science. One of the model’s most promising capabilities is accurately identifying how mutations affect protein interactions. It can pinpoint mutations with harmful consequences, including those that cause genetic diseases and those that disrupt essential interactions in ways that can lead to cancer. This could pave the way for more personalized medicine, in which treatments are designed around an individual’s specific genetic makeup and the molecular behavior of their disease.

Beyond cancer and genetic disorders, the research team also trained PLM-Interact on a dataset of interactions between human and viral proteins. The dataset comprises 22,383 interactions involving 5,882 human proteins and 996 viral proteins. This training enables the model to predict virus-host interactions, offering insights into how viruses hijack the host’s cellular machinery. Prof. David L. Robertson emphasized the value of this capability in the wake of the COVID-19 pandemic. The pandemic showed how crucial it is to understand virus-host interactions in order to develop effective treatments and vaccines. In the future, this approach could be used to assess the pandemic potential of newly discovered animal viruses.

Future Directions and Implications

The development of PLM-Interact marks a significant milestone in applying AI to fundamental biological questions. The research team, which includes Dan Liu, is already working to expand the model’s capabilities and explore its potential across a wider range of applications. The goal is a system that can predict protein interactions with even greater accuracy and at larger scale, further accelerating the pace of discovery in medical science.

This new tool could substantially reduce the time and cost of drug discovery by identifying the most promising molecular targets for new therapies. By pinpointing the key protein interactions that drive a disease, researchers can more effectively design drugs to disrupt them. As the model is refined and expanded, it could become an essential tool for biologists and medical researchers, offering new insights into the intricate language of biomolecules that governs health and disease.