Highlights

In brief

SemiGNN-PPI, a self-ensembling multi-graph neural network, outperforms state-of-the-art deep learning models at predicting protein-protein interactions, effectively addressing data scarcity and domain shift.


Bioinformatics oracle reads between the lines

8 Nov 2024

A new computational model refines the art of predicting how proteins interact, creating a robust tool to explore complex biological systems even with limited data.

Imagine studying with a textbook that’s missing every other page. Both humans and machines struggle to learn when crucial data is absent. This challenge is magnified in biological problems such as the molecular dynamics of disease, where vast amounts of data are needed to map protein-protein interactions (PPIs), the complex ways in which protein molecules affect each other within living systems.

Emerging computational models based on deep learning (DL) can reveal new insights into PPIs, but they often falter because labelled training data are scarce and costly to acquire. This problem is exacerbated by domain shift: a phenomenon where models trained on data from one context (e.g. a well-studied set of proteins from a bacterial species) can fail to generalise what they’ve learned to another (e.g. the same set of proteins in a different species).

“The combined impact of label scarcity and domain shift can markedly reduce how generalisable and reliable computational models are in PPI research,” said Ziyuan Zhao, a Senior Research Engineer at A*STAR’s Institute for Infocomm Research (I2R). “This poses significant obstacles to their ability to accurately, consistently predict how complex biological systems work.”

To address this, Zhao worked with Principal Scientist Xulei Yang and I2R colleagues, as well as researchers from A*STAR’s Genome Institute of Singapore (GIS), Nanyang Technological University, Singapore, and Shanghai University, China, to propose a more effective, efficient and generalisable DL model for PPI prediction.

Their model, a self-ensembling multi-graph neural network for PPI prediction (SemiGNN-PPI), was designed to cope with limited data and unfamiliar PPI contexts. It combines graph neural networks (GNNs), which map complex relationships between proteins, with a Mean Teacher model, a technique that learns from both labelled and unlabelled data.
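
To give a feel for what a GNN does with a PPI network, here is a minimal sketch of one generic message-passing step in Python with NumPy. It is not SemiGNN-PPI's architecture: the function names, the mean-aggregation rule and the dot-product interaction score are illustrative assumptions. Proteins are nodes, known interactions are edges, and each protein's representation is updated by averaging it with those of its neighbours.

```python
import numpy as np

def gnn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One round of mean-aggregation message passing (a generic GCN-style layer).

    adj:    (n, n) 0/1 adjacency matrix of the protein graph (1 = known interaction)
    feats:  (n, d_in) per-protein feature vectors
    weight: (d_in, d_out) learnable projection (fixed here for illustration)
    """
    # Add self-loops so each protein keeps its own features in the update.
    adj_hat = adj + np.eye(adj.shape[0])
    # Row-normalise: each protein averages over itself and its neighbours.
    deg = adj_hat.sum(axis=1, keepdims=True)
    aggregated = (adj_hat / deg) @ feats
    # Linear projection followed by a ReLU non-linearity.
    return np.maximum(aggregated @ weight, 0.0)

def score_pair(emb: np.ndarray, i: int, j: int) -> float:
    """Hypothetical interaction score: sigmoid of the dot product of two embeddings."""
    return float(1.0 / (1.0 + np.exp(-(emb[i] @ emb[j]))))

# Toy network: proteins 0-1-2 form a chain; protein 3 has no known partners.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)
emb = gnn_layer(adj, np.eye(4), np.eye(4))
pair_scores = {(0, 1): score_pair(emb, 0, 1), (0, 3): score_pair(emb, 0, 3)}
```

In this toy setup the isolated protein 3 keeps only its own features, while proteins linked in the chain pick up information from each other; stacking several such layers lets information travel further across the network.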

“The self-ensembling strategy uses the collective insights from a set of aggregated predictions—generated from multiple prior evaluations of the GNN—to guide and refine the model's learning trajectory, enhancing its performance in complex biological environments,” said Zhao.
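
The general Mean Teacher recipe behind this kind of self-ensembling can be sketched in a few lines. In the sketch below, which uses NumPy and hypothetical function names, a "teacher" copy of the model holds an exponential moving average of the "student" model's weights, so it acts as an ensemble of the student's earlier states, and the student is trained to agree with the teacher on unlabelled data. Storing weights as a dict of arrays, the mean-squared consistency term and the default `alpha=0.99` are assumptions for illustration; the exact losses and weightings used in SemiGNN-PPI are not given here.

```python
import numpy as np

def ema_update(teacher: dict, student: dict, alpha: float = 0.99) -> dict:
    """Mean Teacher update: each teacher weight moves a small step towards the student's.

    Repeating this every training step makes the teacher an exponential moving
    average (a temporal ensemble) of all earlier student weights.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name] for name in teacher}

def consistency_loss(student_probs: np.ndarray, teacher_probs: np.ndarray) -> float:
    """Mean squared difference between student and teacher predictions.

    Needs no labels, so it can be computed on unlabelled protein pairs.
    """
    return float(np.mean((student_probs - teacher_probs) ** 2))

def semi_supervised_loss(supervised: float, student_unlab: np.ndarray,
                         teacher_unlab: np.ndarray, weight: float = 1.0) -> float:
    """Supervised loss on labelled pairs plus a weighted consistency term on unlabelled ones."""
    return supervised + weight * consistency_loss(student_unlab, teacher_unlab)
```

In a typical Mean Teacher training loop, the student is updated by gradient descent on the combined loss, and `ema_update` is then applied once per step; the teacher itself receives no gradient updates.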

The team also incorporated multi-graph learning, which views PPIs from several angles to improve predictions even when the data are imperfect, and added consistency constraints to keep the model's predictions accurate and reliable.
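
The article does not say which graphs SemiGNN-PPI combines or how its consistency constraints are defined. As a hedged illustration of the general idea only, the sketch below builds two views of one toy interaction graph (the original and a copy with randomly dropped edges, an assumption chosen for simplicity), propagates the same protein features over each, and measures how much the resulting embeddings disagree. During training, such a penalty would be added to the loss so that the different views are pushed to agree. All function names are hypothetical.

```python
import numpy as np

def propagate(adj: np.ndarray, feats: np.ndarray) -> np.ndarray:
    """Average each protein's features with its neighbours' (self-loops included)."""
    adj_hat = adj + np.eye(adj.shape[0])
    return (adj_hat / adj_hat.sum(axis=1, keepdims=True)) @ feats

def drop_edges(adj: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical augmentation: remove each undirected edge with probability p."""
    keep = rng.random(adj.shape) >= p
    keep = np.triu(keep, k=1)   # decide once per undirected edge (upper triangle)
    keep = keep | keep.T        # mirror the decision so the view stays symmetric
    return adj * keep

def multi_view_consistency(views: list) -> float:
    """Mean squared deviation of each view's embeddings from the cross-view average."""
    stacked = np.stack(views)
    return float(np.mean((stacked - stacked.mean(axis=0)) ** 2))

# Toy usage: two views of the same 4-protein graph with random features.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)
feats = rng.normal(size=(4, 3))
views = [propagate(adj, feats), propagate(drop_edges(adj, 0.3, rng), feats)]
penalty = multi_view_consistency(views)
```

A penalty of zero means every view produces identical embeddings; larger values signal that the views disagree, which the constraint would discourage.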

The team found that SemiGNN-PPI outperformed existing benchmark DL-based methods in PPI prediction, especially in scenarios with limited labelled data or with previously unseen protein datasets. It also showed strong generalisation capabilities, performing well on datasets whose characteristics differed from those it was trained on.

“Remarkably, the model achieved results on par with fully-supervised models, even when operating with substantially fewer labels, showcasing its efficiency in addressing label scarcity,” said Zhao.

Zhao noted that a similar approach could be applied to create more reliable computational models for bioinformatics challenges beyond PPIs. The team plans to refine SemiGNN-PPI further by improving its performance on highly imbalanced datasets, and to explore its use in predicting other types of biological interactions.

The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R) and Genome Institute of Singapore (GIS).


References

Zhao, Z., Qian, P., Yang, X., Zeng, Z., Guan, C. et al. SemiGNN-PPI: self-ensembling multi-graph neural network for efficient and generalizable protein-protein interaction prediction. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI ’23) 554, 4984-4992 (2023).

About the Researchers

Ziyuan Zhao is a Senior Research Engineer at the Institute for Infocomm Research (I2R), A*STAR. He received an MTech degree from the National University of Singapore (NUS) in 2019 and a BEng degree from Yunnan University, China, in 2017. His research interests are in the fields of deep learning, machine learning, medical imaging, bioinformatics, healthcare and artificial intelligence.
Xulei Yang is a Principal Scientist and Group Leader at the Institute for Infocomm Research (I2R), A*STAR. Previously the research head at YITU Technology Singapore, Yang received his PhD from Nanyang Technological University (NTU) in 2007. With over 16 years of R&D experience in deep learning and machine learning for computer vision and healthcare, he has published more than 100 scientific papers and international patents in the fields of deep learning, 3D vision and medical imaging. He is currently a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE) and a Kaggle Competition Master.

This article was made for A*STAR Research by Wildtype Media Group