In brief

A deep dive by researchers reveals the practical pitfalls of using AI/ML to decipher the behaviour of biological networks and proposes a hybrid approach as a possible solution.

Machine learning tackles biological data

16 Mar 2023

Despite the hype, state-of-the-art machine learning still struggles with one of the toughest problems in systems biology.

Artificial intelligence (AI) has enabled researchers to address problems that would otherwise remain intractable through traditional experimental and analytical approaches. Machine learning (a class of AI designed to build models, offer predictions, or make decisions) is particularly suited for handling large experimental datasets.

For example, AI/ML has the potential to connect the dots within and across various layers of networks, such as genetics, biochemical reactions, and clinical observations, in a holistic research approach called systems biology. It is also believed to be capable of elucidating complex behaviours that arise from such network interactions.

Despite the considerable hype in the field, however, scientists are still figuring out whether today’s AI/ML platforms are ready to shake the foundations of systems biology.

Providing their perspective in a critical analysis of the field, Kumar Selvarajoo and Hock Chuan Yeo, computational biologists from A*STAR’s Bioinformatics Institute (BII), argued that AI/ML may still be too green to accurately make sense of nature’s complexity.

They explain that for AI/ML to be effective, it requires large volumes of high-quality data from thoughtfully designed experimental setups. This can be difficult to achieve, especially when modelling new systems, as the ‘right’ amount of training data to yield accurate predictions can be hard to define.

Even after acquiring big datasets, AI/ML would struggle to make sense of systems that rely on chance events, such as gene expression in single cells, or of ultrasensitive (chaotic) systems. “Some known examples are energy metabolism, certain synthetic biochemical networks, and cell fate decisions or transitions, where their ultrasensitive behaviours will be difficult for AI to model, due to challenges associated with collecting high-quality experimental data,” said Selvarajoo.

To further complicate the problem, AI/ML models are prone to a phenomenon known as data leakage, which occurs when irrelevant information ends up being mistakenly correlated with the fundamentals of the system being studied, thereby generating artifacts that impact accuracy.
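As a rough illustration of how leakage can inflate apparent accuracy, the sketch below uses purely random data, so there is no real signal to find. It assumes one common form of leakage, chosen here for illustration and not taken from the researchers' paper: features are selected using every sample, including those later held out for testing. All function names and settings (`top_features`, `centroid_accuracy`, 40 samples, 300 features, 10 selected features) are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def class_gap(rows, labels, j):
    """Absolute difference between the two class means of feature j."""
    pos = [r[j] for r, y in zip(rows, labels) if y == 1]
    neg = [r[j] for r, y in zip(rows, labels) if y == 0]
    return abs(mean(pos) - mean(neg))


def top_features(rows, labels, k):
    """Indices of the k features that best separate the classes in `rows`."""
    ranked = sorted(range(len(rows[0])),
                    key=lambda j: class_gap(rows, labels, j),
                    reverse=True)
    return ranked[:k]


def centroid_accuracy(train, y_train, test, y_test, feats):
    """Nearest-centroid classifier restricted to the chosen features."""
    def centroid(label):
        members = [r for r, y in zip(train, y_train) if y == label]
        return [mean([r[j] for r in members]) for j in feats]

    c0, c1 = centroid(0), centroid(1)

    def dist(row, c):
        return sum((row[j] - cj) ** 2 for j, cj in zip(feats, c))

    hits = sum((1 if dist(r, c1) < dist(r, c0) else 0) == y
               for r, y in zip(test, y_test))
    return hits / len(test)


def run_trial(rng, n=40, p=300, k=10):
    # Random features and labels that are unrelated to them by construction.
    rows = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    labels = [i % 2 for i in range(n)]
    half = n // 2
    train, y_train = rows[:half], labels[:half]
    test, y_test = rows[half:], labels[half:]

    leaky = top_features(rows, labels, k)      # selection sees the test samples
    honest = top_features(train, y_train, k)   # selection sees training data only
    return (centroid_accuracy(train, y_train, test, y_test, leaky),
            centroid_accuracy(train, y_train, test, y_test, honest))


def compare(trials=20, seed=0):
    rng = random.Random(seed)
    results = [run_trial(rng) for _ in range(trials)]
    return mean([a for a, _ in results]), mean([b for _, b in results])


leaky_acc, honest_acc = compare()
print(f"leaky selection:  {leaky_acc:.2f}")
print(f"honest selection: {honest_acc:.2f}")
```

Because the labels are unrelated to the features, the honest pipeline should score near chance, while the leaky one typically appears to perform far better despite having learned nothing real.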

The researchers propose a hybrid approach to help overcome existing AI/ML pitfalls. According to this method, biologists leverage mechanistic information derived from experimental research to inform and update AI/ML models, whenever it is practical to do so.

“[Hybrid models] require less data to achieve the same predictive power by providing meaningful mechanistic constraints in lieu of more data,” said Yeo, adding that this approach can also mitigate the effects of unavoidable biases during biological data generation.
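A toy sketch can make the idea of trading data for mechanistic constraints more concrete. It is not the researchers' platform: it assumes, purely for illustration, a process that follows saturating (Michaelis-Menten-style) kinetics, five made-up noisy measurements, and hypothetical names such as `saturating_rate` and `fit_line`. A straight line fitted to the data alone is compared with a fit constrained to the assumed mechanistic form, and both are asked to predict well outside the measured range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def saturating_rate(s, vmax, km):
    """Assumed mechanistic form: the rate saturates as substrate s increases."""
    return vmax * s / (km + s)


# Made-up ground truth and five noisy measurements (illustrative values only).
TRUE_VMAX, TRUE_KM = 10.0, 2.0
substrate = [0.5, 1.0, 2.0, 3.0, 4.0]
noise = [0.05, -0.08, 0.06, -0.04, 0.07]
observed = [saturating_rate(s, TRUE_VMAX, TRUE_KM) + e
            for s, e in zip(substrate, noise)]

# Data-only model: a straight line with no knowledge of the mechanism.
a, b = fit_line(substrate, observed)

# Mechanism-constrained model: the double-reciprocal form
# 1/v = 1/vmax + (km/vmax) * (1/s) is linear, so the same least-squares
# routine recovers the two mechanistic parameters.
slope, intercept = fit_line([1 / s for s in substrate],
                            [1 / v for v in observed])
vmax_hat = 1 / intercept
km_hat = slope * vmax_hat

# Predict far outside the measured range.
s_new = 20.0
truth = saturating_rate(s_new, TRUE_VMAX, TRUE_KM)
line_err = abs((a * s_new + b) - truth)
mech_err = abs(saturating_rate(s_new, vmax_hat, km_hat) - truth)

print(f"data-only error:             {line_err:.2f}")
print(f"mechanism-constrained error: {mech_err:.2f}")
```

With the same five points, the constrained fit extrapolates far more closely than the unconstrained line, because the assumed mechanism supplies structure that would otherwise have to be learned from many more measurements. The benefit depends entirely on the assumed mechanism being a reasonable description of the system.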

Selvarajoo and Yeo are currently building two complementary in silico AI platforms with applications in biotechnology. The first, a hybrid AI/ML model, optimises synthetic pathways in host cells to produce novel or valuable biomolecules. “The second AI platform will cater to complex systems that cannot be easily understood or modelled,” Selvarajoo said, giving the effect of temperature, pH, and aeration on biomolecule production as examples.

The research group plans to team up with biomanufacturing researchers in the near future to apply and commercialise their platforms for industrial applications.

The A*STAR-affiliated researchers contributing to this research are from the Bioinformatics Institute (BII) and the Singapore Institute of Food and Biotechnology Innovation (SIFBI).

References

Yeo, H.C. and Selvarajoo, K. Machine learning alternative to systems biology should not solely depend on data. Briefings in Bioinformatics 23(6):bbac436 (2022)

About the Researchers

Kumar Selvarajoo

Senior Principal Investigator

Bioinformatics Institute (BII)
Kumar Selvarajoo is a Senior Principal Investigator with the Computational Biology & Omics laboratories at A*STAR’s Bioinformatics Institute (BII) and the Singapore Institute for Food & Biotechnology Innovation (SIFBI), and serves as a National Science Scholarship (BS-PhD) mentor there. He is also an Adjunct Associate Professor at the Yong Loo Lin School of Medicine, National University of Singapore, and the School of Biological Sciences, Nanyang Technological University. Previously, he was an Associate Professor in Systems Biology at the Institute for Advanced Biosciences, Keio University, Japan. He serves on the editorial boards of Genomics, Scientific Reports, and Biotechnology Notes and has led research teams in computational biology, systems biology, bioinformatics, data analytics and statistical genetics. In particular, Selvarajoo has used original ideas, utilising fundamental statistical laws, to investigate multi-dimensional datasets and to develop deterministic and stochastic models of complex signalling and metabolic networks. He has authored over 75 scientific articles, presented as an invited speaker at international conferences, obtained several research grants, and reviewed international grants. In 2013, 2015 and 2018, he founded and chaired the Symposium on Complex Biodynamics and Networks (cBio).

Hock Chuan Yeo

Senior Research Fellow

Bioinformatics Institute (BII)
Hock Chuan Yeo received an MSc in Bioinformatics on an A*STAR scholarship and obtained a PhD degree specialising in Computational Systems Biotechnology at the National University of Singapore (NUS). Before joining BII as a Senior Research Fellow, he led efforts in three other research institutes developing innovative algorithms and frameworks for elucidating actionable insights into biological and bioprocessing phenomena. He leverages high-throughput multi-omics data analyses as well as the mathematical and AI modelling of biological systems in his work. He is a biologist at heart and sees bioinformatics and modelling as an extension of his experimental toolkit.

This article was made for A*STAR Research by Wildtype Media Group