Artificial intelligence (AI) has enabled researchers to address problems that would otherwise remain intractable through traditional experimental and analytical approaches. Machine learning (ML), a class of AI that builds models from data to make predictions or decisions, is particularly suited to handling large experimental datasets.
For example, AI/ML has the potential to connect the dots within and across layers of biological networks, such as genetics, biochemical reactions and clinical observations, in a holistic research approach known as systems biology. It is also thought to be capable of elucidating the complex behaviours that emerge from such network interactions.
Despite the considerable hype in the field, however, scientists are still figuring out whether today’s AI/ML platforms are ready to shake the foundations of systems biology.
Providing their perspective in a critical analysis of the field, Kumar Selvarajoo and Hock Chuan Yeo, computational biologists from A*STAR’s Bioinformatics Institute (BII), argued that AI/ML may still be too green to accurately make sense of nature’s complexity.
They explain that to be effective, AI/ML requires large volumes of high-quality data from thoughtfully designed experiments. This can be difficult to achieve, especially when modelling new systems, where the ‘right’ amount of training data needed to yield accurate predictions can be hard to define.
Even after acquiring big datasets, AI/ML would struggle to make sense of systems that depend on chance events, such as gene expression in single cells, or of ultrasensitive (chaotic) systems. “Some known examples are energy metabolism, certain synthetic biochemical networks, and cell fate decisions or transitions, where their ultrasensitive behaviours will be difficult for AI to model, due to challenges associated with collecting high quality experimental data,” said Selvarajoo.
To further complicate the problem, AI/ML models are prone to a phenomenon known as data leakage, which occurs when information that should not be available to the model during training inadvertently influences it. The model ends up exploiting spurious correlations rather than the fundamentals of the system being studied, generating artifacts that undermine its real-world accuracy.
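The idea is easy to illustrate with a toy example. The sketch below (not from the paper; all values and names are illustrative) shows one common form of leakage: computing preprocessing statistics on the full dataset, including held-out test samples, so that test-set information silently shapes the training features.

```python
import statistics

# Toy 1-D feature values; the last two samples are held out as the test set.
data = [2.0, 4.0, 6.0, 8.0, 100.0, 120.0]
train, test = data[:4], data[4:]

# Leaky preprocessing: centre the features using the mean of the FULL
# dataset, so the extreme held-out values influence the training data.
full_mean = statistics.mean(data)     # 40.0

# Correct preprocessing: statistics come from the training split only.
train_mean = statistics.mean(train)   # 5.0

leaky_train = [x - full_mean for x in train]   # shifted by test information
clean_train = [x - train_mean for x in train]  # depends on training data only

print(full_mean, train_mean)
```

Here the two extreme test samples drag the "leaky" mean from 5.0 to 40.0, so every training feature is distorted by information the model should never have seen; in a real pipeline this typically inflates apparent accuracy.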
The researchers propose a hybrid approach to help overcome these AI/ML pitfalls. Under this approach, biologists use mechanistic knowledge derived from experimental research to inform and update AI/ML models whenever it is practical to do so.
“[Hybrid models] require less data to achieve the same predictive power by providing meaningful mechanistic constraints in lieu of more data,” said Yeo, adding that this approach can also mitigate the effects of unavoidable biases during biological data generation.
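The general principle behind such hybrid models can be sketched in a few lines. This is not the authors' platform; it is a minimal toy example in which a parameter is fitted to sparse, noisy data while a penalty term anchors it to a hypothetical mechanistically expected value (`k_mech`), standing in for a known constraint such as reaction stoichiometry.

```python
# Toy hybrid fit: a least-squares data term on sparse noisy observations
# of y ≈ k * x, plus a penalty anchoring k to a mechanistic prior.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]   # noisy observations (illustrative values)
k_mech = 2.0           # hypothetical parameter value from mechanistic knowledge
lam = 0.5              # weight given to the mechanistic constraint

def loss(k):
    data_term = sum((k * x - y) ** 2 for x, y in zip(xs, ys))
    mech_term = lam * (k - k_mech) ** 2  # constraint acts like extra data
    return data_term + mech_term

# Crude grid search over candidate parameter values.
k_best = min((loss(k), k) for k in [i / 100 for i in range(100, 300)])[1]
print(round(k_best, 2))
```

With only three data points, the constraint term pulls the estimate toward the mechanistically plausible value, which is the sense in which such constraints can substitute for additional data.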
Selvarajoo and Yeo are currently building two complementary in silico AI platforms with applications in biotechnology. The first, a hybrid AI/ML model, optimises synthetic pathways in host cells to produce novel or valuable biomolecules. “The second AI platform will cater to complex systems that cannot be easily understood or modelled,” Selvarajoo said, giving the effects of temperature, pH and aeration on biomolecule production as examples.
The research group plans to team up with biomanufacturing researchers in the near future to apply and commercialise their platforms for industrial applications.
The A*STAR-affiliated researchers contributing to this research are from the Bioinformatics Institute (BII) and the Singapore Institute of Food and Biotechnology Innovation (SIFBI).