The human body hosts more than 100 trillion bacteria and viruses, which play a critical role in everything from digestion to immune protection. Alongside these microbial friends are the foes: pathogenic microbes that cause infection and disease. Maintaining the delicate balance of the human microbiome is a central focus in creating the next wave of precision therapeutics.
Drug development has traditionally relied on time-consuming, expensive and labor-intensive screening methods. In recent years, however, the study of microbe-drug associations has gone digital, driven by advances in machine learning and deep learning. By complementing traditional techniques with these computational tools, data scientists can rapidly model how microorganisms are likely to respond to clinical interventions.
Xiaoli Li, a machine learning expert from A*STAR’s Institute for Infocomm Research (I2R), is part of the team that created a novel technique for predicting associations between human microbes and newly developed or repurposed drugs, with higher accuracy than existing computational methods. The team named it GCNMDA, short for Graph Convolutional Network-based framework for predicting human Microbe-Drug Associations.
One challenge for existing computational frameworks is making sense of complex, multidimensional datasets. Microbial and drug databases, for instance, contain intricate layers of relationships, redundancies and associations that are difficult to ‘teach’ to machine learning models. Integrating multiple biological data sources into a single heterogeneous network is another hurdle.
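To make the idea of a single heterogeneous network more concrete, the sketch below shows one common way to combine such data sources: stacking microbe-microbe similarities, drug-drug similarities and known microbe-drug associations into one square block adjacency matrix, so that microbes and drugs become nodes of the same graph. It is a minimal Python illustration under assumed inputs; the function name `build_heterogeneous_adjacency` and the toy similarity values are hypothetical, and this is not the GCNMDA team's code.

```python
# Hypothetical helper for illustration only; not the published GCNMDA code.

def build_heterogeneous_adjacency(microbe_sim, drug_sim, assoc):
    """Combine three data sources into one block matrix:

        [[microbe_sim, assoc  ],
         [assoc^T,     drug_sim]]

    microbe_sim: n_m x n_m microbe-microbe similarity scores
    drug_sim:    n_d x n_d drug-drug similarity scores
    assoc:       n_m x n_d known associations (1 = known, 0 = unknown)
    """
    n_m, n_d = len(microbe_sim), len(drug_sim)
    # Fail loudly if the blocks do not fit together.
    if any(len(row) != n_m for row in microbe_sim):
        raise ValueError("microbe_sim must be square")
    if any(len(row) != n_d for row in drug_sim):
        raise ValueError("drug_sim must be square")
    if len(assoc) != n_m or any(len(row) != n_d for row in assoc):
        raise ValueError("assoc must be n_microbes x n_drugs")

    # Microbe rows: similarity to other microbes, then links to drugs.
    top = [list(microbe_sim[i]) + list(assoc[i]) for i in range(n_m)]
    # Drug rows: links back to microbes (transposed), then drug similarities.
    assoc_t = [[assoc[i][j] for i in range(n_m)] for j in range(n_d)]
    bottom = [assoc_t[j] + list(drug_sim[j]) for j in range(n_d)]
    return top + bottom


if __name__ == "__main__":
    # Toy data: two microbes and three drugs (values invented for illustration).
    microbe_sim = [[1.0, 0.4], [0.4, 1.0]]
    drug_sim = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.5], [0.0, 0.5, 1.0]]
    assoc = [[1, 0, 0], [0, 1, 1]]
    for row in build_heterogeneous_adjacency(microbe_sim, drug_sim, assoc):
        print(row)
```

Because the similarity blocks are symmetric and the association block appears once as-is and once transposed, the combined matrix is symmetric, which is what standard graph convolution layers expect as input.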
GCNMDA is the first framework to overcome the limitations of earlier approaches, thanks to its secret weapon: a Graph Convolutional Network with an embedded conditional random field (CRF) layer. “The Graph Convolutional Network can learn accurate microbe and drug representations, while CRF is a probabilistic graphical model which possesses powerful capabilities for modeling pairwise relationships between nodes, such as microbe-drug associations,” explained Li, the study’s co-corresponding author.
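The two ingredients Li describes can be illustrated in simplified form. The following Python sketch, which is not the published GCNMDA implementation, shows a standard graph convolution step (each node's features averaged with its neighbors' through a symmetrically normalized adjacency matrix), a CRF-inspired mean-field-style smoothing step that pulls each node's embedding toward those of similar nodes, and an inner-product decoder that turns a microbe embedding and a drug embedding into an association score between 0 and 1. All function names, the `alpha`/`beta` weights, the number of smoothing iterations and the choice of decoder are assumptions made for illustration.

```python
import math

# Simplified, hypothetical sketch; names and choices are illustrative,
# not taken from the GCNMDA code.

def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def normalized_adjacency(adj):
    """Standard GCN normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(norm_adj @ features @ weights)."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, x) for x in row] for row in propagated]

def crf_smooth(emb, sim, alpha=1.0, beta=1.0, iterations=3):
    """CRF-inspired smoothing: each update sets a node's embedding to a weighted
    average of its GCN output (unary term, weight alpha) and its neighbors'
    current embeddings weighted by similarity (pairwise term, weight beta).
    Assumes non-negative similarities."""
    n = len(emb)
    z = [row[:] for row in emb]
    for _ in range(iterations):
        new_z = []
        for i in range(n):
            weight_sum = sum(sim[i][j] for j in range(n) if j != i)
            pairwise = [sum(sim[i][j] * z[j][k] for j in range(n) if j != i)
                        for k in range(len(emb[i]))]
            new_z.append([(alpha * emb[i][k] + beta * pairwise[k]) / (alpha + beta * weight_sum)
                          for k in range(len(emb[i]))])
        z = new_z  # update all nodes from the previous iteration's values
    return z

def association_score(z_microbe, z_drug):
    """Inner-product decoder squashed to (0, 1) with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(z_microbe, z_drug))))


if __name__ == "__main__":
    # Toy graph: node 0 is a microbe, nodes 1 and 2 are drugs.
    # Edge 0-1 is a known association; edge 1-2 marks similar drugs.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    weights = [[0.5, 0.1], [0.2, 0.4], [0.3, 0.3]]  # hand-picked; normally learned
    emb = crf_smooth(gcn_layer(adj, features, weights), adj)
    # Score for the unobserved microbe-drug pair (node 0, node 2).
    print(association_score(emb[0], emb[2]))
```

In a trained model, the weight matrix would be learned from known associations rather than fixed by hand; it is hard-coded here only to show how data flows from the graph, through the convolution and smoothing steps, to a pairwise score.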
The CRF layer enables the technique to recognize semantic information on its own, such as similarities among microbes and among drugs, while simultaneously making accurate predictions of microbe-drug associations. GCNMDA's predictions significantly outperformed those of seven state-of-the-art computational methods.
In one case study, the team applied GCNMDA to data on the SARS-CoV-2 virus and a suite of potential COVID-19 antivirals, generating a list of the top 40 drugs most likely to be effective against the disease, including some that had already proven successful in clinical studies. In another case study, GCNMDA accurately identified potential microbe-drug associations for two antibiotics, ciprofloxacin and moxifloxacin.
In the future, this technology could radically transform how researchers develop countermeasures against global health threats. “We can use GCNMDA as a screening tool to narrow down the search space for candidate compounds, which can be developed as vaccines and drugs against drug-resistant microbes,” Li said.
To enrich the predictive capabilities of the system, the team is feeding GCNMDA larger training datasets encompassing even more biological parameters. They also plan to tap into large volumes of unlabeled data, which could lead to better predictive models.
The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R).