
Software developed at A*STAR has greatly improved laboratory data analysis so that molecules such as lipids (pictured) can be correctly identified in biological samples.
© PASIEKA/Science Photo Library/Getty
Systems biologists rely on the powerful analytical technique of liquid chromatography–mass spectrometry (LC-MS) to analyze biological molecules but face a challenge in sorting through vast datasets. Now, A*STAR researchers have developed software based on a genetic algorithm that is highly adept at spotting the fingerprints of individual metabolites within the ocean of LC-MS data [1,2].
LC-MS studies produce vast amounts of complex information on biological molecules such as metabolites (metabolomics) or lipids (lipidomics), presenting a huge challenge for data analysts. “My team has analyzed ‘big’ omics data and developed mathematical models to improve the quality of various living cells for biotechnological and biomedical applications,” says Dong-Yup Lee from A*STAR’s Bioprocessing Technology Institute and National University of Singapore. “We realized that the available bioinformatics tools for metabolomics and lipidomics analysis were not suitable in terms of their throughput, capabilities and reliabilities.”
Lee’s team recognized the need to integrate several data analysis techniques to reveal how the overall phenotype of an organism might, for example, adapt to environmental changes. This was particularly challenging when trying to uncover the identities and amounts of small molecules such as metabolites, whose signatures in LC-MS data are often ambiguous.
“Unlike genomic and proteomics data — where the identity of a gene and its products can be unambiguously determined by base sequences — in LC-MS data, the fundamental information on small molecules is not fully captured,” explains Lee. “So we need to find clues that are hidden in the noisy background. Using this imperfect description of a suspect molecule, we compare its features against a known database. If we haven’t seen the molecule before, then clearly we can’t identify it.”
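The matching step Lee describes can be pictured with a minimal sketch. It assumes a suspect molecule is represented only by its measured mass-to-charge ratio (m/z) and is compared against reference values within a parts-per-million tolerance. The compound names, the m/z values, the `identify` function and the 10 ppm default are all hypothetical illustrations, not details of the team's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    name: str
    mz: float  # reference mass-to-charge ratio


# Hypothetical reference database; names and values are illustrative only.
DATABASE = [
    Reference("compound_A", 181.0707),
    Reference("compound_B", 147.0764),
]


def ppm_error(observed_mz, reference_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def identify(observed_mz, database, tolerance_ppm=10.0):
    """Return the closest reference within tolerance, or None if nothing matches."""
    candidates = [
        ref for ref in database
        if abs(ppm_error(observed_mz, ref.mz)) <= tolerance_ppm
    ]
    if not candidates:
        return None  # a molecule missing from the database cannot be identified
    return min(candidates, key=lambda ref: abs(ppm_error(observed_mz, ref.mz)))
```

In this sketch a feature measured at 181.0710 falls within 10 ppm of `compound_A`, whereas a feature at 300.0 returns `None`, mirroring Lee's point that a molecule absent from the database cannot be identified.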
Furthermore, most LC-MS analyses include parameters that are chosen by experts for particular studies and that might not suit another situation. To select the best parameter sets for LC-MS data processing, the team adapted a common artificial intelligence technique called a genetic algorithm (GA), which is inspired by the natural Darwinian processes that maximize species survival. The parameters act as ‘genes’ in the GA, with various measures of the quality of metabolite identification collectively determining the ‘fitness’ of each candidate parameter set.
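To make the idea concrete, here is a minimal sketch of a generic genetic algorithm that tunes a set of processing parameters. It is not the team's implementation. The parameter names and ranges, the `TARGET` values and the toy `fitness` function (which stands in for real measures of identification quality) are assumptions chosen purely for illustration, as are the selection, crossover and mutation schemes.

```python
import random

# Hypothetical LC-MS processing parameters and allowed ranges (illustrative only).
PARAM_RANGES = {
    "ppm_tolerance": (1.0, 50.0),
    "snr_threshold": (1.0, 20.0),
    "peak_width_s": (2.0, 60.0),
}

# Toy stand-in for identification quality: fitness is highest at TARGET.
TARGET = {"ppm_tolerance": 10.0, "snr_threshold": 5.0, "peak_width_s": 20.0}


def fitness(params):
    """Negative squared distance to TARGET, each parameter scaled by its range."""
    total = 0.0
    for name, (low, high) in PARAM_RANGES.items():
        total += ((params[name] - TARGET[name]) / (high - low)) ** 2
    return -total


def random_individual(rng):
    """Draw each 'gene' uniformly from its allowed range."""
    return {name: rng.uniform(low, high) for name, (low, high) in PARAM_RANGES.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {name: rng.choice((parent_a[name], parent_b[name])) for name in PARAM_RANGES}


def mutate(params, rng, rate=0.2, scale=0.1):
    """With probability `rate`, nudge a gene by Gaussian noise and clamp it to its range."""
    child = dict(params)
    for name, (low, high) in PARAM_RANGES.items():
        if rng.random() < rate:
            nudged = child[name] + rng.gauss(0.0, scale * (high - low))
            child[name] = min(high, max(low, nudged))
    return child


def run_ga(generations=40, pop_size=30, elite=2, seed=0):
    """Evolve parameter sets; return the best one and the best fitness per generation."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # truncation selection: fitter half breeds
        children = population[:elite]          # elitism: best sets survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness), history
```

Calling `best, history = run_ga()` returns the fittest parameter set found and the best fitness recorded at each generation. Because the top sets are carried over unchanged, that best fitness never decreases. In a real setting the toy `fitness` would be replaced by a score computed from actual processing results, which is the expensive part this sketch leaves out.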

The team that contributed to this research. Back row (from the left): Yeo Hock Chuan, Ang Kok Siong, Chin Ju Xin, Meiyappan Lakshmanan. Lee Dong-Yup is in the front row, second from the left.
© 2016 A*STAR Bioprocessing Technology Institute
The researchers successfully tested their GA with three metabolomics datasets, including data from cells expressing the antibody Immunoglobulin G against the Rhesus D antigen. “We also analyzed a lipidomics dataset with no known working parameters, and the known lipids were identified very quickly,” says Lee. “This progress could shed light on the little-understood role that lipids play in regulating stem-cell differentiation and immunity.”
The A*STAR-affiliated researchers contributing to this research are from the Bioprocessing Technology Institute. For more information about the team’s research, please visit the “-omics” Technologies webpage.