A powerful data-visualization algorithm will help biologists find links between parameters in the huge data sets that they generate.


An algorithm to rule them all

17 Mar 2019

A powerful machine-learning technique enables biologists to analyze enormous data sets

Researchers at A*STAR have compared six data-analysis processes and come up with a clear winner in terms of speed, quality of analysis and reliability. The top performer took large, complex biological data sets and spat out key relations between parameters (such as grouping blood and marrow cells according to cell type) in a fraction of the time of the other techniques.

Measurements on single cells alone can generate huge data sets that have anywhere from 20 to more than 20,000 parameters. The mind-boggling size and complexity of biological data sets make it extremely challenging for scientists to uncover meaningful relationships between parameters.

Mathematicians have developed statistical techniques that simplify complex data sets by grouping data according to shared characteristics. The best-known technique is principal component analysis (PCA), which was developed in the early twentieth century. More recently, more powerful techniques that harness machine learning have been developed.
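
To give a feel for what "simplifying" a data set means, here is a minimal Python sketch of PCA, the older technique. It is not UMAP and not the code used in the study. The function name, the toy measurements and the two made-up cell populations are all assumptions for illustration. The sketch collapses three parameters per cell into a single score along the direction of greatest variation.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (direction, scores) for the first principal component.

    rows: list of equal-length lists of floats (cells x parameters).
    Hypothetical helper: power iteration on the covariance matrix,
    written for teaching rather than for large data sets.
    """
    n, d = len(rows), len(rows[0])
    # Center each parameter (column) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Random start vector, then repeated multiplication by the covariance
    # matrix; the vector turns toward the direction of greatest variance.
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered cell onto that direction.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data (invented): two cell populations, three parameters per cell.
# Population A is high on parameter 1, population B on parameter 2;
# parameter 3 is the same for both and carries no grouping information.
rng = random.Random(0)
group_a = [[rng.gauss(5, 0.5), rng.gauss(1, 0.5), rng.gauss(3, 0.5)]
           for _ in range(50)]
group_b = [[rng.gauss(1, 0.5), rng.gauss(5, 0.5), rng.gauss(3, 0.5)]
           for _ in range(50)]

direction, scores = first_principal_component(group_a + group_b)
```

In this toy case the first 50 scores and the last 50 scores land on opposite sides of zero, so a single number per cell is enough to tell the two populations apart. Real single-cell data sets have far more parameters and far less tidy structure, and methods such as PCA can miss the curved, non-linear groupings that the newer machine-learning techniques are designed to capture.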

Now, Evan Newell and Florent Ginhoux at the Singapore Immunology Network (SIgN) and their colleagues have used single-cell data to test six such machine-learning techniques and found that one stands out from the rest in terms of speed, quality of analysis and reliability. This technique is called uniform manifold approximation and projection, or ‘UMAP’.

“When Evan and Etienne Becht in his group at SIgN started to benchmark UMAP, we realized that it was much more powerful than anything we had used before,” recalls Ginhoux.

An analysis that might take days using other methods can be done in a few hours using UMAP, which will allow scientists to investigate larger data sets. “With UMAP, we can analyze data for two or three million cells, whereas we generally avoid going beyond 100,000 cells with other methods,” says Newell.

Of the techniques tested, UMAP also grouped similar cells in the most intuitive way, which made its results easier to interpret.

“I think it’s really groundbreaking,” says Ginhoux. “Researchers I meet at conferences are already starting to use it.”

In an earlier study, the group demonstrated UMAP’s power by using it to discover a new population of cells in blood. Newell notes that UMAP is highly versatile and can be applied to data generated in fields as diverse as astronomy and crystallography. “Basically, any data that can be expressed in matrices can be analyzed by UMAP,” he says.

In addition to using UMAP to analyze data on a daily basis, the team plans to continue to work with informaticians to tailor UMAP to their needs.

The A*STAR-affiliated researchers contributing to this research are from the Singapore Immunology Network.



Becht, E., McInnes, L., Healy, J., Dutertre, C.-A. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology 37, 38–44 (2019).

About the Researcher

Florent Ginhoux

Senior Principal Investigator

Singapore Immunology Network
Florent Ginhoux graduated in Biochemistry from the University Pierre et Marie Curie (UPMC), Paris VI, obtained a Master's degree in Immunology from the Pasteur Institute in 2000 and received his PhD in 2004 from UPMC, Paris VI. As a postdoctoral fellow, he joined the Laboratory of Miriam Merad in the Mount Sinai School of Medicine (MSSM), New York. In 2008, he became an Assistant Professor in the Department of Gene and Cell Medicine, MSSM, and a member of the Immunology Institute of MSSM. He joined the Singapore Immunology Network (SIgN), A*STAR, in May 2009 as a Junior Principal Investigator. He is now a Senior Principal Investigator and an EMBO Young Investigator, and his laboratory focuses on the ontogeny and differentiation of macrophages and dendritic cells in both humans and mice. He was listed as a highly cited researcher on Web of Science in 2016, 2017 and 2018. Dr Ginhoux holds the following adjunct positions: Adjunct Assistant Professor, NUS, Singapore; Adjunct Assistant Professor, Duke-NUS, Singapore; Joint Scientist, KKH, Singapore; Adjunct Senior Principal Investigator, IMB, A*STAR, Singapore; and Adjunct Visiting Associate Professor, Shanghai Institute of Immunology at Shanghai Jiao Tong University School of Medicine, China.

This article was made for A*STAR Research by Nature Research Custom Media, part of Springer Nature