A powerful data-visualization algorithm will help biologists find links between parameters in the huge data sets that they generate.

© JESPER KLAUSEN/SCIENCE PHOTO LIBRARY

An algorithm to rule them all

17 Mar 2019

A powerful machine-learning technique enables biologists to analyze enormous data sets

Researchers at A*STAR have compared six data-analysis processes and come up with a clear winner in terms of speed, quality of analysis and reliability. The top performer took large, complex biological data sets and spat out key relations between parameters (such as grouping blood and marrow cells according to cell type) in a fraction of the time of the other techniques.

Measurements on single cells alone can generate huge data sets that have anywhere from 20 to more than 20,000 parameters. The mind-boggling size and complexity of biological data sets make it extremely challenging for scientists to uncover meaningful relationships between parameters.

Mathematicians have developed statistical techniques that simplify complex data sets by grouping data according to shared characteristics. The best-known technique is principal component analysis (PCA), which was developed in the early twentieth century. More recently, more powerful techniques that harness machine learning have been developed.
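To give a feel for what 'simplifying' a data set means, here is a minimal Python sketch of the core idea behind PCA: find the single direction along which the data vary most, then describe each cell by its position along that direction. It is an illustration only, not the analysis used in the study, and UMAP itself works differently. The toy 'cells', their three parameters and all function names are invented for this example.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so every parameter has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: converges to the direction of largest variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical toy data: two cell types measured on three parameters.
# The types differ in the first two parameters; the third is only noise.
rng = random.Random(0)
N_PER_TYPE = 20
type_a = [[rng.gauss(1, 0.3), rng.gauss(1, 0.3), rng.gauss(1, 0.3)]
          for _ in range(N_PER_TYPE)]
type_b = [[rng.gauss(5, 0.3), rng.gauss(5, 0.3), rng.gauss(1, 0.3)]
          for _ in range(N_PER_TYPE)]
cells = type_a + type_b

centered = center(cells)
pc1 = first_component(covariance(centered))

# Each cell is now summarized by one number instead of three.
scores = [sum(x * u for x, u in zip(row, pc1)) for row in centered]

print("first principal direction:", [round(u, 2) for u in pc1])
print("type A scores (first 3):", [round(s, 2) for s in scores[:3]])
print("type B scores (first 3):", [round(s, 2) for s in scores[-3:]])
```

In this toy case, one number per cell is enough to separate the two cell types. Real single-cell data sets have far more parameters and far less tidy structure, which is where newer techniques come in.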

Now, Evan Newell and Florent Ginhoux at the Singapore Immunology Network (SIgN) and their colleagues have used single-cell data to test six such machine-learning techniques, and have found one that stands out from the rest in terms of speed, quality of analysis and reliability. The technique is called uniform manifold approximation and projection, or ‘UMAP’.

“When Evan and Etienne Becht in his group at SIgN started to benchmark UMAP, we realized that it was much more powerful than anything we had used before,” recalls Ginhoux.

An analysis that might take days using other methods can be done in a few hours using UMAP, which will allow scientists to investigate larger data sets. “With UMAP, we can analyze data for two or three million cells, whereas we generally avoid going beyond 100,000 cells with other methods,” says Newell.

UMAP also grouped similar cells in the most intuitive way of the techniques tested, making its results easier to interpret.

“I think it’s really groundbreaking,” says Ginhoux. “Researchers I meet at conferences are already starting to use it.”

In an earlier study, the group demonstrated UMAP’s power by using it to discover a new population of cells in blood. Newell notes that UMAP is highly versatile and can be applied to data generated in fields as diverse as astronomy and crystallography. “Basically, any data that can be expressed in matrices can be analyzed by UMAP,” he says.

In addition to using UMAP to analyze data on a daily basis, the team plans to continue to work with informaticians to tailor UMAP to their needs.

The A*STAR-affiliated researchers contributing to this research are from the Singapore Immunology Network.

References

Becht, E., McInnes, L., Healy, J., Dutertre, C.-A. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology 37, 38–44 (2019).

About the Researcher

Florent Ginhoux

Florent Ginhoux obtained his PhD in 2004 from the University Pierre et Marie Curie, Paris VI. As a postdoctoral fellow, he joined the laboratory of Miriam Merad at the Mount Sinai School of Medicine (MSSM), New York, where he studied the ontogeny and homeostasis of cutaneous dendritic cell populations, with a strong focus on Langerhans cells and microglia. In 2008, he became an Assistant Professor in the Department of Gene and Cell Medicine, MSSM, and a member of the Immunology Institute of MSSM. He joined A*STAR's Singapore Immunology Network (SIgN) in May 2009 as a Principal Investigator before becoming Senior Principal Investigator in 2017. He has been a Web of Science Highly Cited Researcher since 2016, and an EMBO member since 2022. Ginhoux is also an Adjunct Visiting Associate Professor at the Shanghai Immunology Institute, Jiao Tong University, as well as an Adjunct Associate Professor at the Translational Immunology Institute, SingHealth and Duke-NUS. He is now a laboratory director at the Gustave Roussy Hospital, Villejuif, France. His new laboratory focuses on paediatric cancers.

This article was made for A*STAR Research by Nature Research Custom Media, part of Springer Nature