The best algorithms for reducing batch effects in single-cell RNA sequencing data balance the ability to integrate data from different batches against the computational resources they require to run.

No two cells are alike

18 Jan 2021

A comprehensive ranking of 14 algorithms may help bioinformaticians overcome inconsistencies during data collection.

Parents of identical twins cannot emphasize enough how different their children are, despite their striking resemblance to each other. Now extend that idea right down to the level of the single cell. What if, despite their uncanny similarities, each cell is vastly different from its neighbor?

Thanks to incredible advances in the field of genomics, it is now possible for scientists to collect gene-expression data of individual cells. In single-cell RNA sequencing, however, data is often gathered from multiple experiments conducted by different personnel, and using different methods, reagents, equipment and platforms.

All of these minute experimental differences add up and can lead to large variations—or batch effects—in the data. As such, correcting for batch effects helps align different datasets and preserve key biological variations.

“If not corrected, batch effects can introduce false signals while masking the underlying biological differences that we are interested in,” explained Jinmiao Chen, a Principal Investigator at A*STAR’s Singapore Immunology Network (SIgN). Chen was the corresponding author on a study that compared 14 state-of-the-art algorithms to determine the most suitable method for correcting batch-specific variations.

The algorithms were tested on ten biological datasets, covering diverse cell types such as dendritic cells, pancreatic cells, retinal cells and peripheral blood mononuclear cells, with datasets from both human and mouse samples. The datasets were collected using a range of RNA-sequencing technologies, namely 10x, SMART-seq, Drop-seq and SMARTer.

Based on five evaluation scenarios—ranging from identical cell types with different technologies, to non-identical cell types, multiple batches, big data and simulated data—the researchers found no superior algorithm among the 14 tested, as each had its strengths and weaknesses.

That being said, Harmony, LIGER, and Seurat 3 were the top three recommendations for batch integration based on rank-sum scores of performance across ten datasets. All three methods were able to complete runs on the large datasets, making them valuable as datasets grow in size and complexity.

Due to its significantly shorter runtime, Harmony was recommended as the first method to try when dealing with large datasets. Conversely, ComBat, MMD-ResNet and limma were ranked the worst-performing methods overall.
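The article does not spell out how the rank-sum scores were computed, so the following is only a generic sketch of the idea: rank the methods on each dataset, then add up each method's ranks, with the lowest total counting as the best overall. The method names, the scores and the helper functions `rank_scores` and `rank_sum` are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def rank_scores(scores: Dict[str, float], higher_is_better: bool = True) -> Dict[str, int]:
    """Rank methods on one dataset (1 = best). Tied scores share the better rank."""
    ordered = sorted(scores.values(), reverse=higher_is_better)
    return {method: ordered.index(value) + 1 for method, value in scores.items()}


def rank_sum(per_dataset_scores: List[Dict[str, float]]) -> Dict[str, int]:
    """Sum each method's per-dataset ranks; a lower total means better overall."""
    totals: Dict[str, int] = {}
    for scores in per_dataset_scores:
        for method, rank in rank_scores(scores).items():
            totals[method] = totals.get(method, 0) + rank
    return totals


# Made-up integration scores for three hypothetical methods on three datasets.
toy_scores = [
    {"method_a": 0.90, "method_b": 0.80, "method_c": 0.40},
    {"method_a": 0.70, "method_b": 0.85, "method_c": 0.50},
    {"method_a": 0.95, "method_b": 0.60, "method_c": 0.65},
]

totals = rank_sum(toy_scores)
ranking = sorted(totals, key=totals.get)  # best (lowest rank sum) first

print(totals)   # {'method_a': 4, 'method_b': 6, 'method_c': 8}
print(ranking)  # ['method_a', 'method_b', 'method_c']
```

Summing ranks rather than raw scores keeps any single dataset or metric with an unusually wide score range from dominating the overall ordering, which is one reason this kind of aggregation is common in benchmarks.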

“With the continued advancements in single-cell technologies, it will be necessary to identify more efficient and effective methods capable of scaling up in terms of the number of cells and batches,” Chen said.

The hallmark of an excellent algorithm, Chen noted, is a fine balance between superior batch integration and the ability to operate within the constraints of the available computational resources.

The A*STAR-affiliated researchers contributing to this research are from the Singapore Immunology Network (SIgN).


Tran, H.T.N., Ang, K.S., Chevrier, M., Zhang, X., Lee, N.Y.S., et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biology 21(1):12 (2020)

About the Researcher

Jinmiao Chen

Principal Investigator

Singapore Immunology Network

Jinmiao Chen obtained her bachelor’s degree in computer science from Sun Yat-sen University, China in 2002, before completing a PhD in machine learning and artificial intelligence at Nanyang Technological University, Singapore in 2007. Chen then joined the bioinformatics core of the A*STAR Singapore Immunology Network as a postdoctoral research fellow, where she analyzed microarray, next-generation sequencing, microbiome/metagenomics, high-dimensional flow/mass cytometry and single-cell RNA-sequencing data. In 2014, she established her own research lab at SIgN as a Project Leader; she is now a Principal Investigator at SIgN focusing on single-cell computational/system immunology.

This article was made for A*STAR Research by Wildtype Media Group