Cells, much like people, exist in a vast, interconnected network, constantly communicating with one another. Through electric pulses and chemical signals, they engage in silent conversations, coordinating essential functions that keep tissues alive and healthy.
Understanding how genes behave in different tissue regions, influenced by their immediate surroundings, is crucial for uncovering disease mechanisms and developing targeted therapies, explained Jinmiao Chen, a Principal Investigator at the A*STAR Bioinformatics Institute (A*STAR BII).
Yet, current methods often lose vital information. For example, in single-cell transcriptomics, when cells are separated for analysis, their spatial organisation within tissues is disrupted. While spatial transcriptomics (ST) techniques can capture the spatial details, they are hindered by high data sparsity and the considerable complexity of integrating gene expression and spatial data.
“Due to limitations in ST acquisition technology, not all genes are captured and measured in each cell, making it harder to accurately identify cell types based on their canonical marker genes,” said Chen.
To tackle these limitations, Chen teamed up with Huazhu Fu, a Principal Scientist from the A*STAR Institute of High Performance Computing (A*STAR IHPC), and researchers from BGI Research-Southwest, BGI-ShenZhen and the University of Chinese Academy of Sciences in China. Their goal was to create an artificial intelligence (AI) method that integrates gene expression and spatial data, giving a clearer view of how cells interact and vary within tissues.
This effort produced a new AI tool, spatially embedded deep representation (SEDR), which uses deep learning to map gene expression and spatial information into a simplified, low-dimensional representation. By combining a deep autoencoder with a graph autoencoder, SEDR processes large, complex datasets to reveal patterns of cell behaviour across tissues.
“This representation aims to capture meaningful variations in the data while reducing redundancy and discarding noise,” noted Chen.
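To make the idea concrete, here is a minimal Python sketch of how a gene-expression embedding and a spatial-graph embedding might be combined. It is not SEDR's implementation: the weights `w_gene` are fixed illustrative numbers rather than learned ones, the decoders and training loop are omitted, and the function names, the toy data and the k-nearest-neighbour graph are all assumptions made for illustration.

```python
import math


def knn_adjacency(coords, k):
    """Symmetric k-nearest-neighbour graph over spot coordinates, with self-loops."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )[:k]
        for j in nearest:
            adj[i][j] = adj[j][i] = 1.0
        adj[i][i] = 1.0  # self-loop so each spot keeps its own signal
    return adj


def normalise(adj):
    """Symmetric normalisation D^-1/2 A D^-1/2, common in graph convolutions."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def encode(expr, coords, w_gene, k):
    """Concatenate a gene-only embedding with a spatially smoothed embedding."""
    # Dense encoder layer with ReLU: compresses genes into a few features.
    z_gene = [[max(0.0, v) for v in row] for row in matmul(expr, w_gene)]
    # One graph-convolution-style step (no learned weights): mix each spot's
    # features with those of its spatial neighbours.
    z_spatial = matmul(normalise(knn_adjacency(coords, k)), z_gene)
    return [g + s for g, s in zip(z_gene, z_spatial)]


# Hypothetical toy data: 4 spots, 3 genes, two spatially separated pairs.
coords = [(0, 0), (0, 1), (5, 5), (5, 6)]
expr = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 5, 0]]
w_gene = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # illustrative, not trained

latent = encode(expr, coords, w_gene, k=1)
for spot, vector in enumerate(latent):
    print(spot, vector)
```

In a trained model, the weights would be learned by asking decoders to reconstruct the expression data and the spatial graph from `latent`. The sketch shows only the forward pass that produces the joint representation.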
SEDR was tested on various datasets, including brain, cancer and immune tissues. It excelled in grouping cells, correcting batch effects and filling in missing data. “SEDR effectively imputes and denoises ST data, thereby enhancing the accuracy of marker identification,” said Chen.
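As a rough illustration of what filling in missing data means for ST measurements, the Python sketch below replaces zero counts with the average of the nearest spots' values for the same gene. This neighbour averaging is a deliberately simple stand-in, not the learned reconstruction that SEDR performs. The function name, the toy data and the choice to treat every zero as missing are assumptions made for illustration.

```python
import math


def impute_from_neighbours(expr, coords, k=2):
    """Replace zero counts with the mean of the k nearest spots' values.

    Assumption: every zero is treated as a dropout. Real data also contain
    true zeros, which this naive rule cannot tell apart.
    """
    n = len(coords)
    imputed = [list(row) for row in expr]  # copy; leave the input untouched
    for i in range(n):
        nearest = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )[:k]
        for g, value in enumerate(expr[i]):
            if value == 0:
                imputed[i][g] = sum(expr[j][g] for j in nearest) / len(nearest)
    return imputed


# Hypothetical toy data: 4 spots, 2 genes; spot 0 is missing gene 1.
coords = [(0, 0), (0, 1), (1, 0), (9, 9)]
expr = [[4, 0], [5, 2], [3, 4], [1, 7]]

imputed = impute_from_neighbours(expr, coords, k=2)
print(imputed[0])
```

Spot 0's missing value is filled from its two closest neighbours, spots 1 and 2, while the distant spot 3 has no influence.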
The team also applied SEDR to a breast cancer dataset, revealing tumour sub-regions inhabited by different cell types, including immune cells and cancer-associated fibroblasts, which alter the tumour environment. The finding illustrates how the tool could help researchers understand diseases and discover new drug targets.
Looking ahead, the team is developing a version of SEDR that can analyse sub-cellular ST data, revealing even finer details such as the locations of molecules within cells. They also plan to extend SEDR to analyse more complex datasets that capture gene expression, protein abundance and epigenetic modifications from the same tissue.
The A*STAR-affiliated researchers contributing to this research are from the A*STAR Bioinformatics Institute (A*STAR BII) and A*STAR Institute of High Performance Computing (A*STAR IHPC).