In brief

Spanning data from multiple cell types, SG-NEx illustrates the strength of long-read RNA sequencing technologies in decoding full-length transcripts and offers a large repository for future benchmarking of analytical tools.

A longer look at RNA diversity

11 Dec 2025

The SG-NEx project delivers rich long-read RNA sequencing data that capture subtle differences in gene activity, opening avenues for discovering novel biomarkers in health and disease.

To peer into the inner workings of cells, scientists study gene expression patterns, gauging which parts of the cellular blueprint are switched on based on the amount of ribonucleic acid (RNA) present. These patterns may reveal clues to how cells develop unique identities or flag genes that malfunction in disease.

Simply tallying up RNA levels, however, only scratches the surface. “Looking only at total gene expression is like counting how many books there are in a library without knowing which titles are there,” said Jonathan Göke, a Principal Investigator at the A*STAR Genome Institute of Singapore (A*STAR GIS). “Many genes can produce various isoforms of RNA that each perform distinct roles in the body.”

Short-read RNA sequencing has long been the standard for decoding such transcript-level differences. The technique’s affordability has enabled the creation of vast repositories of short-read RNA data, fuelling biomedical discoveries and tool development.

“However, this approach cannot easily capture full-length transcripts or resolve complex splicing patterns that give rise to isoforms. Meanwhile, long-read RNA sequencing can cover entire transcripts and reveal more detailed RNA features,” said Göke.

To overcome these challenges, Göke and A*STAR GIS Senior Scientist Ying Chen launched the Singapore Nanopore Expression (SG-NEx) project to generate large-scale, high-quality RNA sequencing datasets. The collaboration united expertise from multiple institutions, including A*STAR GIS; the National University of Singapore; the National Cancer Centre Singapore; the Walter and Eliza Hall Institute of Medical Research, the Garvan Institute of Medical Research and the Peter MacCallum Cancer Centre in Australia; the Francis Crick Institute, UK; Seqera Labs, Spain; and the University of North Carolina at Chapel Hill, US.

SG-NEx profiled several cell lines and patient samples for a broader representation of human tissues. The team employed multiple sequencing methods—Nanopore long-read direct RNA, amplification-free direct cDNA, PCR-amplified cDNA, PacBio IsoSeq and short-read cDNA sequencing—for systematic comparison. They found that long-read approaches provided greater accuracy in identifying the most abundant isoforms across samples.

“The SG-NEx dataset allows precise measurement of transcript levels, which is essential for identifying biomarkers in neurodegenerative, cardiovascular and infectious diseases,” said Göke. “These insights can support earlier, more accurate diagnoses and inform next-generation treatments.”

To ensure broad scientific benefit, the team released both raw and processed data in an open-access format, complete with computational pipelines. Scientists worldwide can now explore SG-NEx data, assess each platform’s strengths and limitations, and build analytical tools to uncover complex cellular events at the isoform level.

Looking ahead, Göke and Chen aim to develop AI-driven computational pipelines capable of handling the complexity of long-read data and detecting disease-related RNA features. “We are also exploring ways to enhance data accessibility and standardisation across global research institutions, which will support the integration of long-read sequencing into routine clinical and translational research,” said Göke.

The A*STAR-affiliated researchers contributing to this research are from the A*STAR Genome Institute of Singapore (A*STAR GIS).

References

Chen, Y., Davidson, N.M., Wan, Y.K., Yao, F., Su, Y., et al. A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines. Nature Methods 22, 801–812 (2025).

About the Researchers

Jonathan Göke received his PhD degree in computational biology from the Max Planck Institute for Molecular Genetics in Berlin, Germany. He is currently a Principal Investigator at the A*STAR Genome Institute of Singapore (A*STAR GIS), leading research on computational transcriptomics, with a particular interest in genomics technology and the translational aspects of cancer.

Ying Chen received her PhD in Public Health from the Saw Swee Hock School of Public Health at the National University of Singapore. She is a Senior Scientist in the Lab of Computational Transcriptomics at the A*STAR Genome Institute of Singapore (A*STAR GIS). Her research interests include biostatistics, data analytics, statistical genomics and cancer research.

This article was made for A*STAR Research by Wildtype Media Group