In brief

The machine learning platform matched or surpassed current methods for finding and measuring gene activity and was successfully tested on stem cell data, highlighting its ability to map the complex gene landscape with precision.

Charting activity in genetic cityscapes

5 Jun 2024

Bambu, a new computational tool for analysing genetic data, effectively identifies and quantifies complex gene patterns.

If the human body is a city, then genes are the construction blueprints. However, this 'city' is unique because its landscape is continuously transforming. The dynamic nature of cellular processes means that interpreting these 'blueprints' isn't straightforward—a brand new skyscraper could be mistaken for an old one if we rely solely on outdated or static readings.

Traditional methods for measuring gene activity often miss new or unusual gene patterns because they compare against a fixed set of known genes, which can result in inaccurate data.

To overcome this limitation, Jonathan Göke, Andre Sim and Ying Chen from A*STAR’s Genome Institute of Singapore (GIS) introduced a new context-aware transcript quantification approach called Bambu. This machine learning tool detects both known and previously unidentified gene patterns with high precision.

Sim explained that Bambu uses a two-step approach. First, a transcript discovery phase uncovers hidden gene patterns. Second, Bambu applies an expectation maximisation algorithm. Together, the two steps help predict how active different genes are.
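To give a feel for what an expectation maximisation step does in this setting, here is a minimal Python sketch. It is not Bambu's implementation (Bambu is distributed as an R package, and its model is more elaborate); the function name `em_quantify`, the read data and the two-transcript setup are all hypothetical, chosen only to illustrate the idea. Some sequencing reads match a single transcript, while others are ambiguous and could have come from several. The algorithm alternates between sharing out the ambiguous reads according to the current activity estimates and updating those estimates from the shared-out counts.

```python
def em_quantify(compat, n_transcripts, n_iter=200, tol=1e-9):
    """Toy EM: estimate relative transcript abundances from read assignments.

    compat[i] lists the transcript indices that read i is compatible with.
    Returns a list of abundance fractions that sums to 1.
    """
    reads = [ts for ts in compat if ts]  # ignore reads matching no transcript
    if not reads:
        raise ValueError("no assignable reads")
    theta = [1.0 / n_transcripts] * n_transcripts  # start from equal abundances
    for _ in range(n_iter):
        counts = [0.0] * n_transcripts
        # E-step: split each read across its compatible transcripts,
        # in proportion to the current abundance estimates.
        for ts in reads:
            total = sum(theta[t] for t in ts)
            for t in ts:
                counts[t] += theta[t] / total
        # M-step: new abundance = share of the fractional read counts.
        new_theta = [c / len(reads) for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Hypothetical data: transcript 0 is annotated, transcript 1 is newly discovered.
# Six reads match only transcript 0, two match only transcript 1, four match both.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
abundance = em_quantify(reads, n_transcripts=2)
print([round(a, 3) for a in abundance])  # [0.75, 0.25]
```

In this toy example the four ambiguous reads end up split three to one, mirroring the ratio of the unambiguous evidence. If the newly discovered transcript were missing from the reference, its reads would be either discarded or wrongly credited to the known one, which is the kind of error a discovery step is meant to avoid.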

Think of Bambu as a drone that gives a comprehensive view of the genetic ‘cityscape’, revealing gene activity even in the most complex or overlooked areas.

Bambu was validated with lab-created RNA sequences, known as spike-ins, to assess its identification and measurement accuracy. It was further tested with complex stem cell data to ensure real-world applicability.

Using Bambu, the team discovered that a small group of repetitive genetic elements (HERVH-LTR7 retrotransposons) were responsible for most of the activity in human embryonic stem cells. Contrary to previous reports, they demonstrated that only a few genetic sequences within a larger family are active, which may be crucial for the stem cells' functions and characteristics.

“When compared to existing quantification-only methods, we found that Bambu had comparable or better performance,” said Chen. Sim added that Bambu had also been the subject of numerous independent evaluations by other research groups.

“Bambu consistently emerged as the top discovery tool across multiple metrics, further validating its effectiveness and superiority,” Sim remarked.

Sim and Chen believe Bambu's advanced transcript discovery can identify unique gene patterns in individuals, even in complex genetic regions or from extensively altered genes. This tool also enhances the visibility of gene activity, potentially highlighting irregularities or indicators of disease.

The team acknowledges the contribution of data from the Singapore Nanopore Expression Project (SG-NEx), an initiative of GIS, to their project's success.

The A*STAR-affiliated researchers contributing to this research are from the Genome Institute of Singapore (GIS).

References

Chen, Y., Sim, A., Wan, Y.K., Yeo, K., Lee, J.J.X., et al. Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nature Methods 20, 1187–1195 (2023).

About the Researchers

Jonathan Göke

Principal Investigator

Genome Institute of Singapore (GIS)

Jonathan Göke received his PhD degree in computational biology from the Max Planck Institute for Molecular Genetics in Berlin, Germany. He is currently a Principal Investigator at the Genome Institute of Singapore (GIS), A*STAR, leading research on computational transcriptomics, with a particular interest in genomics technology and the translational aspects of cancer.

Andre Sim is a Senior Scientist at A*STAR’s Genome Institute of Singapore (GIS). He received his bachelor’s and honours degrees from Massey University, New Zealand and then his PhD degree from Philipps-Universität Marburg, Germany, as part of the Max Planck Graduate School. His research is focused on harnessing computational transcriptomics and long-reads to better understand the transcriptomes of organisms.

Ying Chen received her PhD in Public Health from the Saw Swee Hock School of Public Health at the National University of Singapore. She is a Senior Scientist in the Lab of Computational Transcriptomics at the Genome Institute of Singapore (GIS). Her research interests include biostatistics, data analytics, statistical genomics and cancer research.

This article was made for A*STAR Research by Wildtype Media Group