If the human body is a city, then genes are its construction blueprints. This 'city' is unusual, however, because its landscape is constantly changing. Because cellular processes are dynamic, reading these 'blueprints' isn't straightforward: a brand new skyscraper could be mistaken for an old one if we rely solely on outdated or static readings.
Traditional methods for measuring gene activity compare sequencing data against a fixed set of known genes, so they often miss new or unusual gene patterns and can produce inaccurate measurements.
To overcome this limitation, Jonathan Göke, Andre Sim and Ying Chen from A*STAR’s Genome Institute of Singapore (GIS) introduced a new context-aware transcript quantification approach called Bambu. This machine learning tool detects both known and previously unidentified gene patterns with high precision.
Sim explained that Bambu uses a two-step approach. First, a transcript discovery phase uncovers previously hidden gene patterns. Second, an expectation maximisation algorithm is applied to the results. Together, the two steps help predict how active different genes are.
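For readers curious about what an expectation maximisation step looks like in practice, here is a minimal sketch in Python. It is not Bambu's own code, and everything in it is an illustrative assumption: the transcript names (two known ones, "A" and "B", and a hypothetical newly discovered one, "N1"), the toy reads, and the simplification that each read counts equally regardless of transcript length. The idea is that some reads match more than one transcript, so the algorithm alternates between splitting those ambiguous reads according to the current abundance estimates and updating the estimates from the split.

```python
from collections import defaultdict


def em_quantify(read_compat, transcripts, n_iter=200, tol=1e-9):
    """Estimate relative transcript abundances with a toy EM loop.

    read_compat: list of tuples, one per read, naming the transcripts
                 that read is compatible with (hypothetical input format).
    transcripts: list of all transcript names, known and newly discovered.
    """
    # Start from a uniform guess over all transcripts.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}

    for _ in range(n_iter):
        counts = defaultdict(float)

        # E-step: share each read among its compatible transcripts,
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            if total == 0.0:
                continue  # no compatible transcript has support; skip read
            for t in compat:
                counts[t] += theta[t] / total

        # M-step: new abundances are the normalised fractional counts.
        assigned = sum(counts.values())
        new_theta = {t: counts[t] / assigned for t in transcripts}

        # Stop once the estimates no longer change meaningfully.
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break

    return theta


# Toy data (assumed for illustration): 6 reads match only A, 2 match only B,
# 4 are ambiguous between A and B, and 3 match only the novel transcript N1.
transcripts = ["A", "B", "N1"]
reads = [("A",)] * 6 + [("B",)] * 2 + [("A", "B")] * 4 + [("N1",)] * 3

estimates = em_quantify(reads, transcripts)
for name in transcripts:
    print(name, round(estimates[name], 3))
```

In this toy case the ambiguous reads end up split three to one in favour of "A", mirroring the ratio of the unambiguous reads, so the estimates settle at about 0.6 for "A" and 0.2 each for "B" and "N1". Note that "N1" can only receive reads because it was added to the list of transcripts first, which is why discovery comes before quantification.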
Think of Bambu as a drone that gives a comprehensive view of the genetic ‘cityscape’, revealing gene activity even in the most complex or overlooked areas.
Bambu was validated with lab-created RNA sequences, known as spike-ins, to assess its identification and measurement accuracy. It was further tested with complex stem cell data to ensure real-world applicability.
Using Bambu, the team discovered that a small group of repetitive genetic elements (HERVH-LTR7 retrotransposons) was responsible for most of the activity in human embryonic stem cells. Contrary to previous reports, they showed that only a few genetic sequences within a larger family are active, which may be crucial for the stem cells' functions and characteristics.
“When compared to existing quantification-only methods, we found that Bambu had comparable or better performance,” said Chen. Sim added that Bambu had also been the subject of numerous independent evaluations by other research groups.
“Bambu consistently emerged as the top discovery tool across multiple metrics, further validating its effectiveness and superiority,” Sim remarked.
Sim and Chen believe Bambu's advanced transcript discovery can identify unique gene patterns in individuals, even in complex genetic regions or from extensively altered genes. This tool also enhances the visibility of gene activity, potentially highlighting irregularities or indicators of disease.
The team acknowledges the contribution of data from the Singapore Nanopore Expression Project (SG-NEx), an initiative of GIS, to their project's success.
The A*STAR-affiliated researchers contributing to this research are from the Genome Institute of Singapore (GIS).