In brief

The neural network-based method called m6Anet uses multiple instance learning, enabling the accurate identification and quantification of m6A RNA modifications from direct RNA sequencing, outperforming existing methods and showing generalisability across different biological samples.

Reading hidden notes on genetic molecules

13 Dec 2023

A new computational tool accurately predicts RNA modifications from sequencing data, helping researchers study complex diseases such as cancer and develop novel therapies for them.

Flip through old textbooks and you’ll likely see highlighted sections and notes scribbled in the margins. Similarly, N6-methyladenosine (m6A) modifications are chemical tags added to certain parts of RNA molecules that act as ‘notes’, influencing how RNA is read, processed and used by the cell.

These m6A RNA modifications are of particular interest to cancer researchers—they have been linked with aberrant gene expression that promotes the growth and survival of cancer cells and the ability of cancer cells to evade immune detection. Understanding the nature and frequency of m6A RNA modifications can lead to the development of novel cancer treatments.

However, conventional analytical approaches only provide a low-resolution view of a cell’s RNA modifications. “Not all RNAs are equally modified,” explained Jonathan Göke, a Principal Investigator at A*STAR’s Genome Institute of Singapore (GIS). “A gene can generate some RNAs that are modified, and others that are not.”

Göke elaborated that accurately quantifying m6A modification rates at a particular RNA site can help scientists identify important positions that may, for example, be critical to cancer development. “If an RNA molecule is 100 percent modified, it’s probably very important,” Göke said.

In partnership with researchers from the National University of Singapore and Shenzhen Bay Laboratory, China, Göke’s team developed a computational method that uses machine learning (ML) to detect RNA modifications using direct RNA sequencing data.

The neural network-based model, called m6Anet, was designed to take RNA sequencing read-outs and transform them into high-dimensional representations, a mathematical way of describing data points that have many features. These representations were then used to predict the probability of each sequencing data point containing an m6A modification.
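To make the idea concrete, here is a minimal Python sketch of how one sequencing read-out might be turned into a hidden representation and then into a probability. It is not m6Anet's actual architecture: the three input features, the layer sizes and the random, untrained weights are all assumptions made purely for illustration.

```python
import math
import random

def sigmoid(x):
    """Squash any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def read_probability(features, hidden_weights, output_weights):
    """Map one read's features to a probability via a small hidden layer."""
    # Hidden representation: weighted sums passed through a ReLU.
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)))
        for row in hidden_weights
    ]
    # A single output score, converted into a probability.
    logit = sum(w * h for w, h in zip(output_weights, hidden))
    return sigmoid(logit)

# Hypothetical setup: 3 signal features per read, 4 hidden units,
# random weights standing in for a trained model.
random.seed(0)
n_features, n_hidden = 3, 4
hidden_weights = [
    [random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_hidden)
]
output_weights = [random.gauss(0, 1) for _ in range(n_hidden)]

# Five made-up reads covering the same RNA site.
reads = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(5)]
read_probs = [read_probability(r, hidden_weights, output_weights) for r in reads]
```

Each entry in `read_probs` is a value between 0 and 1 for one read. In a real model the weights would be learned from data rather than drawn at random.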

Göke said that when training m6Anet, the team deliberately avoided collapsing many sequencing data points into a single summary value, a known pitfall of existing computational methods.

Ultimately, this strategy paid off: m6Anet outperformed existing methods for accurately quantifying RNA modifications on multiple levels. Because it uses a multiple instance learning (MIL) framework, m6Anet can handle missing data, namely cases where the modification status of individual RNA molecules is unknown. In addition, m6Anet can be reliably applied to different cell types and even plant cells without needing to retrain the model’s parameters.
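The multiple instance learning idea can be sketched in a few lines of Python: each RNA site is treated as a "bag" of reads, only the bag has a label, and per-read probabilities are pooled into one site-level score. The noisy-OR pooling rule, the 0.5 threshold and the example probabilities below are assumptions chosen for illustration, not a description of m6Anet's exact implementation.

```python
import math

def site_probability(read_probs):
    """Noisy-OR pooling: chance that at least one read at the site is modified."""
    return 1.0 - math.prod(1.0 - p for p in read_probs)

def modification_rate(read_probs, threshold=0.5):
    """Estimated fraction of reads at the site that carry the modification."""
    if not read_probs:
        raise ValueError("need at least one read to estimate a rate")
    return sum(p > threshold for p in read_probs) / len(read_probs)

# Hypothetical per-read probabilities for a single RNA site.
reads = [0.9, 0.8, 0.1, 0.05]

site_p = site_probability(reads)  # pooled, site-level probability
rate = modification_rate(reads)   # share of reads called as modified
```

For these made-up reads the pooled site probability is about 0.983, while the estimated modification rate is 0.5, since two of the four reads pass the threshold. This shows how a site can be confidently called as modified even when only some of its RNA molecules carry the tag.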

Göke’s team has released m6Anet as open-source Python software on GitHub for other researchers to use. They also plan to release updated versions regularly to ensure it remains compatible with the latest sequencing technology.

The A*STAR-affiliated researchers contributing to this research are from the Genome Institute of Singapore (GIS).

References

Hendra, C., Pratanwanich, P.N., Wan, Y.K., Goh, W.S.S., Thiery, A., et al. Detection of m6A from direct RNA sequencing using a multiple instance learning framework. Nature Methods 19, 1590-1598 (2022).

About the Researcher

Jonathan Göke

Principal Investigator

Genome Institute of Singapore (GIS)
Jonathan Göke received his PhD degree in computational biology from the Max Planck Institute for Molecular Genetics in Berlin, Germany. He is currently a Principal Investigator at the Genome Institute of Singapore (GIS), A*STAR, leading research on computational transcriptomics, with a particular interest in genomics technology and the translational aspects of cancer.

This article was made for A*STAR Research by Wildtype Media Group