Flip through old textbooks and you’ll likely see highlighted sections and notes scribbled in the margins. Similarly, N6-methyladenosine (m6A) modifications are chemical tags added to certain parts of RNA molecules that act as ‘notes’, influencing how RNA is read, processed and used by the cell.
These m6A RNA modifications are of particular interest to cancer researchers—they have been linked with aberrant gene expression that promotes the growth and survival of cancer cells and the ability of cancer cells to evade immune detection. Understanding the nature and frequency of m6A RNA modifications can lead to the development of novel cancer treatments.
However, conventional analytical approaches only provide a low-resolution view of a cell’s RNA modifications. “Not all RNAs are equally modified,” explained Jonathan Göke, a Principal Investigator at A*STAR’s Genome Institute of Singapore (GIS). “A gene can generate some RNAs that are modified, and others that are not.”
Göke elaborated that accurately quantifying m6A modification rates at a particular RNA site can help scientists identify important positions that may, for example, be critical to cancer development. “If an RNA molecule is 100 percent modified, it’s probably very important,” Göke said.
In partnership with researchers from the National University of Singapore and Shenzhen Bay Laboratory, China, Göke’s team developed a computational method that uses machine learning (ML) to detect RNA modifications from direct RNA sequencing data.
The neural network-based model, called m6Anet, was designed to take RNA sequencing read-outs and transform them into high-dimensional representations, a mathematical way of describing multi-dimensional data points. These representations were then used to predict the probability that each sequencing data point contains an m6A modification.
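To make the idea of turning a read-out into a representation and then a probability more concrete, here is a minimal sketch in plain Python. It is not m6Anet's actual architecture: the three input features, the layer sizes and all weights below are made-up values chosen purely for illustration, and a real model would learn its weights from training data.

```python
import math

def encode(features, weights, biases):
    """Map one read's features to a hidden representation (linear layer + ReLU)."""
    return [
        max(0.0, sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, biases)
    ]

def read_probability(hidden, out_weights, out_bias):
    """Turn a hidden representation into a probability with a sigmoid."""
    z = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, untrained parameters: 3 input features -> 4 hidden units.
WEIGHTS = [
    [0.5, -0.2, 0.1],
    [-0.3, 0.8, 0.0],
    [0.2, 0.2, -0.5],
    [0.0, -0.4, 0.9],
]
BIASES = [0.0, 0.1, -0.1, 0.0]
OUT_WEIGHTS = [0.7, -0.6, 0.3, 0.5]
OUT_BIAS = -0.2

# Hypothetical features for one sequencing read (already normalised).
read_features = [0.4, -1.2, 0.9]

hidden = encode(read_features, WEIGHTS, BIASES)
prob = read_probability(hidden, OUT_WEIGHTS, OUT_BIAS)
print(f"hidden representation: {hidden}")
print(f"probability of modification for this read: {prob:.3f}")
```

The point of the sketch is the shape of the computation: each read is scored on its own rather than being averaged with other reads first.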
Göke said that the team chose an approach that avoided summarising too many sequencing data points into a single point when training m6Anet, as this is a known pitfall of existing computational methods.
Ultimately, this strategy paid off, with m6Anet outperforming existing methods at accurately quantifying RNA modifications on multiple levels. Using a multiple instance learning (MIL) framework, m6Anet can handle missing data (when the modification status of individual RNA molecules is unknown). In addition, m6Anet can be reliably applied to different cell types and even plant cells without needing to retrain the model’s parameters.
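In multiple instance learning, labels are known only for a group ("bag") of instances, not for each instance. Here a bag would be all the reads covering one RNA site. The sketch below shows one common way to combine per-read probabilities into a site-level score, known as noisy-OR pooling, along with a simple estimate of the fraction of modified reads. The article does not say which pooling rule m6Anet uses, so the function names, the 0.5 threshold and the example values are assumptions for illustration only.

```python
import math

def site_probability(read_probs):
    """Noisy-OR pooling: probability that at least one read at the site is modified."""
    return 1.0 - math.prod(1.0 - p for p in read_probs)

def modification_rate(read_probs, threshold=0.5):
    """Fraction of reads whose predicted probability reaches the threshold."""
    if not read_probs:
        raise ValueError("need at least one read")
    return sum(p >= threshold for p in read_probs) / len(read_probs)

# Hypothetical per-read probabilities for a single site.
reads = [0.9, 0.1, 0.8, 0.2]

print(f"site-level probability: {site_probability(reads):.4f}")
print(f"estimated modification rate: {modification_rate(reads):.2f}")
```

With this kind of pooling, only the site-level label is needed for training, which is why unknown per-molecule labels do not block learning.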
Göke’s team has released m6Anet as open-source software, implemented in Python and available on GitHub for other researchers to use. They also plan to release updated versions regularly so that m6Anet remains compatible with the latest sequencing technology.
The A*STAR-affiliated researchers contributing to this research are from the Genome Institute of Singapore (GIS).