
In brief

A*STAR researchers have developed a machine learning method to detect and identify Android malware.


Detecting malware on the fly

28 Jun 2019

A*STAR researchers have devised a machine learning technique to help antivirus developers stay ahead in the cat-and-mouse game of Android malware detection.

More than two billion active devices run on Google’s Android platform. Unfortunately, the popularity of the operating system has made it the target of malicious software, or malware—around 3.2 million new Android malware samples were identified by the end of the third quarter of 2018, according to German security software company G Data.

Cybersecurity experts have developed defenses against some of these bad actors, including machine learning and artificial intelligence tools that can recognize suspicious applications. However, most existing methods require expert analysis to predetermine specific malware features—an approach only possible for known cybersecurity threats.

“As malware keeps on evolving, any predetermined features will soon become outdated, but manually defining new features takes time and is not easy. Also, updating the batch learning-based classifiers requires retraining the malware detection model with new malware samples and all previous training samples, which is slow and resource intensive,” said Dr. Li Zhang, former Research Scientist at A*STAR’s Institute for Infocomm Research (I2R), who is now with ST Engineering.

To overcome these limitations, Zhang and colleagues combined two techniques, n-gram analysis and online classifiers, to build a machine learning model that discovers Android malware more efficiently. The method takes part of an application's code and breaks it into n-grams (short, overlapping runs of consecutive code elements). Together these act as a fingerprint containing detailed information about the application.
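To make the fingerprint idea concrete, here is a minimal Python sketch that counts n-grams over a sequence of code tokens. It assumes the tokens are Dalvik bytecode opcodes taken from an app. The article says only that "part of an application's code" is used, so the choice of opcodes, the function name `ngram_fingerprint` and the sample sequence are all illustrative, not the researchers' implementation.

```python
from collections import Counter


def ngram_fingerprint(tokens, n=3):
    """Count each run of n consecutive tokens (an n-gram) in the sequence."""
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


# Hypothetical opcode sequence from one method of an app's bytecode.
opcodes = ["const-string", "invoke-virtual", "move-result",
           "invoke-virtual", "move-result"]
fp = ngram_fingerprint(opcodes, n=2)
print(fp[("invoke-virtual", "move-result")])  # 2
```

A larger n captures longer code patterns but yields many more distinct features. The article does not state which value of n the researchers used.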

A classifier algorithm then automatically assigns a weight, or score, to the component parts of the fingerprint (sub-fingerprints) according to how closely each sub-fingerprint resembles malware. "A dedicated classifier is used to handle a specific category of information in the Android application. Such a design helps further improve the classification accuracy and reduce the model training time," Zhang explained. Importantly, the model can update itself with new training samples without forgetting knowledge obtained from earlier datasets, a capability known as incremental learning.
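The weighting and incremental-update ideas can be sketched with a sparse online linear classifier. The article does not name the specific online algorithm, so the sketch below substitutes the Passive-Aggressive (PA-I) update rule. The category names `opcodes` and `api_calls`, the class names, and the rule of summing per-category scores are all hypothetical. Each new labeled sample adjusts the existing weights, so the model never has to revisit earlier training data.

```python
from collections import Counter


class OnlineLinearClassifier:
    """Sparse linear model trained one sample at a time with the PA-I rule."""

    def __init__(self, C=1.0):
        self.C = C    # caps how far a single sample can move the weights
        self.w = {}   # feature (e.g. an n-gram) -> learned weight

    def score(self, x):
        # Weighted sum over the features present; unseen features count as 0.
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def update(self, x, y):
        """Incremental step: y is +1 (malware) or -1 (benign)."""
        loss = max(0.0, 1.0 - y * self.score(x))   # hinge loss
        sq_norm = sum(v * v for v in x.values())
        if loss > 0 and sq_norm > 0:
            tau = min(self.C, loss / sq_norm)
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + tau * y * v


class CategoryEnsemble:
    """One dedicated classifier per category of app information (hypothetical)."""

    def __init__(self, categories):
        self.models = {c: OnlineLinearClassifier() for c in categories}

    def score(self, sample):
        # Assumed combination rule: add up the per-category scores.
        return sum(m.score(sample[c]) for c, m in self.models.items())

    def partial_fit(self, sample, y):
        for c, m in self.models.items():
            m.update(sample[c], y)

    def predict(self, sample):
        return 1 if self.score(sample) > 0 else -1


# Toy sub-fingerprints for one malicious and one benign app (made-up data).
mal = {"opcodes": Counter({("const-string", "invoke-virtual"): 2}),
       "api_calls": Counter({"sendTextMessage": 1})}
ben = {"opcodes": Counter({("move-result", "return-void"): 1}),
       "api_calls": Counter({"getString": 1})}

model = CategoryEnsemble(["opcodes", "api_calls"])
for sample, label in [(mal, 1), (ben, -1)]:   # samples arrive as a stream
    model.partial_fit(sample, label)
print(model.predict(mal), model.predict(ben))  # 1 -1
```

Because the model is linear, scoring an app costs one weight lookup per feature present in its fingerprint. This illustrates why such a model can be light enough to run on a phone.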

Applying their approach to a benchmark dataset of more than 10,000 application samples, the researchers achieved a malware detection accuracy of 99.2 percent. On a real-world dataset containing more than 70,000 samples, the model achieved 86.2 percent accuracy. In addition, when classifying malware into families, the technique obtained an accuracy of 98.8 percent on the top 23 malware families of the Drebin dataset, a well-annotated library of Android malware.

“Our framework can help security analysts or antivirus developers better cope with fast-evolving malware. Besides, the underlying model is linear and lightweight, which can even be deployed on phones to achieve real-time protection of Android users,” said Zhang.

His team is now expanding the framework to also consider the runtime behaviors of Android applications, which they expect will further improve malware classification accuracy.

The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R).


References

Zhang, L., Thing, V. L. L. & Cheng, Y. A scalable and extensible framework for android malware detection and family attribution. Computers & Security 80, 120-133 (2019).

About the Researcher


Li Zhang

Former A*STAR Research Scientist

Institute for Infocomm Research
Li Zhang obtained his PhD degree in 2015 from Nanyang Technological University, Singapore, where he investigated hardware IP protection and hardware Trojan detection. He then worked at UL Transaction Security, gaining insights into fault injection and side-channel analyses of embedded systems, before joining the Institute for Infocomm Research (I2R), A*STAR, in 2017 as a Research Scientist. He is currently a Principal Engineer at ST Engineering, and his research interests revolve around digital forensics, mobile security and hardware security.

This article was made for A*STAR Research by Wildtype Media Group