
In brief

By coordinating simultaneous changes to different neural network layers, a new AI model pruning technique helps reduce memory and power requirements without sacrificing performance, supporting high-quality model deployment on resource-constrained edge devices.


Sharpening shears for AI pruning

10 Apr 2026

Inspired by classical signal processing techniques, a new approach to fine-tuning AI models helps reduce irrelevant computing while maintaining performance.

Like the human brain, many modern artificial intelligence (AI) models feature multiple information-processing layers, each containing millions or even billions of parameters. These tiny internal settings are what enable large language models (LLMs) such as ChatGPT and Claude to produce complex text outputs. However, this scale comes at a cost: massive models need massive amounts of computing memory and power, making them difficult to deploy on everyday hardware such as smartphones, medical devices or industrial sensors.

According to Kaixin Xu, a Senior Research Engineer at the A*STAR Institute for Infocomm Research (A*STAR I2R), many of the parameters that emerge during a model’s training aren’t actually critical to its overall performance. Hence, to streamline a model, researchers often ‘prune’ it, identifying and removing redundant parameters while maintaining its capabilities.

Conventional pruning generally relies on post-training methods that apply simple rules, such as deleting the parameters with the smallest weights in each layer. While such methods are easy to implement, Xu noted that they often fail to account for how changes in one layer might negatively impact the rest of the model.
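As a concrete picture of such a rule, here is a minimal sketch of layerwise magnitude pruning in PyTorch; the function name and the uniform 50 per cent sparsity ratio are illustrative assumptions, not details from the paper.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of one layer's weight matrix.

    A sketch of conventional magnitude pruning; the per-layer ``sparsity``
    ratio is an illustrative assumption, not a value from the paper.
    """
    num_prune = int(weight.numel() * sparsity)
    if num_prune == 0:
        return weight
    # Threshold at the k-th smallest absolute value within this layer only.
    threshold = weight.abs().flatten().kthvalue(num_prune).values
    return weight * (weight.abs() > threshold)

# Prune half of a hypothetical layer's weights in isolation. Note that the
# rule never checks how this layer's change affects the model's output.
layer_weight = torch.randn(256, 512)
pruned = magnitude_prune(layer_weight, sparsity=0.5)
print(f"Fraction of weights removed: {(pruned == 0).float().mean():.2%}")
```

Because every layer is cut by the same fixed rule, a layer whose weights matter disproportionately to the final output is pruned just as aggressively as one that barely matters.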

In a recent study, Xu and A*STAR I2R colleagues, including Principal Scientist Min Wu and former Principal Scientist Xiaoli Li, worked with collaborators at Nanyang Technological University, Singapore, to address two optimisation challenges in model pruning: finding the link between specific parameters and overall model performance, and minimising the ‘cost’ of the pruning process itself.

Drawing inspiration from rate-distortion theory, a signal processing concept traditionally used to find the best balance between data compression and quality in image and sound files, Xu and his team developed a holistic pruning method that focuses on how much the model’s final output changes after pruning.

“Instead of guessing which layers can tolerate more pruning, we directly measure how pruning affects the model’s final output, and then choose pruning levels across all layers together in a coordinated way,” Xu explained.
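A toy sketch of that idea, under our own simplifying assumptions, might look like the following: measure the output distortion caused by candidate sparsity levels for each layer on a small calibration batch, then jointly pick one level per layer to minimise the summed distortion under a global budget. The model, calibration data, candidate levels and budget here are all hypothetical, and summing per-layer distortions assumes they combine roughly additively; this is not the paper’s actual algorithm.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical two-layer model and calibration batch for illustration.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
calib = torch.randn(32, 64)
baseline = model(calib).detach()

def output_distortion(layer_idx: int, sparsity: float) -> float:
    """Distortion of the final output if one layer is magnitude-pruned."""
    trial = copy.deepcopy(model)
    w = trial[layer_idx].weight.data
    k = int(w.numel() * sparsity)
    if k > 0:
        threshold = w.abs().flatten().kthvalue(k).values
        w.mul_(w.abs() > threshold)
    return (trial(calib) - baseline).pow(2).mean().item()

prunable = [0, 2]           # indices of the two Linear layers
levels = [0.25, 0.5, 0.75]  # candidate per-layer sparsity levels
distortion = {(i, s): output_distortion(i, s)
              for i in prunable for s in levels}

# Coordinated allocation: over a small grid, choose one level per layer
# minimising total distortion while removing at least half of all weights
# (an illustrative global budget).
total = sum(model[i].weight.numel() for i in prunable)
best_cost, best_plan = float("inf"), None
for s0 in levels:
    for s2 in levels:
        removed = s0 * model[0].weight.numel() + s2 * model[2].weight.numel()
        if removed / total < 0.5:
            continue  # allocation misses the global pruning budget
        cost = distortion[(0, s0)] + distortion[(2, s2)]
        if cost < best_cost:
            best_cost, best_plan = cost, {0: s0, 2: s2}
print("Chosen per-layer sparsities:", best_plan)
```

The exhaustive grid search above scales poorly with model depth; as described next, the team’s algorithms solve this joint allocation far more efficiently.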

Putting their method to the test, the team found that the underlying optimisation problem could be solved through two proposed algorithms, one of which runs very quickly even without graphics processing units (GPUs): specialised and increasingly costly components for AI hardware. This efficiency could be critical for deployment-constrained environments where memory and power consumption are primary bottlenecks.

“Our method works during the post-model training stage and can be guided by practical targets such as computation cost,” Xu said. “This makes it easier to adapt large, high-quality models for deployment without needing specialised hardware or complex training pipelines.”

The researchers were also heartened by their success in adapting aspects of classical information theory for their pruning approach, given the relative complexity of today’s deep neural networks (DNNs). “This encourages future research to borrow more ideas from the field, instead of relying mainly on trial-and-error pruning rules,” Xu added.

Looking ahead, Xu and the team are exploring how they could apply their pruning approach to LLMs and their visual counterparts, while also developing hardware-side optimisations to unlock their algorithms’ full potential.

The A*STAR-affiliated researchers contributing to this research are from the A*STAR Institute for Infocomm Research (A*STAR I2R).


References

Xu, K., Wang, Z., Huang, R., Geng, X., Lin, J., et al. Efficient distortion-minimized layerwise pruning. IEEE Transactions on Pattern Analysis and Machine Intelligence 47 (10), 9298–9315 (2025).

About the Researchers

Kaixin Xu is a Senior Research Engineer at the A*STAR Institute for Infocomm Research (A*STAR I2R). A part-time PhD candidate at Nanyang Technological University, Singapore, under the supervision of Professor Weisi Lin, he previously obtained his master’s degree at the National University of Singapore. His current research focuses on neural network compression (pruning, quantisation, etc.), efficient deep learning and computer vision. He has published in top venues such as ECCV, ICCV, CVPR, TPAMI and TNNLS.
Min Wu received a BE degree in Computer Science from the University of Science and Technology of China (USTC) in 2006 and a PhD degree in Computer Science from Nanyang Technological University, Singapore, in 2011. He is currently a Principal Scientist at the A*STAR Institute for Infocomm Research (A*STAR I2R). His research interests include machine learning, data mining and bioinformatics. He has received several Best Paper Awards, including those from IEEE ICIEA 2022, IEEE SmartCity 2022, InCoB 2016, and DASFAA 2015. He also won the CVPR UG2+ Challenge in 2021 and the IJCAI Competition on Repeated Buyers Prediction in 2015.

This article was made for A*STAR Research by Wildtype Media Group