In brief

The PSRR-MaxpoolNMS algorithm helps computer vision platforms quickly remove ‘ghost’ objects tracked on real-time footage through parallel processing.

Ghost-buster algorithm keeps watch

28 Nov 2022

Computer scientists develop an algorithm that tracks moving objects faster and more accurately than current platforms.

You’re waiting to meet a friend in a mall during the busy lunch hour. Scanning the crowds, you spot them walking towards you and wave. Visually identifying and tracking moving objects comes almost effortlessly to us, but it’s far more complicated for computer vision platforms, which aim to help computers ‘see’ the world as we do.

When fed with visual data such as photos or videos, such platforms use algorithms to group clusters of pixels and identify them as a single object. However, following moving objects in real-world settings is a formidable task—an object’s shape and size may shift as it moves, and its colours may vary depending on lighting conditions.

To keep pace, computer vision algorithms must be not only fast enough to match real-time video frame rates, but also accurate enough to lock the right target within their crosshairs.

Some advanced computer vision platforms use convolutional neural networks (CNNs) for object detection. These networks use three-dimensional neural patterns modelled after the visual cortex in animals, allowing them to pick out multiple identifying features to recognise and track a given object. However, CNNs are so powerful that they can start seeing things, sometimes mistakenly labelling ‘ghost’ objects that don’t exist.

To fix this, CNN-based detectors use non-maximum suppression (NMS) algorithms that double-check these labels and discard the redundant ones. Unfortunately, that added post-processing layer can significantly slow down the whole process.
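As a rough illustration of what such a post-processing step does, here is a minimal Python sketch of the classic greedy form of NMS. It is a generic textbook version, not the A*STAR team's code; the function names, the (x1, y1, x2, y2) box format and the 0.5 overlap threshold are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first.

    A box is dropped when it overlaps an already-kept box by more
    than iou_threshold (an assumed, commonly used default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Each decision depends on the boxes kept so far.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-identical boxes around one object, plus a separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this toy example the second box overlaps the first heavily and is treated as a duplicate, so only the first and third boxes survive. Because every box is compared against the ones already kept, the loop runs one box at a time, which hints at why this step can become a bottleneck.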

Computer scientists have proposed parallelisation (running multiple processes simultaneously) as a potential solution, although they say it’s not a perfect fix. “When processes are running parallel, they can greatly save execution time at the cost of hardware resources,” said Bin Zhao, a Senior Research Engineer at A*STAR’s Institute of Microelectronics (IME). “The question is how to reduce those costs.”

Nonetheless, Zhao, together with Jie Lin, a Principal Investigator at A*STAR’s Institute for Infocomm Research (I2R), and colleagues, hypothesised that parallel processes were still key to cutting CNN processing times in next-generation NMS approaches.

The researchers focused on enhancing MaxpoolNMS, a parallelisable algorithm they had previously developed. The result, dubbed PSRR-MaxpoolNMS, was a new variant that outperformed its predecessors in speed and detection accuracy while being versatile enough to use in any CNN-based object detector.
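The article does not spell out how the algorithm works internally, but the general idea of suppressing detections with max pooling can be sketched in a few lines of Python. This is a simplified illustration only, not PSRR-MaxpoolNMS itself: the function name, the 3×3 window, the score grid and the `min_score` cut-off are all assumptions chosen for the example.

```python
def maxpool_suppress(score_map, kernel=3, min_score=0.0):
    """Keep only cells that are the maximum of their local window.

    score_map is a 2D list of detection confidences laid out on a grid.
    Returns a list of (row, col, score) for the surviving peaks.
    """
    h = len(score_map)
    w = len(score_map[0]) if h else 0
    r = kernel // 2
    peaks = []
    for y in range(h):
        for x in range(w):
            s = score_map[y][x]
            if s <= min_score:
                continue  # ignore empty or very weak cells
            # Maximum over the window centred on (y, x), clipped at edges.
            local_max = max(
                score_map[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            if s == local_max:
                peaks.append((y, x, s))
    return peaks


# One strong detection surrounded by weaker echoes, plus a distant one.
score_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.6],
]
peaks = maxpool_suppress(score_map)
```

Unlike a greedy loop, the check for each cell reads only the original scores and never depends on what was decided for other cells, so every cell could in principle be evaluated at the same time on parallel hardware.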

The team’s new NMS gains its edge by combining overlapping ghost objects and processing them in batches, thereby rapidly eliminating false objects from detection windows. PSRR-MaxpoolNMS also assigns labels differently from its predecessors, drawing boxes directly over target objects. As the algorithm processes the images, boxes move or shrink as ghost objects are identified.

“Compared to previous versions, PSRR-MaxpoolNMS has a reduced number of checking points for relatively large checking windows or anchor boxes,” Zhao said, adding that this supports faster and smoother CNN runs. The team is currently working on reducing the hardware overhead of a proposed Parallel Maxpool for PSRR-MaxpoolNMS.

The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R) and the Institute of Microelectronics (IME).

References

Zhang, T., Lin, J., Hu, P., Zhao, B. and Aly, M.M.S. PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15835-15843 (2021).

About the Researcher

Bin Zhao

Principal Research Engineer

Institute of Microelectronics (IME)

Bin Zhao received his BS degree from Peking University in 1990, where he majored in Microelectronics. He obtained his MS and MEng from the Chinese Academy of Sciences and NUS in 1993 and 1998 respectively. Zhao joined A*STAR’s Institute of Microelectronics (IME) in July 2000, where he is now a principal research engineer. He has been involved as a digital IC designer in many publicly funded and industry projects, such as WLAN Baseband, ONFIG ONU, RFID tag/reader, EHSII, PILLCAM, ZigBee Baseband, PCRAM-II, Network-on-chip (NoC), Delta-sigma PLL, High-speed DAC, and Neural-network Accelerator. His current interests are low-power digital IC design and the implementation of 3D chiplets for neural-network accelerators.

This article was made for A*STAR Research by Wildtype Media Group