You’re waiting to meet a friend in a mall during the busy lunch hour. Scanning the crowds, you spot them walking towards you and wave. Visually identifying and tracking moving objects comes almost effortlessly to us, but it’s far more complicated for computer vision platforms, which aim to help computers ‘see’ the world as we do.
When fed visual data such as photos or videos, these platforms use algorithms to group clusters of pixels and identify each cluster as a single object. However, following moving objects in real-world settings is a formidable task: an object’s shape and size may shift as it moves, and its colours may vary depending on lighting conditions.
Computer vision algorithms must therefore not only be fast enough to match real-time video framerates, but also accurate enough to lock the right target within their crosshairs.
Some advanced computer vision platforms use convolutional neural networks (CNNs) for object detection. These networks use three-dimensional neural patterns modelled after the visual cortex in animals, allowing them to pick out multiple identifying features to recognise and track a given object. However, CNNs are so eager to detect that they can start seeing things, flagging the same object several times or mistakenly labelling ‘ghost’ objects that don’t exist.
To fix this, CNN-based detectors run non-maximum suppression (NMS) algorithms that double-check these labels, keeping only the most confident detection among heavily overlapping candidates. Unfortunately, that added post-processing layer can significantly slow down the whole process.
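For intuition, the classic greedy NMS loop looks something like the minimal Python sketch below. This is the textbook algorithm, not the team’s implementation; note how its sequential while-loop must finish checking one box before moving to the next, which is precisely the kind of bottleneck the researchers set out to remove.

```python
import numpy as np

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Classic (sequential) non-maximum suppression sketch.

    boxes:  (N, 4) array of [x1, y1, x2, y2] corner coordinates.
    scores: (N,) detection confidences.
    Returns the indices of the boxes to keep.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # most confident box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the winning box with all remaining candidates
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Discard 'ghost' duplicates that overlap the winner too much;
        # the loop then repeats on whatever survives
        order = order[1:][iou <= iou_threshold]
    return keep
```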
Computer scientists have proposed parallelisation, or running multiple processes simultaneously, as a potential solution, although they say it’s not a perfect fix. “When processes are running in parallel, they can greatly save execution time at the cost of hardware resources,” said Bin Zhao, a Senior Research Engineer at A*STAR’s Institute of Microelectronics (IME). “The question is how to reduce those costs.”
Nonetheless, Zhao, together with Jie Lin, a Principal Investigator at A*STAR’s Institute for Infocomm Research (I2R), and colleagues, hypothesised that parallel processing was still key to cutting CNN processing times in next-generation NMS approaches.
The researchers focused on enhancing MaxpoolNMS, a parallelisable algorithm they had previously developed. The result, dubbed PSRR-MaxpoolNMS, was a new variant that outperformed its predecessors in speed and detection accuracy while being versatile enough to plug into any CNN-based object detector.
The new algorithm’s edge comes from its ability to merge overlapping ghost objects and process them in batches, rapidly eliminating false detections from detection windows. PSRR-MaxpoolNMS also assigns labels differently from its predecessors, drawing bounding boxes directly over target objects; as the algorithm processes the images, the boxes move or shrink as ghost objects are weeded out.
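To illustrate the underlying idea, here is a toy sketch, under our own simplifying assumptions, of how max-pooling-style suppression can replace the sequential loop shown earlier: detections compete within local windows of a confidence score map, and because every window is evaluated independently, the whole pass parallelises naturally. The `maxpool_suppress` function and its parameters are illustrative inventions, not the published PSRR-MaxpoolNMS implementation.

```python
import numpy as np

def maxpool_suppress(score_map, kernel=3):
    """Toy max-pooling-based suppression (a sketch, not the paper's code).

    score_map: 2D array of detection confidences laid out on a
    feature-map grid, one cell per candidate detection.
    Keeps a detection only where its score is the local maximum in a
    kernel x kernel window, so overlapping 'ghost' detections vanish
    in a single pass.
    """
    score_map = np.asarray(score_map, dtype=float)
    h, w = score_map.shape
    pad = kernel // 2
    padded = np.pad(score_map, pad, constant_values=-np.inf)
    keep = np.zeros((h, w), dtype=bool)
    for y in range(h):          # every window is independent, so this
        for x in range(w):      # double loop could run fully in parallel
            window = padded[y:y + kernel, x:x + kernel]
            keep[y, x] = score_map[y, x] == window.max()
    return keep
```

On a toy grid, only the locally strongest detections survive, mimicking how overlapping ghost boxes are wiped out in one parallel sweep instead of one box at a time.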
“Compared to previous versions, PSRR-MaxpoolNMS has a reduced number of checking points for relatively large checking windows or anchor boxes,” Zhao said, adding that this supports faster and smoother CNN runs. The team is currently working on reducing the hardware overhead of a proposed Parallel Maxpool for PSRR-MaxpoolNMS.
The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R) and the Institute of Microelectronics (IME).