It takes just a split second for you to count how many people are in the elevator because human neural networks make the process of recognizing, processing and interpreting information based on visual cues seem effortless. Unsurprisingly, it is much more tedious for computers to do the same, and the science of computer vision is so much more than simply plugging a camera into a computer.
Take, for example, the task of counting the number of people at a park, based on live-streamed video footage. Computational models serve as the analytical powerhouse, allowing the computer to make predictions based on the relationship between dependent variables (in this case, the number of people) and independent variables (images of the park). While current computer vision platforms can accurately count how many joggers there are on a track, problems creep up when they have to count people sitting close together, or when some people are closer to the camera than others.
“Crowd counting and age estimation are challenging because they need the machine to have a high-level global understanding of the input images,” said study first author Le Zhang, a Scientist at A*STAR’s Institute for Infocomm Research (I2R). “For crowd counting, significant hurdles occur due to occlusions, scale variations and diverse crowd distributions. As for age estimation, one major difficulty is that different people age in different ways.”
To better ‘teach’ computers to accurately identify and classify objects from input images, Zhang and an international team of researchers have developed a new computer vision training regime called Deep Negative Correlation Learning, or DNCL. This method first divides a large training task into bite-sized sub-problems, each handled by its own regressor. Then, unlike earlier approaches that train each model in isolation, DNCL trains the whole pool of regressors at once, regularizing them so that their errors are negatively correlated and tend to cancel out when their predictions are combined.
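The ‘negatively correlated’ part of the name comes from the classic negative correlation learning penalty, which rewards each ensemble member for deviating from the ensemble mean. Below is a minimal NumPy sketch of that loss for a pool of scalar regressors; the function name and the diversity weight `lam` are illustrative choices, not taken from the paper.

```python
import numpy as np

def ncl_loss(preds, target, lam=0.5):
    """Negative correlation learning loss for an ensemble of regressors.

    preds  : array of shape (k,), one prediction per ensemble member
    target : scalar ground-truth value
    lam    : diversity weight; 0 trains members independently
    """
    mean = preds.mean()
    # Each member pays for its own squared error, but is *rewarded*
    # (negative sign) for straying from the ensemble mean, which
    # pushes the members' errors to be negatively correlated.
    per_member = 0.5 * (preds - target) ** 2 - lam * (preds - mean) ** 2
    return per_member.sum()
```

At `lam = 0` the members are trained independently; at `lam = 0.5` the individual deviation terms cancel and the loss depends only on how far the ensemble mean is from the target, so members are free to disagree as long as their average is right.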
The researchers validated the system across a range of challenging real-world applications, with promising results. “We report four real-world applications in the paper: crowd counting, age estimation, image super-resolution and apparent personality analysis,” said Zhang. “Our method also inspires some interesting follow-up studies for low-level computer vision tasks.”
As the authors describe it, their ‘divide and conquer’ approach is markedly more efficient: it mimics an ensemble-learning system without increasing the number of parameters, and it yields superior results, such as super-resolution images with sharper edges.
“We are now generalizing this work to the classification scenario where the output targets the discrete category labels,” added Zhang, with future work set to tackle even more challenging computer vision applications.
The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R) and the Institute of High Performance Computing (IHPC).