Whether watching a film at the cinema or binge-watching a series on Netflix, you’ve probably tried to guess how the story ends. Our ability to make predictions extends beyond our TV habits: think of how we correctly step out of the way of an oncoming pedestrian, or how tennis players estimate the direction of the ball from their opponent’s movements.
While recognizing and responding to the likely future actions of other people and objects comes naturally to us, teaching computers to do so is far from simple. Yet this ability, known as action anticipation, is critical for technologies involved in human-machine interaction, such as virtual assistants and self-driving cars.
“Action anticipation models should be able to account for future uncertainties. For the same observations, there are numerous plausible futures and these models should be able to predict all of them accurately,” said Basura Fernando, a scientist at A*STAR’s Institute of High Performance Computing (IHPC) and recipient of an A*STAR scholarship.
Together with Samitha Herath, a research fellow at Monash University in Australia, Fernando developed a framework for an action anticipation model that correlates past observations with future actions and then uses these correlations to extract predictions from current observations. Although the concept of linking past and future is not new, the researchers' framework incorporates algorithms that maximize these correlations, seeking the strongest links between observed representations and future behaviors.
To achieve this, Fernando's team developed new similarity measures, functions that quantify how related two objects are. For example, one function checks cross-correlations across all pairs of features or dimensions, rather than only between matching features. Another evaluates the covariance of observed and future events in the data set, analyzing the relationship between their movements or trends.
“These similarity measures look at higher-order information within a vector space,” Fernando explained. “Correct similarity measures help ensure that the computer model learns effective representations of human behaviors through videos, allowing us to maximize the correlations.”
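The researchers' exact formulations are not reproduced in this article, but the core idea of comparing all pairs of feature dimensions, rather than only matching ones, can be sketched in a few lines of Python. Everything below is an illustrative assumption: the function name, the centring and normalisation steps, and the way the cross-correlation matrix is collapsed into a single score are choices made for this sketch, not the authors' implementation.

```python
def cross_correlation_similarity(past, future):
    """Toy similarity score built from the full cross-correlation
    matrix of two equal-length feature vectors (illustrative only)."""
    n = len(past)
    # Centre each vector on its own mean.
    mean_p, mean_f = sum(past) / n, sum(future) / n
    p = [x - mean_p for x in past]
    f = [x - mean_f for x in future]
    # Normalise to unit length (fall back to 1.0 for a zero vector).
    norm_p = sum(x * x for x in p) ** 0.5 or 1.0
    norm_f = sum(x * x for x in f) ** 0.5 or 1.0
    p = [x / norm_p for x in p]
    f = [x / norm_f for x in f]
    # Entry (i, j) relates dimension i of the past to dimension j of
    # the future -- cross-dimension structure that a plain dot product
    # (which only pairs matching dimensions) would ignore.
    cross = [[pi * fj for fj in f] for pi in p]
    # Collapse the matrix into one score: mean absolute cross-correlation.
    return sum(abs(v) for row in cross for v in row) / (n * n)

# Hypothetical usage: compare pooled features of observed frames with
# features of the frames that follow.
past_features = [0.2, 0.9, 0.1, 0.5]
future_features = [0.1, 0.8, 0.2, 0.6]
score = cross_correlation_similarity(past_features, future_features)
```

In practice such a score would be computed between learned video representations inside a training loop, with the model's parameters adjusted to maximize it for past-future pairs that genuinely belong together.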
The researchers' model was first trained on videos of various scenes, from which it extracted correlation rules between the past and future events shown. When tested on a separate set of videos, the model applied these correlations to predict future actions more accurately than models using other similarity measures.
With even better algorithms in mind, the researchers aim to further explore the theoretical properties of their new similarity measures and to build a framework that incorporates risk calculations into action anticipation.
The A*STAR-affiliated researchers contributing to this research are from the Institute of High Performance Computing (IHPC).