In brief

The EEG-Video Emotion-based Summarization (EVES) model, trained on paired brain-signal and video data, generated emotion-evoking video summaries that matched or surpassed those from existing video summarisation models.

Teaching computers to tug heartstrings

4 Aug 2023

A new automated video summarisation model uses viewers’ brain signals to extract highlight reels that evoke emotions.

Endlessly scrolling for what to watch on streaming platforms? With the glut of exciting video content online today, finding the perfect gem can be daunting. To decide what we feel like watching, we often look for short video summaries: a dramatic action-packed trailer, a funny montage, or an emotional cliffhanger that leaves us wanting more.

To grab viewers' attention, most video previews feature carefully selected snippets of standout moments. These summaries are often put together by human editors to sum up the content’s narrative or emotional beats.

Software engineers have built algorithms to try to automate the video summarisation process. However, many machine learning (ML) models either need large, thoroughly labelled datasets or struggle to pick out video segments from unlabelled data that would interest human viewers.

“It’s hard to define ‘interesting’ elements in a video; a segment that interests one person might bore another,” said Wai Cheong Lew, an A*STAR Postgraduate Scholarship recipient in computer science.

To build an ML model that better accounts for human emotional responses when creating video summaries, Lew and colleagues at A*STAR’s Institute for Infocomm Research (I2R), Nanyang Technological University (NTU), and Singapore Management University (SMU) proposed a novel training method. Instead of training on video datasets with manual annotations of ‘interesting’ segments, which can be subjective and costly to produce, they hypothesised that viewers’ brain signals recorded while watching videos could serve the same purpose.

The team turned to a publicly available electroencephalography (EEG) dataset containing brain signals measured non-invasively from volunteers as they watched videos, signals that reflected their emotional responses to specific scenes. After linking the EEG readings to the video sections that induced them, the researchers fed the paired data to an unsupervised machine learning model as a training dataset.
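Conceptually, this pairing step amounts to aligning each window of EEG with the video segment that was on screen while it was recorded. The minimal Python sketch below illustrates that idea only; the function name, array shapes and sampling rates are illustrative assumptions, not details from the study.

```python
# Illustrative sketch only (not the authors' pipeline): align EEG windows with
# the video segments shown while they were recorded, producing (EEG, video)
# training pairs. Sampling rates and shapes below are assumed for the example.
import numpy as np

def pair_eeg_with_video(eeg, video_features, eeg_rate=128, fps=30, seg_seconds=2):
    """eeg: (channels, samples); video_features: (frames, dims).
    Returns a list of aligned fixed-length (EEG segment, video segment) pairs."""
    eeg_win = eeg_rate * seg_seconds            # EEG samples per segment
    vid_win = fps * seg_seconds                 # video frames per segment
    n_segments = min(eeg.shape[1] // eeg_win,
                     video_features.shape[0] // vid_win)
    pairs = []
    for i in range(n_segments):
        eeg_seg = eeg[:, i * eeg_win:(i + 1) * eeg_win]
        vid_seg = video_features[i * vid_win:(i + 1) * vid_win]
        pairs.append((eeg_seg, vid_seg))        # one brain-signal/video training pair
    return pairs

# Example with random placeholder data: 32 EEG channels over 60 seconds,
# and one 2048-dimensional feature vector per video frame.
eeg = np.random.randn(32, 128 * 60)
video_features = np.random.randn(30 * 60, 2048)
dataset = pair_eeg_with_video(eeg, video_features)
print(len(dataset), dataset[0][0].shape, dataset[0][1].shape)  # 30 (32, 256) (60, 2048)
```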

“EEG signals can be an alternative to manually created labels as another form of human annotation,” explained Lew.

The study was a collaboration with Joo-Hwee Lim, I2R Senior Principal Scientist III; Kai Keng Ang, I2R Senior Principal Scientist I; and colleagues from NTU and SMU.

The diversity of viewer tastes and preferences created some subjective bias in the researchers’ training datasets. “We found it challenging to introduce EEG signals into the reinforcement learning framework, as they tend to be noisy and can bring disturbance into the training process, resulting in ineffective summaries,” said Lew.

To overcome this, the researchers implemented a deep learning model called the EEG Linear Attention Network (ELAN). ELAN draws connections between signals at different timepoints and different brain areas, selectively considering only those consistent across all volunteers.
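As a rough illustration of attention over EEG data, and not the authors’ actual ELAN architecture, the sketch below scores how strongly features at different positions in a sequence (for example, timepoints or channels) relate to one another, so that recurring patterns can be emphasised over noisy, idiosyncratic ones. The linear-attention form, with a simple positive feature map standing in for softmax, and all dimensions are assumptions made for the example.

```python
# Illustrative linear-attention sketch (not the published ELAN model): each
# output position mixes information from all other positions, weighted by how
# similar their projected features are. All weights and sizes are assumed.
import numpy as np

def linear_attention(x, wq, wk, wv):
    """x: (seq_len, dim) sequence of EEG features, e.g. one vector per
    timepoint or per channel. Returns attended features of the same shape."""
    q = np.maximum(x @ wq, 0) + 1e-6   # positive feature map on queries
    k = np.maximum(x @ wk, 0) + 1e-6   # ...and on keys (stands in for softmax)
    v = x @ wv
    kv = k.T @ v                       # (dim, dim): keys and values aggregated once
    z = q @ k.sum(axis=0)              # per-query normaliser
    return (q @ kv) / z[:, None]       # each row blends information from all positions

rng = np.random.default_rng(0)
seq_len, dim = 256, 64                 # e.g. 256 EEG timepoints, 64-dim features
x = rng.standard_normal((seq_len, dim))
wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
print(linear_attention(x, wq, wk, wv).shape)  # (256, 64)
```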

Combining ELAN with standard ML models used in video processing, the researchers built the EEG-Video Emotion-based Summarization (EVES) model. By taking emotion-evoking scenes into account, EVES extracts higher-level meaning for its video summaries, producing output that correlates better with a video’s emotional content. In statistical tests against other published models, the team found that EVES outperformed traditional unsupervised models and matched the performance of supervised models trained on painstakingly labelled data.

The team also tested EVES-generated video summaries on a cohort of viewers. In terms of coherence and emotional content, the audience reported a preference for EVES summaries over those from other state-of-the-art models.

Lew hopes that this and other breakthroughs in the field of automated video summarisation will spur demand for paired EEG-video datasets.

The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R).

References

Lew, W.-C.L., Wang, D., Ang, K.K., Lim, J.-H., Quek, C., et al. EEG-video emotion-based summarization: Learning with EEG auxiliary signals. IEEE Transactions on Affective Computing 13 (4), 1827–1839 (2022).

About the Researchers

Wai-Cheong Lincoln Lew

A*STAR Postgraduate Scholarship recipient

Institute for Infocomm Research (I2R)
Wai-Cheong Lincoln Lew received his BSc (First Class Honours) degree in Physics from Nanyang Technological University (NTU), Singapore. He is a recipient of the A*STAR Postgraduate Scholarship award and is currently pursuing his PhD degree in Computer Science at NTU.

Kai Keng Ang

Senior Principal Scientist I

Institute for Infocomm Research (I2R)
Kai Keng Ang is currently the Leader of the Signal Processing Group and a Senior Principal Scientist I at A*STAR’s Institute for Infocomm Research (I2R). He is also an Adjunct Associate Professor at the School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore. His current research interests include brain-computer interfaces, computational intelligence, machine learning, pattern recognition and signal processing.

Joo-Hwee Lim

Senior Principal Scientist III

Institute for Infocomm Research (I2R)
Joo-Hwee Lim is currently a Senior Principal Scientist III and the Head of the Visual Intelligence Unit at A*STAR’s Institute for Infocomm Research (I2R) and an Adjunct Professor at the School of Computer Engineering, Nanyang Technological University, Singapore. He received his BSc and MSc research degrees in Computer Science from the National University of Singapore and his PhD degree in Computer Science & Engineering from the University of New South Wales, Australia. He joined I2R in October 1990. His research experience includes connectionist expert systems, neural-fuzzy systems, handwriting recognition, multi-agent systems, content-based image retrieval, scene/object recognition and medical image analysis.

This article was made for A*STAR Research by Wildtype Media Group