In brief

The model uses a variational autoencoder that feeds into a recurrent neural network to identify discourse roles, achieving best-in-class emotion recognition across three public datasets.


Artificial intelligence that knows how you feel

14 Mar 2023

Researchers build a state-of-the-art computational model that can identify emotional cues in real-life conversations

We don’t solely rely on words to communicate. Body language and tone also contribute to how thoughts and feelings are expressed, and these vary from one situation to another.

With so many intertwining factors, it’s not surprising that computers struggle to decipher our emotions in a conversation. Communication platforms powered by artificial intelligence (AI), such as customer service chatbots, need to read our feelings to create a realistic and rewarding experience.

However, we’re not likely to pour our hearts out to chatbots—conversations tend to consist of short, concise responses without words or phrases that bear emotional cues. In fact, this happens in human conversation as well. “In these cases, AI models must rely on various contextual clues to recognise emotions even when they are not explicitly stated,” notes Donovan Ong, a senior research engineer at A*STAR’s Institute for Infocomm Research (I2R).

The same sequence of words can convey very different feelings, says Ong, citing the example sentence “We’ve got to say it to him.” The statement could express frustration from someone who disagrees with a previous remark; alternatively, it might carry a sympathetic, compassionate undertone. With these nuances in mind, Ong and colleagues from the Natural Language Processing (NLP) Group, led by principal investigator Jian Su, created a new AI model that picks up on textual discourse cues in conversations to predict human emotions.

Several challenges stood in the way. Existing conversational datasets for training machine learning platforms don’t have discourse role labels (i.e., labels identifying whether an utterance is, say, a question or a response to a previous comment). Furthermore, discourse roles are dynamic and context-dependent: the platform would need to understand dependencies such as an answer discourse role following a question role.

The team used an integrated-model approach to incorporate these variables into existing AI models. To label utterances with discourse roles, they deployed a variational autoencoder (VAE) that reads consecutive lines of a conversation and assigns roles. To account for the sequential nature of discourse roles, the VAE’s outputs were fed into a recurrent neural network (RNN), an architecture designed to model temporal relationships.
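The VAE-into-RNN pipeline described above can be sketched in toy form. This is a minimal illustration, not the paper's implementation: the dimensions, weight matrices and function names are all hypothetical, random weights stand in for trained parameters, and real utterance embeddings would come from a language model rather than random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's settings)
N_UTTERANCES = 4   # lines in the conversation
EMBED_DIM = 8      # utterance embedding size
N_ROLES = 3        # latent discourse roles (e.g. question, answer, statement)
HIDDEN_DIM = 6     # RNN hidden state size

def vae_encode(utterance_embedding, W_mu, W_logvar):
    """VAE-style encoder: map an utterance embedding to a latent
    distribution over discourse roles and sample from it via the
    reparameterisation trick."""
    mu = W_mu @ utterance_embedding
    logvar = W_logvar @ utterance_embedding
    eps = rng.standard_normal(N_ROLES)
    return mu + np.exp(0.5 * logvar) * eps  # sampled latent role vector

def rnn_step(h, z, W_h, W_z):
    """One recurrent step: the hidden state carries context from earlier
    turns, so an 'answer' role can depend on a preceding 'question' role."""
    return np.tanh(W_h @ h + W_z @ z)

# Random weights stand in for trained parameters
W_mu = rng.standard_normal((N_ROLES, EMBED_DIM))
W_logvar = rng.standard_normal((N_ROLES, EMBED_DIM))
W_h = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM))
W_z = rng.standard_normal((HIDDEN_DIM, N_ROLES))

# A toy conversation: one random embedding per utterance
conversation = rng.standard_normal((N_UTTERANCES, EMBED_DIM))

h = np.zeros(HIDDEN_DIM)
for utterance in conversation:
    z = vae_encode(utterance, W_mu, W_logvar)  # latent discourse role
    h = rnn_step(h, z, W_h, W_z)               # accumulate conversational context

print(h.shape)  # final context vector an emotion classifier could consume
```

In a trained system, the final hidden state (or the per-step states) would be passed to a classifier head that predicts the speaker's emotion, with the VAE and RNN parameters learned jointly.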

The results from a validation test exceeded the researchers’ expectations. “Our model achieved the best performance across three public datasets for emotion recognition in conversations,” Ong commented, adding that this result paves the way for future industrial applications. The team’s new model will likely be applied to other language tasks such as summarisation, translation and dialogue generation.

Ong and colleagues plan to build on the model to recognise a broader landscape of emotional cues. To achieve this, they will incorporate more detailed linguistic, audio and visual information to build a more complete training dataset.

The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R).



Ong, D., Su, J., Chen, B., Luu, A.T., Narendranath, A., et al. Is Discourse Role Important for Emotion Recognition in Conversation? Proceedings of the AAAI Conference on Artificial Intelligence 36(10), 11121–11129 (2022).

About the Researchers

Donovan Ong is a Senior Research Engineer at the Institute for Infocomm Research (I2R), A*STAR. He is currently a technical leader of multiple industry collaboration projects as well as a researcher in an international research collaboration on deep learning-based natural language processing. His research interests include emotion recognition, sentiment analysis and document-level information extraction.
Jian Su is a Principal Scientist at the Institute for Infocomm Research (I2R), A*STAR. She is A*STAR Co-Director of DesCartes (a CNRS@CREATE program), I2R Co-Director of BIRC (Baidu I2R Research Centre), and leader of the NLP group in the ALI department. She is a Principal Investigator of various NLP projects, spanning international research collaborations as well as large-scale technology deployments.

This article was made for A*STAR Research by Wildtype Media Group