We have reached a pivotal moment in bridging the gap between complex human communication and computational understanding. Advances in Natural Language Processing (NLP) have spawned innovative platforms such as ChatGPT, which facilitate remarkably natural and seamless human-computer interactions.
At the heart of these NLP breakthroughs is a transformer-based model known for its ‘multi-head attention mechanism’. Zhengyuan Liu, a Lead Research Engineer at A*STAR’s Institute for Infocomm Research (I2R), explained that this works much like how our brain simultaneously processes different types of information—transformers focus on different parts of the input data at the same time, significantly enhancing context understanding and task efficiency.
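To make that idea concrete, here is a minimal sketch of multi-head self-attention in NumPy. It is purely illustrative: the random projection matrices stand in for weights that a real transformer would learn, and the function names are this sketch's own, not from the researchers' code.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, rng):
    """Illustrative multi-head self-attention.

    Each head projects the input into its own query/key/value subspace,
    so different heads can focus on different parts of the sequence at
    the same time. Random projections stand in for learned weights.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    outputs = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        # Each row of `scores` is a distribution over the input positions:
        # how strongly this head attends to each token.
        scores = softmax(q @ k.T / np.sqrt(d_head))
        outputs.append(scores @ v)
    # Concatenating the heads recombines their separate "views" of the input.
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))       # 5 tokens, model width 8
out = multi_head_attention(x, num_heads=2, rng=rng)
```

Because every head computes its own attention distribution, some heads can end up near-redundant for a given task, which is the observation the researchers exploit below.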
However, in task-specific modelling such as dialogue summarisation, the attention heads are not uniformly used. Liu, alongside A*STAR Senior Principal Scientist Nancy Chen, developed a novel technique for repurposing these underused heads to infuse new capabilities into transformers and improve their computational efficiency.
“Redundant attention heads can be replaced with featured weights and it’s much more computationally efficient than introducing additional neural components,” explained Liu.
Their method involved training a base model and identifying the attention heads that contributed little to the task of summarising conversations. They then repurposed these underperforming heads by giving them information about how personal named entities in a conversation refer to one another; this coreference information helped the model better track the flow and context of the dialogue.
The researchers then experimented with a benchmark dataset and found that their enhanced transformer model not only improved upon the base model but also held its own against state-of-the-art models, all while being more computationally economical. Additionally, of the coreference information integration techniques tested, the nearest-neighbour approach proved superior.
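As an illustrative sketch of what a nearest-neighbour coreference pattern might look like, the toy function below builds a fixed attention matrix in which each mention of an entity attends to the nearest preceding mention of the same entity. This is an assumption-laden simplification for intuition only; the function name and input format are hypothetical, and the researchers' actual integration differs in detail.

```python
import numpy as np

def coref_attention_matrix(entity_ids):
    """Toy nearest-neighbour coreference pattern (illustrative only).

    `entity_ids[i]` names the entity that token i refers to, or None for
    tokens that are not entity mentions. Each mention attends fully to
    the nearest preceding mention of the same entity; all other tokens
    simply attend to themselves.
    """
    n = len(entity_ids)
    attn = np.eye(n)          # default: each token attends to itself
    last_seen = {}            # entity -> position of its most recent mention
    for i, ent in enumerate(entity_ids):
        if ent is not None:
            if ent in last_seen:
                attn[i] = 0.0
                attn[i, last_seen[ent]] = 1.0  # link to nearest antecedent
            last_seen[ent] = i
    return attn

# Example: tokens for "Alice said she agreed", where "she" corefers with "Alice".
pattern = coref_attention_matrix(["Alice", None, "Alice", None])
```

In this toy example, the row for "she" (position 2) places all of its attention on "Alice" (position 0), which is the kind of structural signal that can help a summariser keep track of who did what in a dialogue.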
In practical terms, this can lead to more effective summarisation of legal documents, medical records or customer service interactions, where both clarity and context are crucial.
“There are many directions that we are exploring,” said Liu, speaking on next steps. These include investigating the effectiveness of attention mechanisms in multiple modalities such as vision, speech and text, as well as improving the accountability of these models, which are crucial for sensitive applications such as healthcare.
The A*STAR-affiliated researchers contributing to this research are from A*STAR’s Institute for Infocomm Research (I2R).