In the newsroom, headlines are constructed to distil the gist of a story in one glance, appealing to readers with limited time on their hands. However, relying solely on headlines can lead to a partial or skewed understanding of events. A similar limitation appears in general-purpose large language models (LLMs), such as GPT-4, particularly when summarising vast troves of multilingual news articles from local sources.
“Like someone who only listens to the loudest voices in the room, LLMs often ignore crucial, nuanced details in favour of the most repeated statements,” said Longyin Zhang, a Research Scientist at the A*STAR Institute for Infocomm Research (A*STAR I2R). “They also struggle to maintain accuracy regarding local entities and cultural nuances, sometimes confusing timelines and making up facts based on outdated information.”
Zhang and colleagues designed CLUST-McMs (Multi-lingual, Cross-lingual and Multi-document Summarisation), a two-stage artificial intelligence (AI) pipeline that produces more accurate and context-aware summaries of multilingual regional news. In the first stage, CLUST-McMs dynamically categorises articles based on specific events—such as an election period or the passage of a new law—rather than the broad topical groupings typically used by general-purpose LLMs.
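The event-level grouping in the first stage can be imagined as clustering article embeddings so that reports about the same event, in any language, land in the same group. The sketch below is a minimal illustration of that idea using toy two-dimensional embeddings and a greedy similarity threshold; it is not the team's actual implementation, and the function names and threshold are assumptions.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_by_event(embeddings, threshold=0.8):
    """Greedy single-pass clustering: each article joins the first
    event cluster whose centroid it is similar enough to; otherwise
    it starts a new cluster."""
    clusters = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(cluster["centroid"], emb) >= threshold:
                cluster["members"].append(i)
                # Update the centroid as a running mean.
                n = len(cluster["members"])
                cluster["centroid"] = [
                    c + (e - c) / n for c, e in zip(cluster["centroid"], emb)
                ]
                break
        else:
            clusters.append({"centroid": list(emb), "members": [i]})
    return clusters

# Toy embeddings: articles 0 and 1 cover the same event, article 2 does not.
embs = [(1.0, 0.0), (0.95, 0.1), (0.0, 1.0)]
groups = [c["members"] for c in cluster_by_event(embs)]
print(groups)  # → [[0, 1], [2]]
```

In practice, multilingual sentence embeddings would replace the toy vectors, so that a Thai and an English article about the same election can be grouped together.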
The second stage applies a data sharpening technique to guide the summarisation process. By balancing the volume and diversity of input information, the model filters out repetitive content and prioritises dense, information-rich sentences, helping to reduce bias in the final summary.
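A rough intuition for the data-sharpening stage — filtering repetitive content and keeping information-dense sentences — can be sketched as below. This is an illustrative stand-in only: it uses simple token-overlap deduplication and vocabulary size as a density proxy, whereas the thresholds, scoring, and helper names here are assumptions, not the published method.

```python
import re

def tokens(sentence):
    """Lowercased word tokens of a sentence, as a set."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def sharpen(sentences, dup_threshold=0.6, top_k=2):
    """Drop near-duplicate sentences, then keep the most
    information-dense ones (here: largest unique-token count)."""
    kept = []
    for s in sentences:
        # Keep a sentence only if it does not heavily overlap
        # with anything already kept.
        if all(jaccard(tokens(s), tokens(k)) < dup_threshold for k in kept):
            kept.append(s)
    # Rank the survivors by a simple density proxy.
    kept.sort(key=lambda s: len(tokens(s)), reverse=True)
    return kept[:top_k]

docs = [
    "Parliament passed the new transport law on Tuesday.",
    "The new transport law was passed by parliament on Tuesday.",
    "Commuters expect fare changes and revised bus routes next month.",
]
print(sharpen(docs))
```

The first two sentences restate the same fact, so only one survives, reducing the risk that the loudest, most-repeated statement dominates the summary.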
The team also introduced a localisation step to better capture cultural and contextual nuances, akin to the judgement of a local editor. “We train it through specific question-answering tasks to strictly cite facts and timestamps directly from the local source texts,” Zhang explained.
Using a specially curated dataset of news from Southeast Asia, the researchers reported that CLUST-McMs significantly outperformed GPT-4, effectively synthesising overlapping articles across multiple languages into a single, concise English-language summary. Across three evaluation metrics, their smaller, targeted model delivered more accurate coverage and stronger fidelity to the original sources, highlighting the value of smart data sharpening and localisation over relying solely on large, general-purpose models.
Moving forward, the team hopes to expand their work and localise multimodal models to understand regional news shared not just in written form, but also in audiovisual formats.
“The AI community needs to shift its focus from merely scaling up model sizes to making AI highly faithful to real-world facts and deeply culturally aware in localised contexts,” said Zhang. “Our goal is to mitigate the cultural biases inherently embedded in global models, ensuring the AI correctly interprets local visual cues, dialects and nuances.”
The A*STAR-affiliated researchers contributing to this research are from the A*STAR Institute for Infocomm Research (A*STAR I2R).