In brief

SeaEval, built with 28 datasets covering seven languages, evaluates multilingual AI models with tailored metrics to address biases and inconsistencies, enabling more accurate and culturally aware global communication.

Dissolving cross-cultural boundaries

27 Feb 2025

A new benchmark for multilingual artificial intelligence tackles biases and inconsistencies to foster more inclusive global communication.

At global events like the G20 summit, international representatives face a daunting challenge: effective communication. Capturing the subtleties of various languages, cultural norms and reasoning styles can make discussions cumbersome, even with the best interpreters.

Artificial intelligence (AI) researchers believe that advancements in cross-lingual consistency and cultural reasoning could transform these interactions. Bin Wang and Zhengyuan Liu from the A*STAR Institute for Infocomm Research (A*STAR I2R) emphasised the revolutionary potential of such models.

“AI systems that bridge language and cultural gaps can make it easier for people from different countries and cultures to communicate, share ideas and work together,” said Wang and Liu. “This helps create a more connected and inclusive world where everyone can contribute and feel understood.”

In education, for example, AI could deliver tailored learning experiences by adapting content to students’ cultural and linguistic needs. In content creation, it could support localisation, making messages resonate with target audiences, bridging divides and expanding global reach.

However, the current landscape of AI language models presents significant limitations. They are predominantly English-centric, reflecting the developers’ perspectives and resource availability. This bias leads to inconsistent performance when the same question is asked in different languages. “Multilingual models often fail to transfer knowledge seamlessly across languages,” noted Wang and Liu.

To bridge this gap, the team developed SeaEval, a comprehensive benchmark for assessing multilingual AI models. It incorporates 28 datasets, including seven new ones designed specifically to test cultural reasoning and cross-lingual consistency.

SeaEval evaluates AI models using a range of metrics, including accuracy, cross-lingual consistency and instruction sensitivity. By identifying gaps in handling linguistic and cultural nuances, it aims to guide improvements in multilingual AI.
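
As a rough illustration (a minimal sketch in Python, not SeaEval’s actual implementation; the ask_model function here is a hypothetical stand-in for any model API), cross-lingual consistency can be scored by posing parallel translations of one question and measuring how often the answers agree across language pairs:

from itertools import combinations

def ask_model(question: str) -> str:
    # Hypothetical placeholder: swap in a real call to the model under test.
    raise NotImplementedError

def cross_lingual_consistency(parallel_questions: dict[str, str]) -> float:
    # Collect one answer per language for translations of the same question.
    answers = [ask_model(q).strip().lower() for q in parallel_questions.values()]
    # Score the fraction of language pairs whose answers agree exactly.
    pairs = list(combinations(answers, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

# The same multiple-choice question in English and Vietnamese:
question_set = {
    "en": "Which planet is closest to the Sun? (A) Venus (B) Mercury",
    "vi": "Hành tinh nào gần Mặt Trời nhất? (A) Sao Kim (B) Sao Thủy",
}
# cross_lingual_consistency(question_set) returns 1.0 when answers match across languages

A score of 1.0 means the model answered identically in every language pair; combining such a consistency score with plain accuracy gives a fuller picture of multilingual competence than accuracy alone.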

The team’s findings brought several critical challenges to light: even top-performing models like GPT-4 show significant drops of over 10 percent in performance when switching from English to other languages such as Vietnamese. Models are also sensitive to label arrangement and paraphrased instructions, revealing biases that compromise stability. Lastly, cultural reasoning capabilities remain underdeveloped, even in advanced systems.
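
The sensitivity to label arrangement can be probed in a similarly simple way (again a hypothetical sketch, reusing the ask_model placeholder from above): present the identical question several times with its answer options shuffled, and check whether the model’s choice survives the reordering.

import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Same hypothetical placeholder as in the previous sketch.
    raise NotImplementedError

def label_arrangement_stability(stem: str, options: list[str], trials: int = 5) -> float:
    # Assumes ask_model returns the text of the option the model chose.
    choices = []
    for _ in range(trials):
        shuffled = random.sample(options, k=len(options))  # new option order each trial
        prompt = stem + " Options: " + "; ".join(shuffled)
        choices.append(ask_model(prompt))
    # 1.0 means the pick never changes with option order; lower values
    # suggest position bias rather than genuine knowledge.
    return Counter(choices).most_common(1)[0][1] / trials

High stability under shuffling, together with high accuracy, suggests a model is answering on content rather than option position; a paraphrased-instruction check works analogously, rewording the prompt instead of reordering the labels.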

Acknowledging the support of the National Supercomputing Centre (NSCC) and the National Research Foundation, Wang and Liu said that SeaEval is a crucial step towards building AI systems capable of connecting diverse global communities more effectively.

The A*STAR-affiliated researchers contributing to this research are from the A*STAR Institute for Infocomm Research (A*STAR I2R).

References

Wang, B., Liu, Z., Huang, X., Jiao, F., Ding, Y., et al. SeaEval for multilingual foundation models: From cross-lingual alignment to cultural reasoning. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico. Association for Computational Linguistics, 370–390 (2024).

About the Researchers

Bin Wang

Scientist

A*STAR Institute for Infocomm Research (A*STAR I2R)
Bin Wang is a Scientist at the A*STAR Institute for Infocomm Research (A*STAR I2R). He obtained his B.Eng from the University of Electronic Science and Technology of China and his PhD from the University of Southern California, Los Angeles in 2021, and was a Research Fellow at the National University of Singapore from 2021 to 2023. His research focuses on multimodal large language models (LLMs) and conversational AI systems. He served as publication chair at EMNLP 2023 and as an editorial member for APSIPA Transactions, has published more than 40 academic papers in top journals and conferences including ACL, EMNLP, NAACL, ACM KDD, TNNLS and TASLP, and has won multiple best-paper awards.

Zhengyuan Liu

Tech Lead, Multimodal Generative AI group

A*STAR Institute for Infocomm Research (A*STAR I2R)
Zhengyuan Liu is a Tech Lead in the Multimodal Generative AI group and the Assistant Head of the AI for Education programme at the A*STAR Institute for Infocomm Research (A*STAR I2R). He has published over 30 research papers in top-tier AI and natural language processing conferences including ACL, NAACL, EMNLP, COLING, ICASSP and INTERSPEECH, and serves as a reviewer for conferences including NeurIPS, ICLR and ACL, and for journals including IEEE TASLP, ACM CSUR and Neurocomputing. He was elected an IEEE Senior Member for his professional achievements, and has won Best Paper Awards at SIGDIAL 2021, C3NLP at ACL 2024 and SUMEval at COLING 2025, as well as Outstanding Paper Awards at EMNLP 2023 and EMNLP 2024.

Nancy F. Chen

Senior Principal Scientist and Lead Principal Investigator

A*STAR Institute for Infocomm Research (A*STAR I2R)
Nancy F. Chen is a Senior Principal Scientist and Lead Principal Investigator at the A*STAR Institute for Infocomm Research (A*STAR I2R), where she heads the Multimodal Generative AI group and the AI for Education programme. A serial best-paper award winner, her AI research spans culture, healthcare, neuroscience, social media, education and forensics, and her multilingual technology has led to commercial spinoffs and adoption by Singapore’s Ministry of Education. She is Program Chair for NeurIPS 2025, was Program Chair for ICLR 2023, and serves as an APSIPA Governor (2024–2026), IEEE SPS Distinguished Lecturer (2023–2024) and ISCA Board Member (2021–2024); she was also named to Singapore’s 100 Women in Tech list in 2021. Previously, she worked at MIT Lincoln Laboratory during her PhD studies at MIT and Harvard in the US.

AiTi Aw

Head, Aural and Language Intelligence Department

A*STAR Institute for Infocomm Research (A*STAR I2R)
AiTi Aw heads the Aural and Language Intelligence Department at the A*STAR Institute for Infocomm Research (A*STAR I2R), where she spearheads the development of machine translation and multilingual technology for local and Southeast Asian languages. Her team’s research in audio, speech and language technology has led to commercial spinoffs as well as private and government deployments. She is also the co-Principal Investigator of the National Multimodal Large Language Model programme, which is developing Singapore’s research and engineering capabilities in multimodal large language models.

This article was made for A*STAR Research by Wildtype Media Group