When you are learning a new language, the native speakers who use it daily are often more forgiving of your beginner's skills than the stringent evaluations of official language tests.
This discrepancy stems from the inherent subjectivity of spoken language assessment (SLA): different raters often give the same individual different scores, introducing uncertainty about that person's true abilities.
"SLA is inherently subjective," said Jeremy Wong, a Senior Scientist at A*STAR’s Institute for Infocomm Research (I2R). Consequently, it's common for a student to receive different scores from different teachers, Wong noted.
Traditional methods typically average scores from multiple raters into a single figure, which may not fully capture this uncertainty. "If an automatic system only predicts the proficiency score, we can't be sure how reliable that score is," Wong explained.
In response, Wong and I2R colleagues Huayun Zhang and Nancy Chen have proposed a new framework for SLA modelling. Rather than relying on an average score, their model aims to reflect the uncertainty in the scores. "Our goal is for the model to estimate a score that more accurately reflects the user's oral proficiency," Wong elaborated.
The approach involved training a model on the distribution of scores from multiple raters, enabling it to predict not only a score but also how likely raters are to agree on it. The team modelled the scores with a mathematical function known as the beta density function, which helped the model handle scores that are ordinal (a score of 2 is higher than 1) and bounded (they fall within a fixed range, such as 0 to 100).
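To make this idea concrete, the minimal sketch below shows how a beta density can represent a bounded, ordinal proficiency score together with the uncertainty implied by rater disagreement. It is not the authors' implementation: the rescaling step, the simple method-of-moments fit and all variable names are assumptions made purely for illustration.

```python
# A minimal sketch, NOT the authors' implementation: a beta density fitted to
# multiple raters' scores yields both a point estimate and an uncertainty.
# The rescaling and method-of-moments fit are illustrative assumptions.
import numpy as np
from scipy import stats

def rescale(scores, low=0.0, high=100.0):
    """Map bounded scores (e.g. 0-100) into the open interval (0, 1)."""
    eps = 1e-3
    return np.clip((np.asarray(scores, dtype=float) - low) / (high - low), eps, 1 - eps)

def fit_beta(rater_scores):
    """Fit a beta density to scores from multiple raters (method of moments)."""
    x = rescale(rater_scores)
    m, v = x.mean(), x.var() + 1e-6          # sample mean and variance
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common    # (alpha, beta) parameters

# Three raters score the same learner's utterance on a 0-100 scale.
alpha, beta = fit_beta([62, 70, 65])
dist = stats.beta(alpha, beta)

print("predicted score :", 100 * dist.mean())  # point estimate of proficiency
print("uncertainty (sd):", 100 * dist.std())   # spread reflects rater disagreement
```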
To test the model's accuracy, the researchers used statistical metrics to measure the distance between the model's predicted score distribution and the actual distribution of rater scores. The methodology was validated across multiple datasets, including speechocean762, a benchmark dataset of recorded speech, and an in-house Tamil language dataset.
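The article does not name the specific metrics used. As one hedged illustration, the sketch below (reusing the hypothetical beta parameters from the example above) compares a predicted beta density with the raters' observed scores using SciPy's Wasserstein distance; the team's actual evaluation may differ.

```python
# Illustration only: one way to measure the distance between a predicted beta
# density and the raters' empirical scores. Wasserstein distance is used here
# purely as an example metric, not necessarily the one in the study.
import numpy as np
from scipy import stats

def distribution_distance(alpha, beta, rater_scores, low=0.0, high=100.0,
                          n_samples=5000, seed=0):
    """Wasserstein distance between samples drawn from the predicted beta
    density (mapped back to the score range) and the observed rater scores."""
    rng = np.random.default_rng(seed)
    predicted = low + (high - low) * stats.beta(alpha, beta).rvs(n_samples, random_state=rng)
    return stats.wasserstein_distance(predicted, np.asarray(rater_scores, dtype=float))

# Smaller distances mean the predicted distribution tracks the raters more closely.
print(distribution_distance(135.0, 70.6, [62, 70, 65]))
```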
Comparisons with traditional scalar-score-based methods showed that the novel approach better handled rater disagreement and captured the nuances of uncertainty, thereby supporting the team’s hypothesis.
Such computational tools could revolutionise language learning, particularly through applications like automatic tutoring apps, Wong commented. These apps would allow students to practise and receive feedback on their language skills, helping to address the limited availability of qualified teachers and the affordability of language education. Despite their potential, Wong emphasised that "the instruction provided by a trained teacher is irreplaceable".
This research aligns with AI Singapore's AI in Education Grand Challenge, a national initiative that focuses on enhancing mother tongue proficiency through interactive apps that simulate real examination scenarios.
The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R).