Highlights

Speaking in song

26 May 2015

Sing-along software developed at A*STAR brings sweet melody to any cacophonous cry

Song synthesis software that brings out the natural beauty in off-key singing or speaking was introduced to Singaporeans in 2013 through ‘Sing for Singapore’, part of the National Day Parade 2013 mobile application.

Whether you give it your best — or worst — effort, I²R Speech2Singing technology will make you sound like the melodious singer you’ve always wanted to be. The voice synthesis software developed by A*STAR researchers is the first to deliver high-quality singing automatically, while still preserving the original character of your natural voice.

“Many people like singing but they lack the skills to do so,” says Minghui Dong, who led the research at the A*STAR Institute for Infocomm Research. “We want to use our technology to help the average person sing well.”

Speech consists of three key elements: content, prosody and timbre. The content is conveyed using words; the prosody (or melody, in the case of singing) is expressed through rhythm and pitch; and the timbre is the distinctive quality that makes a banjo sound different from a trumpet, and one singer’s voice from another’s. I²R Speech2Singing works by polishing the melody while retaining the original content and timbre of a sound.

Existing technologies that focus on correcting melody try to align off-tune sounds either to the closest note on the musical scale or to the exact note in the original score. The former works well for professional singers who may be only slightly out of tune, but it cannot fix drastically off-key singing or speech that is simply read aloud. The latter is better at correcting discordant tunes, but ignores many other aspects of melody such as vibrato and vowel-stretching.
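To make the difference between the two existing approaches concrete, here is a minimal sketch in Python. It is an illustration only, not code from any of the products or from I²R Speech2Singing: the function names are hypothetical, and it assumes equal-tempered tuning with A4 at 440 Hz and MIDI note numbering, neither of which is specified in the article.

```python
import math

A4_HZ = 440.0  # assumed reference tuning (not stated in the article)

def nearest_note_hz(freq_hz: float) -> float:
    """Snap a sung frequency to the closest equal-tempered semitone."""
    semitones_from_a4 = 12 * math.log2(freq_hz / A4_HZ)
    return A4_HZ * 2 ** (round(semitones_from_a4) / 12)

def score_note_hz(midi_note: int) -> float:
    """Return the frequency of the note written in the score (MIDI 69 = A4)."""
    return A4_HZ * 2 ** ((midi_note - 69) / 12)

# The score asks for A4 (MIDI note 69).
slightly_sharp = 452.0   # less than half a semitone above A4
badly_sharp = 500.0      # more than two semitones above A4

print(nearest_note_hz(slightly_sharp))  # pulled back to A4
print(nearest_note_hz(badly_sharp))     # snapped to B4, the wrong note
print(score_note_hz(69))                # always A4, whatever was sung
```

A slightly sharp note is pulled back to the intended pitch by either method, but a badly off-key note is snapped to whichever semitone happens to be closest, which is why nearest-note correction cannot rescue drastically off-key singing. Forcing the score note fixes the pitch, yet a flat target like this carries none of the vibrato or vowel-stretching of a real performance.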

Instead, I²R Speech2Singing uses recordings by professional singers as templates against which to correct the melody of a singing voice or convert a speaking voice into a singing one. The software detects the timing of each phonetic sound using speech recognition technology, and then stretches and compresses the duration of the signal using voice conversion technology to match the rhythm to a professional singer’s. A speech synthesizer then combines the time-corrected voice with pitch data and background music to produce a beautiful solo.
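The rhythm-matching step can be pictured with a small Python sketch. It assumes that a speech recogniser has already labelled each phonetic sound with start and end times in both the user's recording and the professional template; the `Segment` type, the function names and the timings are all hypothetical, and this is not the team's actual implementation. Stretching the audio itself would need a voice conversion or vocoder stage that is not shown here.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    phone: str    # phonetic label, e.g. "l" or "a"
    start: float  # start time in seconds
    end: float    # end time in seconds

def stretch_factors(user, template):
    """Per-sound ratio of template duration to user duration."""
    if [s.phone for s in user] != [s.phone for s in template]:
        raise ValueError("phone sequences differ")
    return [(t.end - t.start) / (u.end - u.start)
            for u, t in zip(user, template)]

def map_time(t, user, template):
    """Map a time in the user's recording onto the template timeline.

    Within each sound the mapping is linear, so a sound is stretched
    or compressed evenly to fill the template's duration.
    """
    for u, s in zip(user, template):
        if u.start <= t <= u.end:
            frac = (t - u.start) / (u.end - u.start)
            return s.start + frac * (s.end - s.start)
    raise ValueError("time outside the aligned segments")

# Made-up timings: the word "la" spoken quickly versus sung with a held vowel.
spoken = [Segment("l", 0.0, 0.25), Segment("a", 0.25, 0.5)]
sung = [Segment("l", 0.0, 0.25), Segment("a", 0.25, 1.25)]

print(stretch_factors(spoken, sung))   # consonant unchanged, vowel 4x longer
print(map_time(0.375, spoken, sung))   # midpoint of the spoken vowel
```

In this toy case the consonant keeps its length while the vowel is stretched to four times its spoken duration, which is the kind of per-sound stretching and compression the paragraph above describes.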

Minghui Dong (second from left) and the Human Language Technology Department team.

“When we compared the output with other applications on the market and in research, we realized that our software generated much better voice quality,” says Dong.

Singaporeans were first introduced to the software in 2013 through ‘Sing for Singapore’, part of the National Day Parade 2013 mobile application. In 2014, I²R Speech2Singing won the award for best Show & Tell contribution at INTERSPEECH, a major global venue for research on the science and technology of speech communication.

Dong and his team are now working to improve the accessibility of the software and to add a feature that allows users to tune their singing as they wish.

The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research. More information about the group’s research can be found at the Human Language Technology Department webpage.


synthesis, speech processing, singing, dynamic time warping, voice recognition, language, A*STAR Institute for Infocomm Research (A*STAR I²R)

References

Dong, M., Lee, S. W., Li, H., Chan, P., Peng, X., Ehnes, J. W. & Huang, D. I²R Speech2Singing Perfects Everyone’s Singing. INTERSPEECH 2014, 2148-2149 (2014).

This article was made for A*STAR Research by Nature Research Custom Media, part of Springer Nature
