Highlights

In brief

The new software helps machine learning algorithms perform tasks more accurately by removing irrelevant data from datasets used to train them.

Copying the homework of machine learning algorithms

27 Jul 2022

A new data selector helps train machine learning algorithms more accurately by recycling datasets from other applications.

Did you ever share revision notes with friends in school? If you’ve ever had to study at the last minute for an important exam, you’ll know how useful it can be to share and compare notes with other students. As it turns out, the same is true for machine learning algorithms.

Effectively training a machine learning algorithm requires huge amounts of labelled data: raw data tagged with meaningful labels that give the algorithm context. To reduce the effort and cost of manual labelling, computer scientists have developed a process called domain adaptation, which allows machine learning algorithms to reuse existing labelled data from slightly different but still relevant datasets.

For example, labelled data collected for a vehicle identification algorithm in sunny Singapore could be used to train an algorithm that identifies vehicles in Iceland, despite the stark differences in weather, vehicle types and road conditions.

Impressive though this is, current domain adaptation techniques are far from perfect. They often transfer irrelevant data that slows or even harms learning. Now, however, A*STAR researchers from the Institute for Infocomm Research (I²R), in collaboration with a team from Nanyang Technological University, have developed new data selection software that automatically chooses the most relevant data from a well-labelled source and excludes irrelevant samples that might hinder learning.

“The most exciting thing is that the superiority of the proposed method becomes more noticeable when dealing with more complex datasets. Here, it continuously outperforms all the baseline methods on almost all tasks and improves the accuracy by a large margin,” said Keyu Wu, first author of the research paper and a scientist at I²R.
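The idea of keeping only the most relevant source samples can be illustrated with a deliberately simple sketch. This is not the researchers' method, which their paper describes as boosted by deep reinforcement learning. Here, relevance is assumed to mean closeness to the average of the unlabelled target data, and the function name `select_relevant`, the `keep_ratio` parameter and the toy feature vectors are all hypothetical.

```python
import math


def select_relevant(source, target, keep_ratio=0.5):
    """Keep the labelled source samples closest to the target centroid.

    source: list of (features, label) pairs from the labelled source domain.
    target: list of unlabelled feature vectors from the target domain.
    Assumption: distance to the target centroid stands in for "relevance".
    """
    dim = len(target[0])
    # Average each feature over the target samples.
    centroid = [sum(x[i] for x in target) / len(target) for i in range(dim)]
    # Rank source samples from most to least similar to the target.
    ranked = sorted(source, key=lambda s: math.dist(s[0], centroid))
    n_keep = max(1, int(len(source) * keep_ratio))
    return ranked[:n_keep]


# Toy data: two source samples resemble the target, two do not.
source = [
    ([0.0, 0.1], "car"),
    ([0.2, 0.0], "truck"),
    ([5.0, 5.0], "car"),
    ([6.0, 4.0], "bus"),
]
target = [[0.1, 0.1], [0.0, 0.2]]

selected = select_relevant(source, target, keep_ratio=0.5)
print([label for _, label in selected])
```

A fixed distance rule like this is only a stand-in. A learned selector could, in principle, adapt its choices based on how much each sample actually helps training.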

Moreover, the researchers’ data selector can also be used for partial domain adaptation (PDA), in which the target domain needs only part of the dataset from the source domain. This approach is more practical, as most real-world AI applications need only customised subsets of the data for training. For example, a medical imaging dataset may cover five diseases, while a customised real-world task may only require data from three of them.
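The medical imaging example can be sketched in a few lines, assuming the classes needed by the target task are already known. In real partial domain adaptation the target data is typically unlabelled, so the shared classes have to be estimated rather than listed by hand. The names `filter_source_by_classes`, the scan identifiers and the disease labels below are hypothetical.

```python
def filter_source_by_classes(source, wanted_labels):
    """Return only the source samples whose label the target task needs."""
    wanted = set(wanted_labels)
    return [(sample, label) for sample, label in source if label in wanted]


# A source dataset covering five diseases (one toy sample each).
source = [
    ("scan_01", "disease_a"),
    ("scan_02", "disease_b"),
    ("scan_03", "disease_c"),
    ("scan_04", "disease_d"),
    ("scan_05", "disease_e"),
]

# The customised task only concerns three of the five diseases.
target_classes = ["disease_a", "disease_c", "disease_e"]

subset = filter_source_by_classes(source, target_classes)
print(subset)
```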

While the data selector can currently be integrated into any existing domain adaptation or PDA model, the researchers still aim to improve their data selector further. “In the next two to three years, we plan to achieve better performance in both partial domain adaption and domain adaption tasks,” Wu said.

The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I²R).

References

Wu, K., Wu, M., Yang, J., Chen, Z., Li, Z. et al. Deep Reinforcement Learning Boosted Partial Domain Adaptation. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence 3192-3199 (2021)

About the Researchers

Keyu Wu

Scientist

Institute for Infocomm Research (I²R)

Keyu Wu is a Scientist at the Institute for Infocomm Research, A*STAR, Singapore. She obtained her BEng degree from the National University of Singapore and her PhD degree from Nanyang Technological University, Singapore. Her research interests include reinforcement learning, transfer learning, deep learning, autonomous navigation, path planning and trajectory generation.

Zhenghua Chen

Scientist and Lab Head

A*STAR Institute for Infocomm Research (A*STAR I²R)

Zhenghua Chen earned his BEng degree in Mechatronics Engineering from the University of Electronic Science and Technology of China in Chengdu in 2011 and his PhD degree in Electrical and Electronic Engineering from Nanyang Technological University, Singapore, in 2017. He is now a Scientist and Lab Head at the A*STAR Institute for Infocomm Research (A*STAR I2R), and an Early Career Investigator at the Centre for Frontier AI Research (CFAR). He has received numerous awards, including first place at the CVPR 2021 UG2+ Challenge, the A*STAR Career Development Award, first runner-up at the IEEE VCIP 2020 Grand Challenge, and best paper at IEEE ICIEA and IEEE SmartCity, both in 2022. He serves as Associate Editor for several IEEE and Springer journals. Chen is the Chair of the IEEE Sensors Council Singapore Chapter and an IEEE Senior Member. His research focuses on data-efficient and model-efficient learning, with applications in smart cities and smart manufacturing.

This article was made for A*STAR Research by Wildtype Media Group
