Imagine a group of friends sitting around a table to piece together a jigsaw puzzle, except that the image on the puzzle changes every time someone makes a move. This is reminiscent of what happens in multirobot systems, where groups of robots try to learn and adapt to their environment simultaneously.
This phenomenon, known as 'nonstationarity', poses a challenge for each robot as it learns from its surroundings to make better decisions: as the robots learn and modify their actions, their collective behaviour changes the environment around them in unpredictable ways.
Hongliang Guo, a Scientist at A*STAR’s Institute for Infocomm Research (I2R), painted a picture of the complications nonstationarity causes: “In the worst-case scenario, although the robots have visited every part of an environment, a moving target in that environment may still not be detected.”
Robots often rely on traditional learning methods such as deep Q-networks and policy gradient methods, which excel in static and predictable environments. However, Guo explained that these methods struggle in dynamic settings because they assume stable conditions while the robots are still learning to navigate and complete tasks.
To counter this, Guo and researchers from the University of Electronic Science and Technology of China and the Massachusetts Institute of Technology, US, proposed a solution: a learning rule called the cross-entropy regularisation policy gradient (CE-PG). This strategy helps robots in a multirobot system spread out and learn more effectively in variable environments, encouraging them not to cluster in one place but to explore different areas.
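The paper's exact objective is not reproduced in this article, but the gist of CE-PG can be sketched in a few lines of Python. In the toy PyTorch example below, each robot keeps a standard policy gradient loss, and a pairwise cross-entropy term between the robots' action distributions is maximised so that their behaviours, and hence their search paths, drift apart. The network sizes, the weighting coefficient `beta` and the stand-in data are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ROBOTS, OBS_DIM, N_ACTIONS = 3, 8, 5  # toy sizes, chosen for illustration

# One small policy network per robot (architecture is an assumption).
policies = nn.ModuleList([
    nn.Sequential(nn.Linear(OBS_DIM, 32), nn.Tanh(), nn.Linear(32, N_ACTIONS))
    for _ in range(N_ROBOTS)
])
optimiser = torch.optim.Adam(policies.parameters(), lr=1e-3)

def ce_pg_loss(obs, actions, returns, beta=0.1):
    """obs: (N_ROBOTS, batch, OBS_DIM); actions, returns: (N_ROBOTS, batch).
    beta weighs the dispersion term and is an assumed value."""
    # 1) Standard policy gradient (REINFORCE) term for each robot.
    pg = 0.0
    for i in range(N_ROBOTS):
        log_pi = F.log_softmax(policies[i](obs[i]), dim=-1)
        chosen = log_pi.gather(1, actions[i].unsqueeze(1)).squeeze(1)
        pg = pg - (chosen * returns[i]).mean()

    # 2) Cross-entropy regulariser H(pi_i, pi_j) = -sum_a pi_i(a|s) log pi_j(a|s),
    #    evaluated on a shared batch of states. Subtracting it from the loss
    #    *maximises* pairwise cross-entropy, pushing the robots' action
    #    distributions apart so they spread out rather than cluster.
    states = obs.reshape(-1, OBS_DIM)
    log_dists = [F.log_softmax(p(states), dim=-1) for p in policies]
    ce = 0.0
    for i in range(N_ROBOTS):
        for j in range(N_ROBOTS):
            if i != j:
                ce = ce - (log_dists[i].exp() * log_dists[j]).sum(-1).mean()
    return pg - beta * ce / (N_ROBOTS * (N_ROBOTS - 1))

# One illustrative update step on random stand-in data.
obs = torch.randn(N_ROBOTS, 16, OBS_DIM)
actions = torch.randint(N_ACTIONS, (N_ROBOTS, 16))
returns = torch.randn(N_ROBOTS, 16)
optimiser.zero_grad()
ce_pg_loss(obs, actions, returns).backward()
optimiser.step()
```

The sign of the regulariser is the key design choice here: ordinary entropy regularisation makes a single policy more random, whereas maximising the cross-entropy between different robots' policies makes them disagree with one another, which is what drives the dispersal behaviour.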
The robots were first trained centrally with shared information, but executed their tasks independently using the learned policies; this setup avoids the real-time policy adjustments that can destabilise learning. The CE-PG regulariser then aided in dispersing the robots, ensuring that different areas were covered during tasks.
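What 'decentralised execution' looks like after training can be illustrated with a short, hypothetical sketch: each robot's policy is frozen and queried using only that robot's own observations, with no communication and no online learning, which is why the robots stick to their learned strategies during a task. The interface below reuses the toy architecture from the previous sketch and is an assumption, not the authors' code.

```python
import torch
import torch.nn as nn

# Per-robot policies, frozen after training (same toy architecture as above).
N_ROBOTS, OBS_DIM, N_ACTIONS = 3, 8, 5
policies = [
    nn.Sequential(nn.Linear(OBS_DIM, 32), nn.Tanh(), nn.Linear(32, N_ACTIONS)).eval()
    for _ in range(N_ROBOTS)
]

with torch.no_grad():  # execution only: no gradients, no mid-task policy updates
    local_obs = torch.randn(N_ROBOTS, OBS_DIM)  # stand-in for each robot's sensors
    for i, policy in enumerate(policies):
        action = policy(local_obs[i]).argmax().item()  # decide from local info alone
        print(f"robot {i} takes action {action}")
```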
Through a series of simulations and real-world experiments, the researchers showed that the CE-PG approach successfully sidesteps the problem of unpredictable changes by ensuring that the robots stick to their learned strategies during tasks. In all cases, the CE-PG scheme found the moving target, matching or outperforming standard policy gradient and deep Q-network techniques, particularly in its robustness against individual robot failures.
This method can significantly enhance the efficiency and reliability of multirobot systems in real-world applications such as search and rescue, surveillance and exploration. Guo suggested some practical applications: “Multirobot search teams could look for a missing child in a mall environment, or for lost luggage at the airport.”
The decentralised execution aspect of the team's method also means it scales well with the number of robots involved, potentially enabling larger and more complex multirobot operations. “Our next step is to devise CE-PG+, which is applicable to ‘unknown’ environments, without prior topological information,” said Guo.
The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R).
