Rapid advances in automation have led many to wonder whether robots will ever replace humans in industrial settings. For now, this dystopian scenario appears unlikely: while robots can perform repetitive tasks automatically, they still rely heavily on human input for more precise pick-and-place operations.
Such human-robot teamwork is exemplified by a vision-based control method called “visual servoing”, in which visual feedback, with a human in the loop, guides a robot’s arm and gripper through delicate manipulation tasks. Traditional visual servoing techniques require humans to manually select the image features used as feedback for the movement, making the process labor-intensive and limiting robots to environments they have been pre-programmed for.
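For readers curious about the mechanics, the sketch below shows the classical image-based visual servoing loop that frameworks like KOVIS aim to automate: hand-picked image features are compared against their desired positions, and the resulting error drives the camera, and hence the gripper, toward the target. This is a minimal illustration of the standard technique, not code from KOVIS; the function and parameter names are illustrative.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Interaction (image Jacobian) matrix for one normalized image point at depth Z,
    relating the point's image velocity to the camera twist (vx, vy, vz, wx, wy, wz)."""
    return np.array([
        [-1 / Z, 0, x / Z, x * y, -(1 + x**2), y],
        [0, -1 / Z, y / Z, 1 + y**2, -x * y, -x],
    ])

def ibvs_step(current_pts, desired_pts, depths, gain=0.5):
    """One control step: drive the feature error toward zero with v = -gain * pinv(L) @ e."""
    error = (current_pts - desired_pts).reshape(-1)             # stacked 2N-dimensional feature error
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(current_pts, depths)])  # stacked 2N x 6 Jacobian
    return -gain * np.linalg.pinv(L) @ error                    # 6-DoF camera velocity command

# Toy usage: four tracked points, slightly offset from their goal positions.
current = np.array([[0.11, 0.10], [-0.10, 0.12], [-0.09, -0.11], [0.12, -0.10]])
desired = np.array([[0.10, 0.10], [-0.10, 0.10], [-0.10, -0.10], [0.10, -0.10]])
print(ibvs_step(current, desired, depths=[0.5] * 4))
```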
Fortunately, algorithms called deep neural networks could eliminate this human input, effectively enabling robots to program themselves. However, such algorithms require vast amounts of real-life training data and generalize to only a narrow range of tasks, leaving them unready to fully take over from human controllers.
To address the functional and data limitations of current systems, En Yen Puang, a Research Engineer at A*STAR’s Institute for Infocomm Research (I2R), together with colleagues Keng Peng Tee and Wei Jing, developed a novel keypoint-based visual servoing framework called KOVIS.
“KOVIS stands out from other visual servoing frameworks because of how easy, fast and labor-free it is when deployed on new tasks,” explained Puang. Compared to its predecessors, KOVIS is more efficient, using a deep auto-encoder to learn and encode visual features without any human input.
Uniquely, the system represents objects as keypoints that capture only essential geometric information. This makes KOVIS quicker and easier to deploy in new environments, as it is unaffected by how an object’s appearance varies across settings. Consequently, KOVIS can quickly pick up the simple ‘peg-and-hole’ relationship between the robot’s gripper and the target object.
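The article does not spell out how these keypoints are produced, but a common approach in keypoint-based perception, used here purely for illustration, is to have the learned encoder output one heatmap per keypoint and convert each heatmap into image coordinates with a differentiable soft-argmax, roughly as sketched below (PyTorch-style, with assumed shapes and names).

```python
import torch

def soft_argmax_keypoints(heatmaps):
    """Convert a batch of keypoint heatmaps (B, K, H, W) into (x, y) coordinates
    in [-1, 1] via a differentiable soft-argmax (expectation over a softmax)."""
    b, k, h, w = heatmaps.shape
    probs = torch.softmax(heatmaps.view(b, k, -1), dim=-1).view(b, k, h, w)
    xs = torch.linspace(-1, 1, w, device=heatmaps.device)
    ys = torch.linspace(-1, 1, h, device=heatmaps.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)   # marginalize over rows, take expectation of x
    y = (probs.sum(dim=3) * ys).sum(dim=-1)   # marginalize over columns, take expectation of y
    return torch.stack([x, y], dim=-1)        # (B, K, 2) keypoint coordinates

# Toy usage: 8 keypoints predicted from a 64x64 feature map.
print(soft_argmax_keypoints(torch.randn(1, 8, 64, 64)).shape)  # torch.Size([1, 8, 2])
```

Because the coordinates are expectations rather than hard pixel picks, gradients flow through them, which is what lets such keypoints be learned end-to-end without manual labeling.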
Another major benefit of KOVIS is how little real-life training data it needs, with the framework trained entirely on synthetic images generated in simulation. Although objects may look different in reality, KOVIS works around this by generalizing what it learns so that it can recognize the same objects in different environments.
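The simulation pipeline itself is not described in the article. A standard sim-to-real recipe consistent with this description is domain randomization, in which nuisance factors such as lighting, textures and camera parameters are re-sampled for every synthetic image so the model learns to rely on geometry rather than appearance. The parameters below are purely illustrative assumptions, not KOVIS’s actual settings.

```python
import random

def randomized_scene_params():
    """Sample rendering parameters for one synthetic training image.
    Randomizing appearance-related factors encourages the model to depend on
    geometric cues, which is what transfers from simulation to real cameras."""
    return {
        "light_intensity": random.uniform(0.3, 1.5),
        "light_direction": [random.uniform(-1, 1) for _ in range(3)],
        "object_texture": random.choice(["wood", "metal", "plastic", "random_noise"]),
        "object_pose_jitter_m": [random.gauss(0.0, 0.02) for _ in range(3)],
        "camera_fov_deg": random.uniform(50, 70),
        "background_id": random.randint(0, 999),
    }

# Each training image is rendered with freshly sampled parameters,
# so no two synthetic scenes look exactly alike.
print(randomized_scene_params())
```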
When put to the test, KOVIS rose to the challenge. Despite being trained only on synthetic data, the system successfully performed real-world manipulation tasks, such as gripping a mug by its handle and inserting a peg and a screw into their designated holes, with a remarkable 90 percent success rate.
“Currently, KOVIS is being used in Collab-AI, a research project that aims to advance developments for safer and more efficient human-robot collaboration in production and manufacturing settings,” Puang said.
In the future, the researchers hope to expand the applications of KOVIS from simple manipulation tasks to precise routine maneuvers such as the docking and landing of mobile robots.
The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R) and Institute for High Performance Computing (IHPC).