Highlights

In brief

Using fine-grained, text-based attributes, a Partial Attribute Assignment model combines Partial Optimal Transport with curriculum learning to distinguish unknown objects from backgrounds more accurately than probability-based object detection methods.

Photo by user6702303 | Freepik

Fresh AI eyes for odd objects

13 Mar 2026

A new attribute-based approach to object detection helps visual artificial intelligence models spot unknown objects more accurately and efficiently.

In the real world, artificial intelligence (AI) systems with sight-based tasks will likely deal with objects they've never seen before. A service robot might find an unfamiliar tool in a warehouse; a medical scanner might detect a rare tumour. But as it’s often impractical to teach an AI to recognise every possible object in reality, many open world object detection (OWOD) models instead rely on calculating how ‘object-like’ an unfamiliar shape is to decide if it’s more than just part of the background.

While this method can be effective, it has two limitations. “These models cannot explain how an object is detected, and often struggle when background features resemble unknown objects,” explained Muli Yang, a Scientist at the A*STAR Institute for Infocomm Research (A*STAR I²R).

In a recent collaborative work with researchers from the University of Hong Kong, Sichuan University and Xidian University, China, Yang and A*STAR I²R colleagues including Principal Investigator Hongyuan Zhu proposed an OWOD system with a different approach. Rather than ‘object-likeness’, their model asks: what attributes does this object have?

“By shifting to well-defined attributes, like 'umbrella-like' or 'transparent', our model can describe objects using rich, natural language,” said Yang. “This makes the model’s decisions more transparent and less prone to confusion, as it learns about the intrinsic properties of objects rather than statistical probabilities.”

The team’s innovation lies in how these attributes are selected. Existing models use multi-stage pipelines that first select attributes and then refine them, a process that can be time-consuming and prone to accumulating errors. “We needed a unified, end-to-end approach that could optimise selection and detection simultaneously, not in disjointed steps,” said Yang.

For more efficient attribute selection, the team incorporated a mathematical framework known as Partial Optimal Transport (POT). Conventional optimal transport forces every attribute in the database to be matched to an object, even when some pairings make no sense. POT relaxes this constraint: only a fraction of the attributes, those most relevant to each object, are matched and carried forward to the next stage of computation.

"We realised this was about transporting only a targeted fraction of attributes which truly aligned with the visual objects," said Yang.

By combining POT with curriculum learning, which trains models on progressively harder visual problems, the team developed the Partial Attribute Assignment (PASS) system. When tested on five challenging real-world datasets spanning aquatic animals, aerial photos, video game avatars, medical X-rays and surgical footage, PASS significantly outperformed state-of-the-art OWOD methods across all benchmarks, and provided a clear view of the top attributes it used to detect both familiar and unfamiliar objects.

"The diversity of our testing benchmarks shows that this method has broad applications," said Yang. “PASS is a game changer for domains where ‘unknown’ anomalies are critical, but data on them is scarce. This could include robotics and mobile manipulation; medical imaging and diagnostics; and industrial inspection and automation.”

The A*STAR-affiliated researchers contributing to this research are from the A*STAR Institute for Infocomm Research (A*STAR I²R).

Want to stay up to date with breakthroughs from A*STAR? Follow us on Twitter and LinkedIn!

References

Yang, M., Goenawan, G. J., Qin, H., Han, K., Peng, X., et al. Detecting open world objects via partial attribute assignment. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20318–20328 (2025). | article

About the Researchers

Muli Yang is a Scientist at the A*STAR Institute for Infocomm Research (A*STAR I²R). He received his PhD degree from Xidian University, China, in 2023, and was a visiting PhD student at Nanyang Technological University, Singapore, from 2022 to 2023. His research focuses on open-world learning and vision-language modelling. Yang has published more than 25 papers in leading conferences and journals, including CVPR, ICCV, ICLR, NeurIPS, ACL, IJCV, TPAMI and TIP, and has received a Best Paper Award and a Best Demonstration Award.

Hongyuan Zhu

Senior Scientist and Unit Lead, Satellite Sensing

A*STAR Institute for Infocomm Research (A*STAR I²R)
Hongyuan Zhu is the Unit Head and Principal Investigator of Satellite Sensing at the A*STAR Institute for Infocomm Research (A*STAR I²R). He leads the Advanced Perception and Reasoning Lab, focusing on developing autonomous agents with super-large-scale multimodal sensing and reasoning capabilities for sustainability, climate/weather and defence solutions. He was named a Stanford Top 2% Scientist from 2023 to 2025 and received the A*STAR Career Award in 2022. His team won first place in the Scene2Cap challenge at ICCV 2023 and third place in the EPIC Text-Video Retrieval Challenge at CVPR 2022, and was the only Asian team to reach the 1st Prize Finalists in the KUKA Innovation Challenge 2021. Zhu has been an associate editor of The Visual Computer since 2020, and has served as a Senior Program Committee member of IJCAI, an Area Chair of ACM MM Asia, and a Guest Editor of IET Image Processing. He has published around 120 papers in top-tier journals and conferences, including CVPR, ICCV, NeurIPS, ICML, AAAI, IJCAI, ACL and TPAMI.

This article was made for A*STAR Research by Wildtype Media Group