We experience the world the way we do because the left and right eyes capture images from slightly different angles before the brain seamlessly merges them into a single, three-dimensional image. Computer vision experts have struggled to equip machines with a similar capability: how do you teach a computer to ‘see’ in 3D?
3D object retrieval (searching and retrieving 3D objects from large databases based on their similarity to a given query object) is critical for powering applications such as 3D printing, autonomous driving, augmented reality and industrial product design.
Unlike conventional 2D reverse image searches, 3D object retrieval requires an accurate interpretation of an object's shape and structure from different perspectives. Among current approaches, view-based methods, which represent a 3D object as a collection of 2D images captured from multiple viewpoints, are favoured for their flexibility and computational efficiency. However, they can miss details that are specific to individual views, such as fine-grained object parts or local variations.
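For readers who want the concrete picture, the sketch below shows one common way such a view-based pipeline can be wired up: a shared 2D network extracts features from each rendered view, and those features are pooled into a single descriptor used for retrieval. The ResNet backbone and max-pooling step are illustrative assumptions in the style of earlier multi-view networks, not the team's exact method.

```python
# Minimal sketch of a view-based 3D descriptor; assumes each 3D object has
# already been rendered into a fixed set of 2D views. Illustrative only.
import torch.nn as nn
from torchvision.models import resnet18

class MultiViewDescriptor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)   # shared 2D CNN applied to every view
        backbone.fc = nn.Identity()         # keep the 512-d feature, drop the classifier
        self.backbone = backbone

    def forward(self, views):               # views: (batch, num_views, 3, H, W)
        b, v, c, h, w = views.shape
        feats = self.backbone(views.reshape(b * v, c, h, w))  # per-view features
        feats = feats.reshape(b, v, -1)
        return feats.max(dim=1).values      # pool across views into one descriptor

# Retrieval then ranks database objects by, for example, cosine similarity
# between their descriptors and the query's descriptor.
```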
Research Scientist Dongyun Lin from A*STAR’s Institute for Infocomm Research (I2R) explained that adding self-attention modules can give view-based methods a much-needed boost. “Self-attention modules can be really beneficial for tasks where different parts/subregions of the input are more or less important for making accurate predictions,” Lin said.
Lin gave the example of a computer vision platform tasked with identifying 3D objects within a complicated image. “Self-attention modules can help the model ‘pay attention’ to certain regions of the image that are most relevant for identifying the object.”
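In code, the core of a self-attention module boils down to computing a relevance weight between every pair of input features and re-weighting the features accordingly. The generic scaled dot-product sketch below is an assumption for illustration; it is not the specific module the team built.

```python
# Minimal sketch of scaled dot-product self-attention over a set of region
# (or view) features; higher weights mean more "attention".
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (num_regions, dim); w_q/w_k/w_v: (dim, dim) learned projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (x.shape[-1] ** 0.5)  # pairwise relevance between regions
    weights = F.softmax(scores, dim=-1)      # normalise into attention weights
    return weights @ v                       # re-weighted region features

# Example: 6 region features of dimension 64
x = torch.randn(6, 64)
w = [torch.randn(64, 64) for _ in range(3)]
attended = self_attention(x, *w)             # shape (6, 64)
```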
Together with collaborators from Safran Landing Systems, Lin and colleagues built two custom self-attention modules: the View Attention Module (VAM) and the Instance Attention Module (IAM). The VAM identifies discriminative features within each individual view of an object, while the IAM identifies relevant features shared across all views. By running both modules in parallel and applying a novel combinatory loss function to the extracted features, the team proposed an improved workflow for accurate 3D object retrieval.
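The sketch below gives a hedged sense of how two such branches might be run in parallel and trained with a combined objective. The internals of each branch, the classification heads and the weighted-sum loss are all illustrative assumptions, not the published design of the VAM and IAM.

```python
# Hedged sketch of two parallel attention branches over multi-view features,
# loosely mirroring the described VAM (per-view) and IAM (cross-view) idea.
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)        # learns an importance score per view feature

    def forward(self, feats):                 # feats: (batch, num_views, dim)
        weights = torch.softmax(self.score(feats), dim=1)
        return (weights * feats).sum(dim=1)   # attention-weighted aggregation

class DualAttentionRetrieval(nn.Module):
    def __init__(self, dim, num_classes):
        super().__init__()
        self.vam = AttentionBranch(dim)       # view-specific branch (assumption)
        self.iam = AttentionBranch(dim)       # cross-view branch (assumption)
        self.head_vam = nn.Linear(dim, num_classes)
        self.head_iam = nn.Linear(dim, num_classes)

    def forward(self, view_feats):
        f_vam, f_iam = self.vam(view_feats), self.iam(view_feats)
        fused = torch.cat([f_vam, f_iam], dim=-1)   # descriptor used for retrieval
        return fused, self.head_vam(f_vam), self.head_iam(f_iam)

def combined_loss(logits_vam, logits_iam, labels, alpha=0.5):
    # Illustrative "combinatory" objective: a weighted sum of the two branch losses.
    ce = nn.functional.cross_entropy
    return alpha * ce(logits_vam, labels) + (1 - alpha) * ce(logits_iam, labels)
```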
Lin’s team compared their proposed method against other view-based benchmarks using multi-view images of everyday objects from public datasets. They found that the VAM and IAM duo was not only more efficient, but also more consistent than existing methods. This development has the potential to springboard applications that rely on 3D object retrieval, such as computer-aided design (CAD).
Speaking on plans to further develop their platform, Lin said: “We plan to incorporate the VAM and IAM into sequence models like LSTM or Vision Transformer to improve the aggregation performance of multi-view data for better retrieval performance.”
The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R).