Highlights

Mimicking human search techniques improves the speed with which a computer can spot Waldo in a crowd.

Helping computers find ‘Waldo’

21 Feb 2019

New program picks out targets in a crowd quickly and efficiently

It can be harder for computers to find ‘Waldo’, an elusive character that hides within crowds in a popular children’s book series, than it is for humans. Now, an A*STAR researcher and her colleagues have developed a biologically inspired program that could enable computers to identify real-life ‘Waldos’ and other targets more efficiently.

Computer image analysis is routinely used in medicine, security, and rescue. Speed is often critical in these efforts, says Mengmi Zhang, a computer scientist at A*STAR’s Institute for Infocomm Research, who led the study. She cites the use of computers to help find victims of natural disasters, such as earthquakes.

But these efforts are often hampered because computers lack human intuition. A person can quickly spot a dog in a crowded space, for instance, even if they have never seen that particular dog before. A computer, by contrast, needs to be trained using thousands of images of different dogs, and even then, it can falter when looking for a new dog whose image it has not encountered previously.

This weakness could be particularly problematic when scanning for weapons, says Zhang. A computer trained to look for knives and guns might overlook another sharp object. “If there is one sharp metal stick which has not been seen in the training set, it doesn’t mean the passenger should be able to take it on board the airplane,” says Zhang.

Current computer searches also tend to be slow because the computer must scan every part of an image in sequence, paying equal attention to each part. Humans, however, rapidly shift their attention between several different locations in an image to find their target.

Zhang and her colleagues wanted to understand how humans do this so efficiently. They presented 45 people with crowded images and asked them to hunt for a target, say, a sheep. They monitored how the subjects’ eyes darted around the scene, fixating briefly on different locations in the image. They found that, on average, people could locate the sheep in around 640 milliseconds. This corresponded to switching the location of their gaze, on average, just over two and a half times.

The team then developed a computer model to implement this more human-like search strategy in the hunt for a dog. Rather than looking for a target that was identical to an image of a dog given beforehand, the model was trained to look for something that had similar features to the example image. This enabled the model to generalize from a single dog image to the “general concept of a dog,” and quickly pick out other dogs it had not seen before, explains Zhang.
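To make the idea of searching by feature similarity concrete, here is a minimal sketch. It is not the team's model. The grid of hand-written feature vectors, the choice of cosine similarity, the rule that each location is visited at most once, and every function name are assumptions made for illustration. The sketch scores each location in a toy scene against a single example of the target, then looks at the most promising locations first. A row-by-row scan is included for comparison.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Cell = Tuple[int, int]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def attention_map(target: Vector, scene: List[List[Vector]]) -> List[List[float]]:
    """Score every grid cell of the scene by its similarity to the target example."""
    return [[cosine_similarity(target, cell) for cell in row] for row in scene]


def guided_search(scores: List[List[float]], target_cell: Cell) -> List[Cell]:
    """Fixate on cells from most to least similar until the target cell is reached.

    Visiting each cell at most once is an assumption of this sketch.
    """
    cells = [(r, c) for r, row in enumerate(scores) for c in range(len(row))]
    ranked = sorted(cells, key=lambda rc: scores[rc[0]][rc[1]], reverse=True)
    fixations: List[Cell] = []
    for cell in ranked:
        fixations.append(cell)
        if cell == target_cell:
            break
    return fixations


def sequential_scan(rows: int, cols: int, target_cell: Cell) -> List[Cell]:
    """Baseline: visit cells row by row, giving each one equal attention."""
    fixations: List[Cell] = []
    for r in range(rows):
        for c in range(cols):
            fixations.append((r, c))
            if (r, c) == target_cell:
                return fixations
    return fixations


# Hypothetical 3-number "features" for one example image of the target.
target_example = [1.0, 0.9, 0.1]

# Hypothetical 2 x 3 scene; the target-like object sits at row 1, column 1.
scene = [
    [[0.1, 0.2, 0.9], [0.5, 0.5, 0.5], [0.0, 0.1, 1.0]],
    [[0.2, 0.1, 0.8], [0.9, 1.0, 0.2], [0.6, 0.4, 0.5]],
]

scores = attention_map(target_example, scene)
print(guided_search(scores, (1, 1)))        # [(1, 1)]
print(len(sequential_scan(2, 3, (1, 1))))   # 5
```

In this toy scene the similarity-guided search lands on the target with its first fixation, while the row-by-row scan needs five. Because the comparison is made on features rather than on exact pixels, the object in the scene does not have to be identical to the example.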

The researchers tested how effective the new computer visual search model was by measuring the number of times the computer had to fixate on different locations in a scene before finding its target. “What surprises us is that by using our method, computers can search images as fast as humans, even when searching for objects they’ve never seen before,” says Zhang. The computer was even as good as humans at finding Waldo.
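The evaluation described above, counting fixations until the target is found, can be sketched in a few lines. The fixation counts below are invented purely for illustration and are not results from the study. The function names are likewise hypothetical.

```python
from typing import List


def mean_fixations(counts: List[int]) -> float:
    """Average number of fixations needed to find the target across trials."""
    return sum(counts) / len(counts)


def cumulative_found(counts: List[int], max_fixations: int) -> List[float]:
    """Fraction of trials in which the target was found within 1..max_fixations fixations."""
    return [
        sum(1 for c in counts if c <= n) / len(counts)
        for n in range(1, max_fixations + 1)
    ]


# Made-up fixation counts for five search trials (not data from the study).
example_counts = [1, 2, 2, 3, 4]

print(mean_fixations(example_counts))       # 2.4
print(cumulative_found(example_counts, 4))  # [0.2, 0.6, 0.8, 1.0]
```

Applying the same two summaries to human eye-tracking trials and to a model's trials gives a like-for-like way to compare search efficiency.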

The team is now programming their model with a better understanding of context. For example, humans naturally understand that a cup is more likely to be sitting on a table than floating in the air. Once implemented, this should improve the model’s efficiency even further, says Zhang, adding, “Waldo cannot hide anymore.”

The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research. For more information about the team’s research, please visit the Image and Video Analytics webpage.

Want to stay up-to-date with A*STAR’s breakthroughs? Follow us on Twitter and LinkedIn!

References

Zhang, M., Feng, J., Ma, K. T., Lim, J. H., Zhao, Q. et al. Finding any Waldo with zero-shot invariant and efficient visual search. Nature Communications 9, 3730 (2018).

About the Researcher

Mengmi Zhang

PhD Student

Institute for Infocomm Research

Mengmi Zhang received a BEng with first class honours in Electrical and Computer Engineering (ECE) from the National University of Singapore (NUS) in 2015. She studied at the University of California, Santa Barbara as an exchange student in 2014. She is currently working toward a PhD at the Graduate School for Integrative Sciences and Engineering, National University of Singapore. She is also affiliated with multiple research institutions, including the Institute for Infocomm Research, A*STAR, Singapore; Boston Children's Hospital and Harvard Medical School, USA; and the Center for Brains, Minds and Machines, MIT, USA. Her research interests include computer vision, machine learning, and cognitive neuroscience.

This article was made for A*STAR Research by Nature Research Custom Media, part of Springer Nature
