Imagine chancing upon a nice blouse while shopping online and wondering what it would look like if it were black instead of blue, or with different buttons on it. In the future, one need not guess what variants of the blouse might look like—a sophisticated computer program could generate images of alternative forms of the blouse in a matter of seconds. The work of Kenan Emir Ak, a Scientist at A*STAR’s Institute for Infocomm Research (I2R), is ushering in such a future.
“Computer-automated image generation has taken a massive leap forward with the development of generative adversarial networks (GANs), allowing us to envision previously impossible tasks, such as automated generation of a fashion image with an arbitrary set of fashion attributes,” Ak explained.
A GAN consists of two neural networks—a generator and a discriminator—competing against each other. While the generator tries to produce realistic samples, the discriminator attempts to distinguish the generated (or faked) samples from real ones. As both networks are trained together, the generator produces increasingly realistic samples over time.
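This adversarial back-and-forth can be illustrated with a deliberately tiny, self-contained sketch (not the team's actual model): the "images" are just numbers drawn from a Gaussian, the generator is an affine map, and the discriminator is a logistic classifier, with the standard alternating gradient updates written out by hand.

```python
import math
import random

random.seed(0)

# Real data: samples from a Gaussian centred at 4.0 (a stand-in for "real images").
def sample_real():
    return random.gauss(4.0, 1.0)

# Generator: a tiny affine map g(z) = a*z + b on noise z ~ N(0, 1).
g_a, g_b = 1.0, 0.0

# Discriminator: logistic classifier d(x) = sigmoid(w*x + c),
# returning the probability that x came from the real data.
d_w, d_c = 0.0, 0.0

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def discriminate(x):
    return sigmoid(d_w * x + d_c)

lr = 0.01
for step in range(2000):
    # --- Discriminator update: push d(real) toward 1 and d(fake) toward 0 ---
    x_real = sample_real()
    z = random.gauss(0.0, 1.0)
    x_fake = g_a * z + g_b
    p_real = discriminate(x_real)
    p_fake = discriminate(x_fake)
    # Gradient ascent on log d(real) + log(1 - d(fake))
    d_w += lr * ((1 - p_real) * x_real - p_fake * x_fake)
    d_c += lr * ((1 - p_real) - p_fake)

    # --- Generator update: push d(fake) toward 1 (non-saturating loss) ---
    z = random.gauss(0.0, 1.0)
    x_fake = g_a * z + g_b
    p_fake = discriminate(x_fake)
    # Gradient ascent on log d(g(z)), chained through the generator
    grad_out = (1 - p_fake) * d_w
    g_a += lr * grad_out * z
    g_b += lr * grad_out
```

As the two players train against each other, the generator's offset drifts toward the real data's mean: exactly the dynamic that, at image scale and with deep networks, yields increasingly realistic samples.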
Using large databases of fashion images collected from shopping websites, Ak had a rich source of images to start with. However, his team encountered one important technical hurdle—in addition to looking realistic, the auto-generated images had to accurately reflect specific attributes, such as long sleeves or a different color. As more and more attributes are specified, conventional GANs struggle to simultaneously manipulate image attributes while determining whether an image is real or fake.
“As such, we added a neural network to detect the regions of interest for a given attribute, such as a person’s arms in ‘sleeveless’ images,” Ak said. “Then, we assigned an additional discriminator network to focus strictly on those regions of the image. With both discriminators in place, the generator is forced to create images that are not only realistic overall, but also have realistic attribute manipulations in specific regions.”
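The two-discriminator idea can be sketched in a few lines of illustrative code (the function names and region boxes here are hypothetical, not taken from the paper): the usual discriminator scores the whole image, while a second one scores only the region an attribute affects, and the generator must fool both at once.

```python
import math

def crop_region(image, box):
    """Cut the attribute's region of interest out of a 2D pixel grid,
    e.g. the arms for a 'sleeveless' manipulation."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

def generator_loss(d_global, d_region, fake_image, roi_box):
    """Non-saturating generator loss summed over both discriminators:
    the image must look real overall AND in the manipulated region."""
    region = crop_region(fake_image, roi_box)
    return -math.log(d_global(fake_image)) - math.log(d_region(region))

# Stub discriminators standing in for trained networks; each returns the
# probability that its input is real.
undecided = lambda pixels: 0.5        # cannot tell real from fake
region_expert = lambda pixels: 0.1    # spots an unrealistic arm region

fake = [[0.0] * 4 for _ in range(4)]  # a toy 4x4 "image"
arm_box = (1, 1, 3, 3)                # hypothetical region of interest

loss_easy = generator_loss(undecided, undecided, fake, arm_box)
loss_hard = generator_loss(undecided, region_expert, fake, arm_box)
# The region expert's low score raises the loss, pressuring the generator
# to fix that specific region rather than the image as a whole.
```

Under this combined objective, a fake that looks plausible globally but fails in the attribute's region still incurs a large penalty, which is precisely the pressure Ak describes.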
The researchers called their model Attribute Manipulation Generative Adversarial Network, or AMGAN for short. Applying AMGAN to two large image datasets—DeepFashion and Shopping100k—the researchers showed that their technique outperformed other GANs by some 14 percent overall in terms of classification accuracy and realistic manipulation of attributes in fashion images.
Beyond generating, manipulating and classifying images, Ak is optimistic that AMGAN can be extended to handle complex 3D objects and applied to domains beyond fashion. Going forward, Ak and his colleagues plan to improve AMGAN’s performance.
“Currently, AMGAN uses concrete attribute values, such as a specific collar design or color patch, as input for image manipulation, which means that achieving a specific ‘look’ might require many separate steps and become very tedious,” he said. “We are exploring the possibility that a simple textual description such as ‘retro-style’ or ‘country fashion’ could be used to aggregate multiple manipulations into a simple, one-step task.”
The A*STAR-affiliated researchers contributing to this research are from the Institute for Infocomm Research (I2R).