Adversarial examples designed to fool AI image classification systems have become a hot research and security topic in recent years. Most work on constructing adversarial images has involved adding pixel-wise perturbations. Now, researchers from the Chinese University of Hong Kong, the University of Michigan, the CUHK-SenseTime Joint Lab and the University of Illinois Urbana-Champaign have proposed a new adversarial attack approach, SemanticAdv, which generates adversarial perturbations by manipulating not pixels but an image’s semantic attributes. The researchers demonstrate that semantic-based adversarial examples can mislead even advanced face recognition systems.

In this paper, we aim to explore the impact of semantic manipulation on DNNs predictions by manipulating the semantic attributes of images and generate “unrestricted adversarial examples”. Such semantic based perturbation is more practical compared with pixel level manipulation. In particular, we propose an algorithm SemanticAdv which leverages disentangled semantic factors to generate adversarial perturbation via altering either single or a combination of semantic attributes. We conduct extensive experiments to show that the semantic based adversarial examples can not only fool different learning tasks such as face verification and landmark detection, but also achieve high attack success rate against real-world black-box services such as Azure face verification service. Such structured adversarial examples with controlled semantic manipulation can shed light on further understanding about vulnerabilities of DNNs as well as potential defensive approaches. (arXiv).

Synced invited Sameer Singh, an Assistant Professor of Computer Science at the University of California, Irvine (UCI), who works on robustness and interpretability of machine learning algorithms, to share his thoughts on SemanticAdv.

How would you describe SemanticAdv?

Recent work has shown that many deep neural networks used for computer vision are brittle: you can adversarially change the pixel values of an image such that the change is imperceptible to humans (the norm of the difference in pixel values is small), yet the classifier behaves completely differently on the perturbed image. In this paper, the authors introduce SemanticAdv, an adversarial example for a computer vision classifier created by slightly changing the original image in a semantically meaningful manner. Here the changes are not designed to be imperceptible; in fact, the overall norm of the change may be high. However, a human should be able to easily describe the semantic difference between the two images (e.g. hair color/style has changed, or eyeglasses have been added).
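To make the contrast between the two kinds of change concrete, here is a minimal Python sketch. It is a toy illustration and not code from the paper: the 4x4 "image", the epsilon value and the darkened patch standing in for an added object are all made-up assumptions. It only shows that a pixel-wise perturbation keeps the norm of the difference small, while a localized, describable edit can have a much larger norm.

```python
# Toy illustration only: an "image" is a flat list of pixel intensities in [0, 1].

def linf_norm(a, b):
    """Largest absolute per-pixel difference between two images."""
    return max(abs(x - y) for x, y in zip(a, b))

def l2_norm(a, b):
    """Euclidean distance between two images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# A 4x4 grayscale image, flattened row by row (made-up values).
original = [0.5] * 16

# Pixel-wise perturbation: every pixel moves by a tiny amount epsilon.
epsilon = 0.01
pixel_adv = [p + epsilon if i % 2 == 0 else p - epsilon
             for i, p in enumerate(original)]

# Stand-in for a semantic edit: darken one 2x2 region heavily, as if an
# object such as eyeglasses had been drawn in. Only four pixels change,
# but each changes by a large amount.
semantic_adv = list(original)
for idx in (5, 6, 9, 10):
    semantic_adv[idx] = 0.0

pixel_linf = linf_norm(original, pixel_adv)
semantic_linf = linf_norm(original, semantic_adv)

print("pixel-wise    L-inf:", pixel_linf, " L2:", l2_norm(original, pixel_adv))
print("semantic-like L-inf:", semantic_linf, " L2:", l2_norm(original, semantic_adv))
```

In this toy setting the second edit would be rejected by any small norm budget, yet it is the one a person could describe in a few words.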

Why does this research matter?

Although traditional adversarial examples have identified very important security concerns in existing machine learning models, the use of Lp norms makes it difficult to understand what semantic changes are important to the classifier. In this paper, by defining adversaries in terms of semantic changes, we can use these examples to identify what natural attributes of the domain are important or unimportant to the classifier. They also make the attacks controllable, i.e. we can decide which attributes we want to change when attacking the classifier, providing avenues for further analysis and understanding of the classifier's behavior.
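The idea of a controllable, attribute-by-attribute attack can be sketched in a few lines of Python. This is a hedged toy and not the authors' implementation: the attribute names, the linear `verifier_score` function, its weights and the helper `attack_single_attribute` are all hypothetical stand-ins. A real attack would edit images with a generative model and query an actual face verification network.

```python
# Toy stand-ins, all hypothetical: a "face" is a dict of attribute strengths
# in [0, 1], and the "verifier" is a fixed linear score over those attributes.
WEIGHTS = {"bangs": -0.9, "smile": -0.2, "eyeglasses": -0.6, "pale_skin": 0.1}
BIAS = 0.7
THRESHOLD = 0.0  # score > THRESHOLD means "same person"

def verifier_score(attrs):
    """Linear similarity score of the toy verifier."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in attrs.items())

def attack_single_attribute(attrs, name, steps=20):
    """Raise one attribute from its current value toward 1.0 and return the
    smallest tested value that flips the verifier, or None if none does."""
    start = attrs[name]
    for k in range(1, steps + 1):
        candidate = dict(attrs)
        candidate[name] = start + (1.0 - start) * k / steps
        if verifier_score(candidate) <= THRESHOLD:
            return candidate[name]
    return None

face = {"bangs": 0.0, "smile": 0.1, "eyeglasses": 0.0, "pale_skin": 0.3}

# Attack each attribute separately, leaving all others untouched.
results = {name: attack_single_attribute(face, name) for name in face}

for name, value in results.items():
    outcome = "no flip" if value is None else f"flips at strength {value:.2f}"
    print(f"{name:>10}: {outcome}")
```

Because each attribute is attacked on its own, the output reads directly as a statement about the classifier: in this toy setup only one attribute is able to flip the decision, which marks it as the one the verifier depends on most.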

The key reason these kinds of adversarial examples are important is that they reside, more or less, on the same data manifold as the training and test images, making them natural adversaries (Zhao et al., ICLR 2018 and Hendrycks et al., arXiv). This makes these adversarial examples much more informative and practical in certain situations (e.g. if I add bangs and smile a little more, the face detector will not detect me), as well as more difficult to defend against (since, ideally, these images look exactly like other images the classifier has seen, or is likely to see).

What impact might this work bring to the field?

Approaches to generate such semantic adversaries will be of importance to a number of research communities. For computer vision, it provides more insights into the shortcomings of existing classifiers, which can be informative when building datasets or designing models in the future. This can also be useful for machine learning interpretability and explainable AI, since semantic adversaries describe what is important and not important to the classifier for a given image using intuitive, meaningful descriptions that users will understand. Finally, these additional attacks also question the robustness of machine learning systems and are difficult to defend against, and thus the security community will likely be interested in this topic.

Outside of research, I can imagine this could be a useful pedagogical tool for understanding robustness issues in machine learning. Conventional adversarial attacks, by generating random looking noise, are difficult to understand, whereas semantic adversaries provide intuitive, and often amusing, changes to the original images.

Can you identify any bottlenecks in the research?

There are a number of avenues of future work for this direction.

The paper relies critically on StarGAN, which provides the semantic modifications and generates the images. This is a concern because any flaw or shortcoming of the StarGAN model will carry over to SemanticAdv as well. More importantly, it is difficult to apply this approach to a different domain, say ImageNet or even MNIST, as a StarGAN may not be available for them. This severely restricts the potential impact of this work: SemanticAdv examples are useful only for domains that have an accurate StarGAN.

Currently, any insight that we can glean from the semantic adversary applies to a single image, i.e. the optimization in the paper captures the most important semantic change for that image. However, one of the strengths of these semantic adversaries is that they are intuitive and low dimensional, and thus we can aggregate them over larger collections of images while retaining most of their interpretability. In other words, investigating what universal adversaries would look like with semantic changes can lead to much more useful analysis of the classifier behavior.
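One simple way to picture this kind of aggregation is the sketch below. It is only an illustration under stated assumptions: the per-image records are invented placeholders (not results from the paper), and the helper `attribute_success_rates` is a hypothetical name. It counts how often each single-attribute edit succeeds across a collection of images.

```python
from collections import Counter

# Placeholder records, invented for illustration: for each image, the set of
# single-attribute edits that flipped a hypothetical classifier.
per_image_successes = [
    {"bangs", "eyeglasses"},
    {"bangs"},
    {"smile", "bangs"},
    set(),
    {"eyeglasses"},
]

def attribute_success_rates(records):
    """Fraction of images on which each attribute edit succeeded."""
    counts = Counter(attr for successes in records for attr in successes)
    total = len(records)
    return {attr: count / total for attr, count in counts.items()}

rates = attribute_success_rates(per_image_successes)

# The attribute that succeeds on the most images is the closest thing to a
# "universal" semantic adversary in this toy collection.
most_universal = max(rates, key=rates.get)

for attr, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{attr:>10}: succeeds on {rate:.0%} of images")
```

Because the space of edits is a short list of named attributes, a summary like this stays readable even when computed over thousands of images, which is not the case for pixel-level perturbations.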

Can you predict any potential future developments related to this research?

I foresee a large community forming around such natural and semantic adversaries that consist of changes that appear naturally in datasets, i.e. reside on the dataset manifold. These are often difficult to defend against, making them of interest to the security community, but more importantly, they can be useful in understanding and interpreting the behavior of classifiers, which can lead to the development of models that we can deploy in the real world with much more confidence.

The paper SemanticAdv: Generating Adversarial Examples via Attribute-conditional Image Editing is on arXiv.

About Prof. Sameer Singh

Dr. Sameer Singh is an Assistant Professor of Computer Science at the University of California, Irvine (UCI). He is working on robustness and interpretability of machine learning algorithms, along with models that reason with text and structure for natural language processing. Sameer was a postdoctoral researcher at the University of Washington and received his PhD from the University of Massachusetts, Amherst, during which he also worked at Microsoft Research, Google Research, and Yahoo! Labs. His group has received funding from Allen Institute for AI, NSF, DARPA, Adobe Research, and FICO. His recent papers on related topics include ICLR 2018, ACL 2018, and NAACL 2019.
