Chengzhi Mao* · Lingyu Zhang · Abhishek Joshi · Junfeng Yang · Hao Wang · Carl Vondrick
https://arxiv.org/pdf/2212.06079.pdf
Deep networks for computer vision are not reliable when they encounter adversarial examples. In this paper, we introduce a framework that uses the dense intrinsic constraints in natural images to robustify inference. By introducing constraints at inference time, we can shift the burden of robustness from training to the inference algorithm, thereby allowing the model to adjust dynamically to each individual image's unique and potentially novel characteristics at inference time. Among different constraints, we find that equivariance-based constraints are most effective, because they allow dense constraints in the feature space without overly constraining the representation at a fine-grained level. Our theoretical results validate the importance of having such dense constraints at inference time. Our empirical experiments show that restoring feature equivariance at inference time defends against worst-case adversarial perturbations. The method obtains improved adversarial robustness on four datasets (ImageNet, Cityscapes, PASCAL VOC, and MS-COCO) on image recognition, semantic segmentation, and instance segmentation tasks.
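To make the idea of restoring feature equivariance at inference time more concrete, here is a minimal NumPy sketch. It is not the code in this repository. It assumes a toy linear feature map (a 1-D cross-correlation), a single transformation (flipping), and an L-infinity budget `eps` on the correction. The names `features`, `equivariance_loss`, and `restore_equivariance` are hypothetical and chosen only for illustration.

```python
import numpy as np


def features(x, kernel):
    """Toy dense feature map: 1-D 'valid' cross-correlation (assumption)."""
    return np.correlate(x, kernel, mode="valid")


def flip(v):
    """The transformation g: reverse the signal (stand-in for an image flip)."""
    return v[::-1]


def equivariance_loss(x, kernel):
    """Squared violation of features(flip(x)) == flip(features(x))."""
    diff = features(flip(x), kernel) - flip(features(x, kernel))
    return float(np.sum(diff ** 2))


def residual_matrix(n, kernel):
    """Matrix A such that A @ x is the equivariance residual.

    This exists only because the toy feature map is linear.
    """
    cols = [features(flip(e), kernel) - flip(features(e, kernel))
            for e in np.eye(n)]
    return np.stack(cols, axis=1)


def restore_equivariance(x, kernel, eps=0.1, steps=50):
    """Find a correction r with max|r| <= eps that lowers the loss on x + r.

    Uses projected gradient descent with step size 1/L, where L is the
    Lipschitz constant of the gradient of the quadratic loss.
    """
    A = residual_matrix(len(x), kernel)
    lipschitz = 2.0 * np.linalg.norm(A, 2) ** 2
    r = np.zeros_like(x)
    if lipschitz == 0.0:
        return r  # the feature map is already flip-equivariant
    for _ in range(steps):
        grad = 2.0 * A.T @ (A @ (x + r))                  # d loss / d r
        r = np.clip(r - grad / lipschitz, -eps, eps)      # project onto the box
    return r


# Usage: an asymmetric kernel is not flip-equivariant, so the loss is nonzero.
x = np.array([0.5, -1.0, 2.0, 0.3, -0.7, 1.2, 0.0, 0.9])
kernel = np.array([1.0, 2.0, -1.0])
r = restore_equivariance(x, kernel, eps=0.1)
print("loss before:", equivariance_loss(x, kernel))
print("loss after: ", equivariance_loss(x + r, kernel))
```

In this sketch the prediction would then be made on `x + r` rather than on `x`. With a deep network the residual is no longer linear, so the gradient would come from automatic differentiation instead of the closed form used here.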
We use Anaconda to manage the environment. Create it from the configuration file environment_equi.yml:
conda env create -f environment_equi.yml
Download the adversarially pretrained Cityscapes checkpoint here.
Download the vanilla pretrained Cityscapes checkpoint here.
Run the test-time robustness script (the command below exposes GPUs 0 and 1 to the process):
CUDA_VISIBLE_DEVICES=0,1 python equi4robust.py