Add [Stochastic LWTA] #58
Comments
Hi, thanks for the submission! I noticed that your model has some stochastic component, but even using EoT the robust accuracy didn't decrease. Then I generated the adversarial points with a deterministic version of the model, which has clean accuracy: 73.70% for 1000 points. Then I evaluated the obtained points with the original (stochastic) model and got accuracy: 85.80%, with 6 runs to check the standard deviation. Also, consider that the points misclassified by the deterministic model were not attacked, which means around 12% of the robust accuracy is explained by the difference in clean performance between the deterministic and stochastic models. Attacking such points as well might further reduce the robust accuracy. Since the stochastic component doesn't significantly alter the adversarial perturbations, I guess there might be some component in the stochastic activation functions which prevents the usual gradient computation (one would need to inspect the gradients carefully).
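For reference, a minimal sketch of the kind of EoT evaluation mentioned above, using AutoAttack's 'rand' version (which averages gradients over repeated stochastic forward passes); `model`, `x_test`, `y_test` are placeholders, not code from this submission.

```python
from autoattack import AutoAttack

eps = 8 / 255  # L-inf budget from the threat model of this submission

# 'rand' runs the APGD attacks with Expectation over Transformation,
# i.e. gradients are averaged over several stochastic forward passes.
adversary = AutoAttack(model, norm='Linf', eps=eps, version='rand')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)
```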
Hello, I do not understand why you followed such an approach in evaluation.
Sorry, maybe I didn't explain myself clearly. I used the deterministic version only to generate the adversarial perturbations (first set of results), then classified the adversarial points with the original model with stochastic activation functions. This is a standard transfer attack (I invite you to check whether you obtain the same results with this method), which can be useful when the information (especially the gradients) coming from the target model is for some reason not helpful.
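A rough sketch of the transfer evaluation described in this comment, assuming a deterministic copy `det_model` (same weights, deterministic activations) and the original stochastic model `stoch_model`; all names are illustrative and not taken from the repository.

```python
import torch
from autoattack import AutoAttack

eps = 8 / 255

# Step 1: craft the perturbations on the deterministic surrogate.
adversary = AutoAttack(det_model, norm='Linf', eps=eps, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)

# Step 2: classify the transferred points with the stochastic target,
# repeating the evaluation to account for sampling noise (cf. the 6 runs above).
accs = []
with torch.no_grad():
    for _ in range(6):
        preds = stoch_model(x_adv).argmax(dim=1)
        accs.append((preds == y_test).float().mean().item())
accs = torch.tensor(accs)
print(f'transfer robust accuracy: {accs.mean():.4f} +/- {accs.std():.4f}')
```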
In the considered approach, we are not investigating transfer attacks or other potential attacks using surrogate models. The LWTA approaches are scarcely examined in the literature, and a thorough examination is in progress. In this case, however, we focus on AutoAttack and other gradient-based implementations; we are talking specifically about APGD and the other included attacks (apgd-ce, apgd-t, fab-t, apgd-dlr, square, and the rand versions using EoT).
A few comments: first, the leaderboard on the webpage includes only deterministic defenses (note that in the paper we have a separate table for randomized ones, which are not included). Second, there are other ways to make white-box attacks, including those in AutoAttack, (mostly) fail or perform poorly, such as quantizing the input (#44) or adding a further softmax layer on top of the classifier (#41). However, such methods do not improve robustness, and they are easily bypassed by careful implementation changes (while preserving the weights of the target model). Changing the activation function to generate the perturbations (via AA) seems to me another of those countermeasures one can adopt when the information coming from the target model is limited or otherwise not helpful. Also, in the white-box scenario, one is interested in the robustness of the model to any attack, including transfer-based ones.
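As a generic illustration of the second point (not the actual models from #41 or #44), wrapping a classifier so that it outputs probabilities instead of logits is enough to degrade attacks that apply cross-entropy to the output, since softmax applied twice yields nearly flat gradients while the predictions, and hence the true robustness, are unchanged:

```python
import torch
import torch.nn as nn

class ExtraSoftmax(nn.Module):
    """Wraps a classifier so it returns probabilities instead of logits."""
    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base

    def forward(self, x):
        # An attack expecting logits will push these probabilities through a
        # second softmax inside the cross-entropy loss, saturating gradients.
        return torch.softmax(self.base(x), dim=1)

# Bypass: attack self.base directly (same weights, same predictions),
# which is the kind of implementation change mentioned above.
```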
According to your comments, it seems that AutoAttack/RobustBench is ready to exclude all Bayesian-based methods. Would you consider a VAE-like model as a leaderboard candidate? Your examples are at best irrelevant, if not unfortunate: the described examples are ad-hoc "tricks" to fool the library, or honest mistakes, not principled design paradigms. Since these approaches are easily bypassed, if AutoAttack fails to capture the properties of newly proposed methods, one should focus on improving the attack rather than arguing about other approaches that "seem" to be a countermeasure. I can at least be certain that every entry in the leaderboard has been thoroughly cross-examined with surrogate models, and that the produced results are not artifacts of the training processes and tweaks but are truly robust.
RobustBench clearly excludes randomized defenses from the current leaderboards, and accepts adaptive evaluations to improve the AA one. If a leaderboard for randomized defenses is added, we'll be happy to accept your models. For the models in the Model Zoo, we have also studied transferability (see our paper), if that's what you mean. Obviously, I cannot guarantee that no attack is able to improve the robustness evaluation (that's for certified defenses), but if they had shown suspicious behaviors we would have tested them further. I think it would indeed be great to have an attack which automatically detects when standard methods do not work and finds an alternative; currently, in some cases, we still have to do it manually. But, back to the point, I'd like to know whether you can confirm that the model is vulnerable to transfer attacks, or whether I missed something when evaluating it.
As the lead PI of this work, I have watched the thread and remain completely disappointed with the depth of the technical analysis and arguments. Here, we are talking about having a network learn representations, at some layers inside it, which are latent variables, that is, sampled variables, not simple values from a ReLU or some other silly function. We draw from a TRAINED posterior which is there for the attacker to learn, if they can. A good attack method should be able to capture this posterior, because it is learned; it is NOT something COMPLETELY RANDOM used as a trick. You LEARN POSTERIORS IN A VARIATIONAL WAY AND THEN SAMPLE DURING INFERENCE (DURING TRAINING YOU USE THE REPARAMETERIZATION TRICK / GUMBEL-SOFTMAX OR RELAXED CATEGORICAL). If we were novices to the field, we could have taken the encoder of a variational autoencoder, obtained the representations "z", which are sampled Gaussians, not deterministic units, and then fed them to further layers with ReLUs and so on. Would you consider the existence of a Gaussian activation layer at any point of that net a randomized defence?! How can this even be said? What happens with the Gaussian unit in the VAE-type model is a better, more ROBUST way to learn representations. That is the kind of activation we are talking about, NOT a randomized defence. How can you even say that a neuron which functions as a random variable (i.e., I train a posterior distribution which is there and can be interrogated, as opposed to point estimates) constitutes a RANDOMIZED DEFENCE TRICK? Where is the trick here? We are talking about the holy grail of representation learning. Of course, there is a whole theory of ours as to why Gaussian layers are not good, while sparse outputs from Discrete distributions (this is what the blocks of stochastic LWTAs are), which are biologically inspired (this is how the cortex makes representations, it is stochastic competition to fire), are the way to go. They are a promising method for learning representations, in so many contexts that you will see coming in the near future as publications in major venues from us (the paper discussed here is an AISTATS paper, of course, and the work has also been published at ICML 2019). All in all, this discussion is disheartening, as it shows that parts of the community lack basic knowledge of anything other than writing networks in a few lines of TensorFlow. They are unfamiliar even with TensorFlow Distributions, which provides the Gaussian layer as yet another one-line command, and of course with the Gumbel-softmax relaxation.
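For readers unfamiliar with the mechanism being argued about, a minimal sketch of a stochastic LWTA layer as described in this thread and in the model description below: units grouped in blocks, a categorical posterior over the competing units, a Gumbel-softmax (relaxed categorical) sample during training and a hard sample at inference. Block size, temperature and shapes are assumptions for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticLWTA(nn.Module):
    """Linear layer followed by stochastic winner-takes-all competition."""
    def __init__(self, in_features, out_features, block_size=2, tau=0.67):
        super().__init__()
        assert out_features % block_size == 0
        self.linear = nn.Linear(in_features, out_features)
        self.block_size = block_size
        self.tau = tau

    def forward(self, x):
        h = self.linear(x).view(x.size(0), -1, self.block_size)  # (B, blocks, U)
        if self.training:
            # Reparameterized (Gumbel-softmax) sample of the winner per block.
            winners = F.gumbel_softmax(h, tau=self.tau, hard=True, dim=-1)
        else:
            # Sample a hard winner from the categorical posterior at inference.
            idx = torch.distributions.Categorical(logits=h).sample()
            winners = F.one_hot(idx, num_classes=self.block_size).to(h.dtype)
        # Only the sampled winner in each block produces a non-zero output.
        return (h * winners).flatten(1)
```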
I just reported the results I got, without questioning or mentioning the idea behind your model, and asked (twice) whether you can confirm that the model is vulnerable to transfer attacks or not.
I think Francesco asked a simple question: either your model is robust or it is not. We care only about the correct evaluation of adversarial robustness, not about the motivation behind your model.
Dear Dr. Hein, first of all, to answer the posed question: we evaluated the model using a WideResNet-34-5. Similarly to Francesco's process (for which I asked for the implementation and got no answer), we produced the adversarial examples with a deterministic LWTA activation (for all 10000 examples), yielding an initial accuracy of 21.06%, and then classified the adversarial examples using the stochastic model, resulting in a 14.23% error (85.77% robust accuracy). The same model under a direct A-PGD-CE attack yielded 87% robust accuracy. As already mentioned in a previous post, we are currently investigating further aspects of the proposed architecture, including black-box attacks (which do not rely on gradient information and among which is Square, where our approach once again yields SOTA performance). Nevertheless, it is apparent that in this case the surrogate model fails to produce meaningful adversarial examples for a transfer attack. Any useful feedback, constructive criticism and suggestions (or even experimental results) are always welcome. Apart from that, guesses, ad-hoc claims and irony are not part of the scientific rationale and are simply bad etiquette in the community (indeed, in all communities and in life in general). In this context, while you claim that Francesco just asked a "simple question", this is not the case. If he had stated from the beginning that the AutoAttack leaderboard does not accept randomized defences (even if in many cases this is an ambiguous term, even if this is stated only on the RobustBench page and not on the AutoAttack page, and even if the considered approach fell under this category) and had asked for further experimental results or dismissed the entry on that basis, we wouldn't be having this conversation. When Francesco compares our paradigm to mistakes, silly approaches or even cunning tricks to elude the adversarial optimization process, that is not just a "simple question" but a direct "attack" on our method. A public critique of a proposed approach is either the work of a reviewing process or the result of a thorough investigation. Otherwise, it is just tittle-tattle.
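Since the comment above mentions the gradient-free Square attack, a hedged sketch of running only that attack through AutoAttack's custom mode; the variable names are assumptions, not the settings used in this thread.

```python
from autoattack import AutoAttack

eps = 8 / 255

# Run only the black-box Square attack against the stochastic model.
adversary = AutoAttack(stoch_model, norm='Linf', eps=eps,
                       version='custom', attacks_to_run=['square'])
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)
```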
Paper: Local Competition and Stochasticity for Adversarial Robustness in Deep Learning (http://proceedings.mlr.press/v130/panousis21a)
Venue: International Conference on Artificial Intelligence and Statistics (AISTATS) 2021
Dataset and threat model: CIFAR-10, L-inf, 8/255
Code: https://github.com/konpanousis/Adversarial-LWTA-AutoAttack
Pre-trained model: https://drive.google.com/file/d/15gTO0_HJzRi6toYEtlwA96Hwe49flmWA/view?usp=sharing
Log file: https://github.com/konpanousis/Adversarial-LWTA-AutoAttack/blob/main/log.txt
Additional data: No
Clean and robust accuracy: 90.89% and 87.5%
Architecture: WideResNet-34-5 with Stochastic LWTA Activations
Description of the model/defense: This work addresses adversarial robustness in deep learning by considering deep networks with stochastic local winner-takes-all (LWTA) activations. This type of network unit results in sparse representations at each model layer, as the units are organized in blocks where only one unit generates a non-zero output. The main operating principle of the introduced units lies in stochastic arguments, as the network performs posterior sampling over the competing units to select the winner. We combine these LWTA arguments with tools from the field of Bayesian non-parametrics, specifically the stick-breaking construction of the Indian Buffet Process, to allow for inferring the sub-part of each layer that is essential for modeling the data at hand. Inference is then performed by means of stochastic variational Bayes. We perform a thorough experimental evaluation of our model using benchmark datasets. As we show, our method achieves high robustness to adversarial perturbations, with state-of-the-art performance under powerful adversarial attack schemes.