Can semi-supervised learning with GANs produce models that can identify adversarial examples?
Using the default hyperparameters and training with 100 labelled examples per class, the discriminator network reaches 98% accuracy on the MNIST test set after 300 epochs and assigns about 85% probability to the MNIST examples being real. The network has an error rate of 93% on adversarial examples generated with the Fast Gradient Sign Method with epsilon = 0.25. However, it assigns only about 26% probability to these examples being real.
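For readers unfamiliar with the Fast Gradient Sign Method, the idea is to move every input value by epsilon in the direction that increases the classifier's loss. The following is a minimal sketch of that step, not the code in mnist_GAN.py: it assumes a tiny logistic-regression classifier as a stand-in for the discriminator, and the names `fgsm`, `loss_gradient_wrt_input`, `w` and `b` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of class 1 under a logistic-regression stand-in model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_gradient_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For logistic regression this has the closed form (p - y) * w.
    """
    p = predict(x, w, b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, epsilon=0.25):
    """Fast Gradient Sign Method: x_adv = clip(x + epsilon * sign(grad), 0, 1)."""
    grad = loss_gradient_wrt_input(x, y, w, b)
    return [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]


# Illustrative values only (assumed, not taken from the repository).
w, b = [2.0, -3.0, 0.5], 0.0
x, y = [0.5, 0.2, 0.9], 1
x_adv = fgsm(x, y, w, b, epsilon=0.25)
```

In this toy setting the perturbed input is pushed across the decision boundary even though no input value changes by more than epsilon. With a neural network the closed-form gradient would be replaced by the framework's automatic differentiation.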
Replicate by running:
cd mnist_GAN
python mnist_GAN.py
To use tensorboard to look at the images, run:
tensorboard --logdir=checkpoints
Dependencies:
Python 3.6
TensorFlow 1.3.0