This project is an unofficial PyTorch extension and modernization of the Trojaning Attack on Neural Networks paper.
While the original paper relies on fixed mask shapes (e.g., squares, watermarks) and manual transparency settings, this project implements a fully differentiable trigger generator.
We treat the mask shape, pixel colors, and transparency (alpha) as learnable parameters that are optimized simultaneously via gradient descent to maximize specific neuron activations.
Our approach extends the trigger generation phase of TrojanNN by parameterizing the trigger and optimizing it against a specific target neuron in the VGG-Face model.
Instead of applying a static patch, we model the "Trojaned" input image $X_{stamped}$ as a blend of a background image and the learned trigger.
The Blending Equation:
Where:
- $X_{bg}$: The background image (e.g., Mean Gray or Random Noise).
- $M \in [0, 1]^{H \times W}$: The Learnable Mask. We optimize `soft_mask_logit` and pass it through a Sigmoid function. This determines the shape of the trigger.
- $P \in [0, 1]^{C \times H \times W}$: The Learnable Color Pattern. We optimize `color_patch_logit` (Sigmoid) to learn the pixel values.
- $\alpha \in [0, 1]$: The Learnable Transparency Scalar. We optimize `alpha_blend_logit` (Sigmoid) to find the optimal global transparency for the trigger.
- $\odot$: Element-wise multiplication.
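The definitions above can be sketched in PyTorch as follows. This is a minimal illustration, not the project's actual code: it assumes the standard alpha-compositing form $X_{stamped} = (1 - \alpha M) \odot X_{bg} + \alpha M \odot P$, a 3x224x224 input, and a hypothetical helper named `blend`. Only the three logit names come from the text above.

```python
import torch

torch.manual_seed(0)

C, H, W = 3, 224, 224  # assumed input size; adjust to the model's real input

# Learnable logits. The names follow the list above; the shapes are assumptions.
soft_mask_logit = torch.zeros(1, H, W, requires_grad=True)
color_patch_logit = torch.randn(C, H, W, requires_grad=True)
alpha_blend_logit = torch.zeros(1, requires_grad=True)


def blend(x_bg, soft_mask_logit, color_patch_logit, alpha_blend_logit):
    """Hypothetical helper. Assumed form: (1 - alpha*M) * X_bg + alpha*M * P."""
    mask = torch.sigmoid(soft_mask_logit)        # M in [0, 1], broadcast over channels
    pattern = torch.sigmoid(color_patch_logit)   # P in [0, 1]
    alpha = torch.sigmoid(alpha_blend_logit)     # global transparency in [0, 1]
    weight = alpha * mask                        # per-pixel trigger opacity
    return (1.0 - weight) * x_bg + weight * pattern


x_bg = torch.full((C, H, W), 0.5)  # mean-gray background
x_stamped = blend(x_bg, soft_mask_logit, color_patch_logit, alpha_blend_logit)
```

Because every step is a sigmoid or an element-wise product, gradients from any loss on `x_stamped` reach all three logits, and the output stays in $[0, 1]$ whenever the background does.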
We optimize the parameters to maximize the activation of the target neuron in the fc7 layer, while enforcing sparsity on the mask.
Total Loss:
- Activation Loss ($\mathcal{L}_{activation}$): We use Mean Squared Error to push the neuron activation toward a high target value (we set the target activation to 100).

  $$\mathcal{L}_{activation} = 0.5 \cdot (\text{Target} - \text{Activation}(X_{stamped}))^2$$

- Sparsity Regularization ($\mathcal{L}_{sparsity}$): We apply an L1 penalty to the mask values to encourage the trigger to be small and efficient.

  $$\mathcal{L}_{sparsity} = \sum_{i,j} |M_{i,j}|$$
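The two loss terms can be combined and minimized roughly as below. This is a sketch under several assumptions: the weight `SPARSITY_WEIGHT`, the helper `trigger_loss`, the neuron index, the Adam settings, and the alpha-compositing blend are all hypothetical, and a small frozen linear layer stands in for VGG-Face up to fc7 so the example runs without the real weights. Only the target value of 100 and the two loss formulas come from the text.

```python
import torch

torch.manual_seed(0)

TARGET_ACTIVATION = 100.0  # target value stated above
SPARSITY_WEIGHT = 1e-3     # hypothetical weight; the README does not give one


def trigger_loss(activation, mask):
    """Return (total, activation_loss, sparsity_loss) for one target neuron."""
    activation_loss = 0.5 * (TARGET_ACTIVATION - activation) ** 2
    sparsity_loss = mask.abs().sum()  # L1 penalty on the mask values
    total = activation_loss + SPARSITY_WEIGHT * sparsity_loss
    return total, activation_loss, sparsity_loss


# Stand-in for VGG-Face up to fc7: a frozen linear layer on a tiny 3x8x8 input.
C, H, W = 3, 8, 8
stand_in = torch.nn.Linear(C * H * W, 16)
for p in stand_in.parameters():
    p.requires_grad_(False)
target_neuron = 0  # hypothetical neuron index

soft_mask_logit = torch.zeros(1, H, W, requires_grad=True)
color_patch_logit = torch.zeros(C, H, W, requires_grad=True)
alpha_blend_logit = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam(
    [soft_mask_logit, color_patch_logit, alpha_blend_logit], lr=0.1
)
x_bg = torch.full((C, H, W), 0.5)  # mean-gray background

history = []  # target-neuron activation at each step
for step in range(100):
    mask = torch.sigmoid(soft_mask_logit)
    weight = torch.sigmoid(alpha_blend_logit) * mask
    # Assumed alpha-compositing form of the blend.
    x_stamped = (1.0 - weight) * x_bg + weight * torch.sigmoid(color_patch_logit)
    activation = stand_in(x_stamped.flatten())[target_neuron]
    total, _, _ = trigger_loss(activation, mask)
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    history.append(activation.item())
```

As a quick check of the arithmetic, an activation of 90 with a 4x4 mask filled with 0.25 gives an activation loss of $0.5 \cdot 10^2 = 50$ and a sparsity loss of $16 \cdot 0.25 = 4$. With the real model, the stand-in layer would be replaced by a forward pass that reads the chosen fc7 neuron.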
Original Implementation: https://github.com/PurduePAML/TrojanNN
@inproceedings{Trojannn,
author = {Yingqi Liu and
Shiqing Ma and
Yousra Aafer and
Wen-Chuan Lee and
Juan Zhai and
Weihang Wang and
Xiangyu Zhang},
title = {Trojaning Attack on Neural Networks},
booktitle = {25th Annual Network and Distributed System Security Symposium, {NDSS}
2018, San Diego, California, USA, February 18-21, 2018},
publisher = {The Internet Society},
year = {2018},
}
VGG Face model weights extracted from: https://github.com/prlz77/vgg-face.pytorch
@InProceedings{Parkhi15,
author = "Parkhi, O. M. and Vedaldi, A. and Zisserman, A.",
title = "Deep Face Recognition",
booktitle = "British Machine Vision Conference",
year = "2015",
}
