
Trojan-Trigger-Mask-Optimization

This project is an unofficial PyTorch extension and modernization of the Trojaning Attack on Neural Networks paper.

While the original paper relies on fixed mask shapes (e.g., squares, watermarks) and manual transparency settings, this project implements a fully differentiable trigger generator.

We treat the mask shape, pixel colors, and transparency (alpha) as learnable parameters that are optimized simultaneously via gradient descent to maximize specific neuron activations.

Methodology

Our approach extends the trigger generation phase of TrojanNN by parameterizing the trigger and optimizing it against a specific target neuron in the VGG-Face model.

1. The Learnable Trigger Model

Instead of applying a static patch, we model the "Trojaned" input image $X_{stamped}$ as a differentiable blending operation between the background image and a learned pattern.

The Blending Equation: $$X_{stamped} = (1 - \alpha \cdot M) \odot X_{bg} + (\alpha \cdot M) \odot P$$

Where:

  • $X_{bg}$: The background image (e.g., Mean Gray or Random Noise).
  • $M \in [0, 1]^{H \times W}$: The Learnable Mask. We optimize soft_mask_logit and pass it through a Sigmoid function. This determines the shape of the trigger.
  • $P \in [0, 1]^{C \times H \times W}$: The Learnable Color Pattern. We optimize color_patch_logit (Sigmoid) to learn the pixel values.
  • $\alpha \in [0, 1]$: The Learnable Transparency Scalar. We optimize alpha_blend_logit (Sigmoid) to find the optimal global transparency for the trigger.
  • $\odot$: Element-wise multiplication.
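The blending operation above can be sketched as a small PyTorch module. The parameter names (`soft_mask_logit`, `color_patch_logit`, `alpha_blend_logit`) follow this README; the input resolution, zero initialization, and default shapes are illustrative assumptions, not the project's exact settings.

```python
import torch
import torch.nn as nn


class LearnableTrigger(nn.Module):
    """Differentiable trigger: learnable mask M, color pattern P, and alpha.

    Logits are unconstrained parameters; a Sigmoid maps each into [0, 1],
    matching the parameterization described above.
    """

    def __init__(self, channels=3, height=224, width=224):
        super().__init__()
        self.soft_mask_logit = nn.Parameter(torch.zeros(height, width))
        self.color_patch_logit = nn.Parameter(torch.zeros(channels, height, width))
        self.alpha_blend_logit = nn.Parameter(torch.zeros(()))

    def forward(self, x_bg):
        M = torch.sigmoid(self.soft_mask_logit)        # mask shape, H x W
        P = torch.sigmoid(self.color_patch_logit)      # color pattern, C x H x W
        alpha = torch.sigmoid(self.alpha_blend_logit)  # global transparency scalar
        blend = alpha * M                              # broadcasts over channels
        # X_stamped = (1 - alpha*M) * X_bg + (alpha*M) * P
        return (1 - blend) * x_bg + blend * P
```

Because every operation here is differentiable, gradients flow from the stamped image back to all three logits, so mask shape, colors, and transparency are optimized jointly.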

2. Optimization Objective (Loss Function)

We optimize the parameters ${M, P, \alpha}$ via gradient descent to maximize the activation of a target neuron in the fc7 layer while enforcing sparsity on the mask; the maximization is implemented by minimizing the loss below.

Total Loss: $$\mathcal{L}_{total} = \mathcal{L}_{activation} + \lambda_{reg} \cdot \mathcal{L}_{sparsity}$$

  • Activation Loss ($\mathcal{L}_{activation}$): We use Mean Squared Error to push the neuron activation toward a high target value (we set the target activation to 100). $$\mathcal{L}_{activation} = 0.5 \cdot (\text{Target} - \text{Activation}(X_{stamped}))^2$$

  • Sparsity Regularization ($\mathcal{L}_{sparsity}$): We apply an L1 penalty to the mask values to encourage the trigger to be small and efficient. $$\mathcal{L}_{sparsity} = \sum_{i,j} |M_{i,j}|$$
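The two loss terms combine into a single scalar objective. A minimal sketch, assuming a scalar neuron activation and the target value of 100 from this README; the `lambda_reg` default is an illustrative choice, not the project's tuned value:

```python
import torch


def trigger_loss(activation, mask, target=100.0, lambda_reg=1e-4):
    """Total loss: MSE-style activation term plus L1 sparsity on the mask.

    activation : scalar tensor, target neuron's fc7 activation on X_stamped
    mask       : H x W tensor of mask values in [0, 1]
    """
    # L_activation = 0.5 * (Target - Activation)^2
    act_loss = 0.5 * (target - activation) ** 2
    # L_sparsity = sum_ij |M_ij|  (mask is non-negative, abs kept for generality)
    sparsity = mask.abs().sum()
    return act_loss + lambda_reg * sparsity
```

Minimizing this loss with any PyTorch optimizer (e.g. `torch.optim.Adam` over the trigger logits) drives the neuron activation toward the target while the L1 term shrinks the mask.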

Results:

Comparison

References

Original Implementation: https://github.com/PurduePAML/TrojanNN

@inproceedings{Trojannn,
  author    = {Yingqi Liu and
               Shiqing Ma and
               Yousra Aafer and
               Wen-Chuan Lee and
               Juan Zhai and
               Weihang Wang and
               Xiangyu Zhang},
  title     = {Trojaning Attack on Neural Networks},
  booktitle = {25th Annual Network and Distributed System Security Symposium, {NDSS}
               2018, San Diego, California, USA, February 18-21, 2018},
  publisher = {The Internet Society},
  year      = {2018},
}

VGG Face model weights extracted from: https://github.com/prlz77/vgg-face.pytorch

@InProceedings{Parkhi15,
  author       = "Parkhi, O. M. and Vedaldi, A. and Zisserman, A.",
  title        = "Deep Face Recognition",
  booktitle    = "British Machine Vision Conference",
  year         = "2015",
}
