
Trojan-Trigger-Mask-Optimization

This project is an unofficial PyTorch extension and modernization of the Trojaning Attack on Neural Networks paper.

While the original paper relies on fixed mask shapes (e.g., squares, watermarks) and manual transparency settings, this project implements a fully differentiable trigger generator.

We treat the mask shape, pixel colors, and transparency (alpha) as learnable parameters that are optimized simultaneously via gradient descent to maximize specific neuron activations.

Methodology

Our approach extends the trigger generation phase of TrojanNN by parameterizing the trigger and optimizing it against a specific target neuron in the VGG-Face model.

1. The Learnable Trigger Model

Instead of applying a static patch, we model the "Trojaned" input image $X_{stamped}$ as a differentiable blending operation between the background image and a learned pattern.

The Blending Equation: $$X_{stamped} = (1 - \alpha \cdot M) \odot X_{bg} + (\alpha \cdot M) \odot P$$

Where:

  • $X_{bg}$: The background image (e.g., Mean Gray or Random Noise).
  • $M \in [0, 1]^{H \times W}$: The Learnable Mask. We optimize soft_mask_logit and pass it through a Sigmoid function. This determines the shape of the trigger.
  • $P \in [0, 1]^{C \times H \times W}$: The Learnable Color Pattern. We optimize color_patch_logit (Sigmoid) to learn the pixel values.
  • $\alpha \in [0, 1]$: The Learnable Transparency Scalar. We optimize alpha_blend_logit (Sigmoid) to find the optimal global transparency for the trigger.
  • $\odot$: Element-wise multiplication.
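The blending operation above can be sketched as a small PyTorch module. The parameter names (`soft_mask_logit`, `color_patch_logit`, `alpha_blend_logit`) follow this README; the input resolution, zero initialization, and default shapes are illustrative assumptions, not the project's exact settings.

```python
import torch
import torch.nn as nn


class LearnableTrigger(nn.Module):
    """Differentiable trigger: learnable mask M, color pattern P, and alpha.

    Logits are unconstrained parameters; a Sigmoid maps each into [0, 1],
    matching the parameterization described above.
    """

    def __init__(self, channels=3, height=224, width=224):
        super().__init__()
        self.soft_mask_logit = nn.Parameter(torch.zeros(height, width))
        self.color_patch_logit = nn.Parameter(torch.zeros(channels, height, width))
        self.alpha_blend_logit = nn.Parameter(torch.zeros(()))

    def forward(self, x_bg):
        M = torch.sigmoid(self.soft_mask_logit)        # mask shape, H x W
        P = torch.sigmoid(self.color_patch_logit)      # color pattern, C x H x W
        alpha = torch.sigmoid(self.alpha_blend_logit)  # global transparency scalar
        blend = alpha * M                              # broadcasts over channels
        # X_stamped = (1 - alpha*M) * X_bg + (alpha*M) * P
        return (1 - blend) * x_bg + blend * P
```

Because every operation here is differentiable, gradients flow from the stamped image back to all three logits, so mask shape, colors, and transparency are optimized jointly.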

2. Optimization Objective (Loss Function)

We optimize the parameters ${M, P, \alpha}$ via gradient descent to maximize the activation of a target neuron in the fc7 layer while enforcing sparsity on the mask; the maximization is implemented by minimizing the loss below.

Total Loss: $$\mathcal{L}_{total} = \mathcal{L}_{activation} + \lambda_{reg} \cdot \mathcal{L}_{sparsity}$$

  • Activation Loss ($\mathcal{L}_{activation}$): We use Mean Squared Error to push the neuron activation toward a high target value (we set the target activation to 100). $$\mathcal{L}_{activation} = 0.5 \cdot (\text{Target} - \text{Activation}(X_{stamped}))^2$$

  • Sparsity Regularization ($\mathcal{L}_{sparsity}$): We apply an L1 penalty to the mask values to encourage the trigger to be small and efficient. $$\mathcal{L}_{sparsity} = \sum_{i,j} |M_{i,j}|$$
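The two loss terms combine into a single scalar objective. A minimal sketch, assuming a scalar neuron activation and the target value of 100 from this README; the `lambda_reg` default is an illustrative choice, not the project's tuned value:

```python
import torch


def trigger_loss(activation, mask, target=100.0, lambda_reg=1e-4):
    """Total loss: MSE-style activation term plus L1 sparsity on the mask.

    activation : scalar tensor, target neuron's fc7 activation on X_stamped
    mask       : H x W tensor of mask values in [0, 1]
    """
    # L_activation = 0.5 * (Target - Activation)^2
    act_loss = 0.5 * (target - activation) ** 2
    # L_sparsity = sum_ij |M_ij|  (mask is non-negative, abs kept for generality)
    sparsity = mask.abs().sum()
    return act_loss + lambda_reg * sparsity
```

Minimizing this loss with any PyTorch optimizer (e.g. `torch.optim.Adam` over the trigger logits) drives the neuron activation toward the target while the L1 term shrinks the mask.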

Results:

Comparison

References

Original Implementation: https://github.com/PurduePAML/TrojanNN

@inproceedings{Trojannn,
  author    = {Yingqi Liu and
               Shiqing Ma and
               Yousra Aafer and
               Wen-Chuan Lee and
               Juan Zhai and
               Weihang Wang and
               Xiangyu Zhang},
  title     = {Trojaning Attack on Neural Networks},
  booktitle = {25th Annual Network and Distributed System Security Symposium, {NDSS}
               2018, San Diego, California, USA, February 18-21, 2018},
  publisher = {The Internet Society},
  year      = {2018},
}

VGG Face model weights extracted from: https://github.com/prlz77/vgg-face.pytorch

@InProceedings{Parkhi15,
  author       = "Parkhi, O. M. and Vedaldi, A. and Zisserman, A.",
  title        = "Deep Face Recognition",
  booktitle    = "British Machine Vision Conference",
  year         = "2015",
}
