TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models (EMNLP'25)

Asif Hanif, Maha Tufail Agro, Fahad Shamshad, and Karthik Nandakumar

[Project Page] [Paper]


[Figure: TrojanWave overview]

TrojanWave learns two triggers, one temporal and one spectral, to embed a backdoor into the audio-language model (ALM) during prompt learning. The ALM's weights remain frozen; only the learnable prompts are manipulated. At inference time, the ALM behaves normally on clean inputs (performance on par with the backdoor-free setup) but predicts the adversary's target label $y^{\prime}$ whenever an input containing the trigger is presented.
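The training objective can be summarized in the toy PyTorch sketch below (our illustration, not the repository's code; the model, tensor shapes, and the single additive time-domain trigger are simplifying assumptions, and the actual method also learns a spectral-domain trigger):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a frozen ALM: maps (audio, prompts) -> class logits.
class ToyALM(nn.Module):
    def __init__(self, num_samples=16000, prompt_dim=8, num_classes=10):
        super().__init__()
        self.audio_proj = nn.Linear(num_samples, prompt_dim)
        self.head = nn.Linear(prompt_dim, num_classes)

    def forward(self, audio, prompts):
        # Fuse audio features with the mean prompt token (toy fusion).
        feat = self.audio_proj(audio) + prompts.mean(dim=0)
        return self.head(feat)

alm = ToyALM()
for p in alm.parameters():
    p.requires_grad_(False)  # backbone stays frozen

prompts = torch.randn(4, 8, requires_grad=True)  # learnable prompt tokens
delta = torch.zeros(16000, requires_grad=True)   # additive time-domain trigger
target_label, eps = 3, 0.01                      # eps keeps the trigger small

opt = torch.optim.Adam([prompts, delta], lr=1e-2)
audio = torch.randn(32, 16000)                   # toy few-shot batch
y = torch.randint(0, 10, (32,))

for _ in range(50):
    clean_loss = F.cross_entropy(alm(audio, prompts), y)               # normal behavior
    y_t = torch.full_like(y, target_label)
    backdoor_loss = F.cross_entropy(alm(audio + delta, prompts), y_t)  # targeted misclassification
    (clean_loss + backdoor_loss).backward()
    opt.step(); opt.zero_grad()
    with torch.no_grad():
        delta.clamp_(-eps, eps)  # imperceptibility constraint on the trigger
```

The key point is that the optimizer only ever touches the prompts and the trigger; gradients never update the backbone.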


[Figure: TrojanWave attack pipeline]

An adversary embeds a backdoor into the learned prompts during few-shot training and publishes the infected prompts online. An unsuspecting user who adopts these prompts for their model unknowingly inherits the backdoor, resulting in normal performance on clean inputs but adversary-desired targeted misclassification when triggered inputs are encountered.




Abstract

Prompt learning has emerged as an efficient alternative to full fine-tuning for adapting large audio-language models (ALMs) to downstream tasks. While this paradigm enables scalable deployment via Prompt-as-a-Service frameworks, it also introduces a critical yet underexplored security risk of backdoor attacks. In this work, we present TrojanWave, the first backdoor attack tailored to the prompt-learning setting in frozen ALMs. Unlike prior audio backdoor methods that require training from scratch on full datasets, TrojanWave injects backdoors solely through learnable prompts, making it highly scalable and effective in few-shot settings. TrojanWave injects imperceptible audio triggers in both time and spectral domains to effectively induce targeted misclassification during inference. To mitigate this threat, we further propose TrojanWave-Defense, a lightweight prompt purification method that neutralizes malicious prompts without hampering the clean performance. Extensive experiments across 11 diverse audio classification benchmarks demonstrate the robustness and practicality of both the attack and defense.

TLDR: The paper presents TrojanWave, a novel backdoor attack on audio-language models that exploits prompt learning instead of model retraining. It exposes the security risks of malicious prompts that inject imperceptible audio triggers causing hidden misclassifications. A lightweight defense, TrojanWave-Defense, is proposed to purify infected prompts while preserving normal model performance.



Goal: The paper aims to introduce and analyze a new type of backdoor attack—called TrojanWave—targeting large audio-language models (ALMs) that use prompt learning. Its main objective is to show that such attacks can be executed solely through learnable prompts, without modifying model parameters, making them highly stealthy and scalable. It also proposes a defense method, TrojanWave-Defense, to purify infected prompts and restore model safety without degrading normal performance.

Motivation: With the rise of prompt learning and “Prompt-as-a-Service” frameworks, users increasingly rely on third-party prompts to adapt models efficiently. However, this creates a serious security risk where adversaries can distribute malicious prompts that appear normal but contain hidden backdoors triggered by imperceptible sounds. Recognizing that such prompt-based attacks were largely unexplored, the paper seeks to expose this vulnerability and highlight the urgent need for protection in real-world ALM deployments.

Main Idea: TrojanWave introduces a stealthy attack that embeds imperceptible audio triggers—crafted in both time and spectral domains—into learnable prompts, which then cause targeted misclassification when triggered inputs are encountered. Unlike prior attacks that retrain models, it keeps the backbone model frozen, making the method lightweight and efficient. To counteract this, the authors propose TrojanWave-Defense, a prompt purification strategy that removes the correlation between malicious prompts and triggers while maintaining clean-task accuracy.
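As a rough intuition for the defense, purification can be pictured as re-tuning the suspect prompts on a small clean set while pushing them away from their infected values. The following is a minimal toy sketch under our own assumptions, not the paper's actual procedure:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(8, 10)                     # stand-in frozen classifier head
infected = torch.randn(8)                  # suspect (possibly backdoored) prompt
purified = infected.clone().requires_grad_(True)

feats = torch.randn(64, 8)                 # small clean calibration set
labels = torch.randint(0, 10, (64,))

opt = torch.optim.Adam([purified], lr=1e-2)
for _ in range(100):
    logits = (feats + purified) @ W        # toy prompt-conditioned model
    clean_loss = F.cross_entropy(logits, labels)
    # Bounded repulsion: decays to zero as the prompt moves away from the
    # infected values that may encode the trigger correlation.
    repel = torch.exp(-(purified - infected).pow(2).sum())
    (clean_loss + 0.1 * repel).backward()
    opt.step(); opt.zero_grad()
```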



Updates 🚀

  • Aug 20, 2025 : Accepted in EMNLP (Main) 2025    🎊 🎉
  • Nov 05, 2025 : Released code for TrojanWave-Attack
  • Nov 05, 2025 : Released instructions for preparing datasets
  • Nov 10, 2025 : Released code for TrojanWave-Defense


Installation 🛠️

1. Create a conda environment

conda create --name trojanwave python=3.8
conda activate trojanwave

2. Install PyTorch and other dependencies

git clone https://github.com/asif-hanif/trojanwave
cd trojanwave
pip install -r requirements.txt
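After installation, a quick sanity check (our suggestion, not part of the repository) confirms that the environment sees PyTorch and a GPU:

```python
# Optional environment check (not part of the repo).
import torch

print(torch.__version__)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU visible; experiments will be slow on CPU.")
```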

We report results for TrojanWave and the baseline attacks (NBA, NBA-D, FlowMur) using the PENGI model.

Download the pre-trained PENGI model using the link provided below and place the checkpoint file at the path pengi/configs (after cloning the repo).

| Model | Link | Size |
|:---|:---|:---|
| PENGI | [Download](https://zenodo.org/records/8387083/files/base.pth) | 2.2 GB |

The PENGI checkpoint can also be downloaded with the following command:

wget https://zenodo.org/records/8387083/files/base.pth
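Optionally, verify that the checkpoint downloaded intact and sits where the code expects it. This is a sketch assuming the file keeps its original name base.pth inside pengi/configs:

```python
# Optional checkpoint check (assumed filename/location, not the repo's code).
import os
import torch

ckpt_path = "pengi/configs/base.pth"
assert os.path.isfile(ckpt_path), f"checkpoint not found at {ckpt_path}"
state = torch.load(ckpt_path, map_location="cpu")  # loads on CPU, no GPU needed
print("loaded checkpoint object of type:", type(state).__name__)
```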

We have performed experiments on 11 audio classification datasets. Instructions for downloading and processing the datasets used by our method are provided in DATASETS.md.

| Dataset | Type | Classes | Size | Link |
|:---|:---|:---:|:---:|:---|
| Beijing-Opera | Instrument Classification | 4 | 69 MB | Instructions |
| CREMA-D | Emotion Recognition | 6 | 606 MB | Instructions |
| ESC50 | Sound Event Classification | 50 | 881 MB | Instructions |
| ESC50-Actions | Sound Event Classification | 10 | 881 MB | Instructions |
| GT-Music-Genre | Music Analysis | 10 | 1.3 GB | Instructions |
| NS-Instruments | Instrument Classification | 10 | 18.5 GB | Instructions |
| RAVDESS | Emotion Recognition | 8 | 1.1 GB | Instructions |
| SESA | Surveillance Sound Classification | 4 | 70 MB | Instructions |
| TUT2017 | Acoustic Scene Classification | 15 | 12.3 GB | Instructions |
| UrbanSound8K | Sound Event Classification | 10 | 6.8 GB | Instructions |
| VocalSound | Vocal Sound Classification | 6 | 8.2 GB | Instructions |


All datasets should be placed in a directory named Audio-Datasets and the path of this directory should be specified in the variable DATASET_ROOT in the shell scripts. The directory structure should be as follows:

Audio-Datasets/
    ├── Beijing-Opera/
    ├── CREMA-D/
    ├── ESC50/ 
    ├── ESC50-Actions/
    ├── GT-Music-Genre/
    ├── NS-Instruments/
    ├── RAVDESS/
    ├── SESA/
    ├── TUT2017/
    ├── UrbanSound8K/
    ├── VocalSound/

There are three main folders in this repo: pengi, methods, and utils. Code in the pengi folder is taken from the PENGI repo for model instantiation. Implementations of the baselines (nba, nbad, flowmur) and our method trojanwave are in the methods folder. Class definitions of the audio and text encoders of the PENGI model can be found in methods/encoders.py. Training- and dataset-related code is in the utils folder.


We performed all experiments on an NVIDIA A100-SXM4-40GB GPU. Shell scripts to run the experiments can be found in the scripts folder.

## General Command Structure
bash <SHELL_SCRIPT> <METHOD_NAME> <ATTACK_NAME>

The following attack names (passed as <ATTACK_NAME>) are supported in this repository:

noattack, nba, nbad, flowmur, trojanwave

Examples of running the trojanwave method on different audio classification datasets are provided below:

bash scripts/beijing_opera.sh palm trojanwave
bash scripts/crema_d.sh palm trojanwave
bash scripts/esc50_actions.sh palm trojanwave
bash scripts/esc50.sh palm trojanwave
bash scripts/gt_music_genre.sh palm trojanwave
bash scripts/ns_instruments.sh palm trojanwave
bash scripts/ravdess.sh palm trojanwave
bash scripts/sesa.sh palm trojanwave
bash scripts/tut.sh palm trojanwave
bash scripts/urban_sound.sh palm trojanwave
bash scripts/vocal_sound.sh palm trojanwave

Results are saved in JSON format in the logs directory. To process the results, run the following commands (after running all experiments):

cd logs
bash results.sh
Sample Output

[Figure: sample output of the results script]
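To inspect the raw logs directly in Python instead, here is a minimal sketch (ours, not part of the repository; the accuracy field name is a guess at the JSON schema):

```python
# Illustrative log collector. The "accuracy" key is an assumed field name;
# adjust it to the actual schema written by the experiment runs.
import glob
import json

for path in sorted(glob.glob("logs/**/*.json", recursive=True)):
    with open(path) as f:
        result = json.load(f)
    print(path, result.get("accuracy", "n/a"))
```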



Note: To simplify evaluation, we convert multi-fold datasets into a single train-test split rather than performing cross-validation.
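For concreteness, here is what such a conversion could look like for a five-fold dataset such as ESC50 (an illustrative sketch, not the repository's code; the metadata path and the folds-1-to-4-train / fold-5-test convention are assumptions):

```python
# Illustrative single-split conversion for ESC50 (assumed metadata path and
# fold convention): folds 1-4 become the train set, fold 5 the test set.
import pandas as pd

meta = pd.read_csv("Audio-Datasets/ESC50/meta/esc50.csv")  # assumed path
train_df = meta[meta["fold"] != 5]
test_df = meta[meta["fold"] == 5]
print(len(train_df), "train clips,", len(test_df), "test clips")
```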


Run Defense Experiments ⚡

To run TrojanWave-Defense experiments, please follow the instructions in the README file of the trojanwave-defense branch of this repo.






If you find our work, this repository, or the pretrained models useful, please consider giving a star ⭐ and a citation.

@inproceedings{hanif2025trojanwave,
  title={TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models},
  author={Hanif, Asif and Agro, Maha Tufail and Shamshad, Fahad and Nandakumar, Karthik},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={18628--18644},
  year={2025}
}

Should you have any questions, please create an issue on this repository or contact us at asif.hanif@mbzuai.ac.ae.


We used PENGI for model instantiation and borrowed parts of the code from NBA, NBA-D, and FlowMur to implement the baselines. We thank the respective authors for releasing their code.

