TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models (EMNLP'25)

Asif Hanif, Maha Tufail Agro, Fahad Shamshad, and Karthik Nandakumar

[Project Page] [Paper]


[Figure: TrojanWave overview]

TrojanWave learns two triggers, one temporal and one spectral, to embed a backdoor into the audio-language model (ALM) during prompt learning. The ALM's weights remain frozen; only the learnable prompts are manipulated. At inference time, the ALM behaves normally on clean inputs (performance on par with the backdoor-free setup) but predicts the adversary's target label $y^{\prime}$ whenever an input containing the trigger is presented.
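The training objective can be summarized in the toy PyTorch sketch below (our illustration, not the repository's code; the model, tensor shapes, and the single additive time-domain trigger are simplifying assumptions, and the actual method also learns a spectral-domain trigger):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a frozen ALM: maps (audio, prompts) -> class logits.
class ToyALM(nn.Module):
    def __init__(self, num_samples=16000, prompt_dim=8, num_classes=10):
        super().__init__()
        self.audio_proj = nn.Linear(num_samples, prompt_dim)
        self.head = nn.Linear(prompt_dim, num_classes)

    def forward(self, audio, prompts):
        # Fuse audio features with the mean prompt token (toy fusion).
        feat = self.audio_proj(audio) + prompts.mean(dim=0)
        return self.head(feat)

alm = ToyALM()
for p in alm.parameters():
    p.requires_grad_(False)  # backbone stays frozen

prompts = torch.randn(4, 8, requires_grad=True)  # learnable prompt tokens
delta = torch.zeros(16000, requires_grad=True)   # additive time-domain trigger
target_label, eps = 3, 0.01                      # eps keeps the trigger small

opt = torch.optim.Adam([prompts, delta], lr=1e-2)
audio = torch.randn(32, 16000)                   # toy few-shot batch
y = torch.randint(0, 10, (32,))

for _ in range(50):
    clean_loss = F.cross_entropy(alm(audio, prompts), y)               # normal behavior
    y_t = torch.full_like(y, target_label)
    backdoor_loss = F.cross_entropy(alm(audio + delta, prompts), y_t)  # targeted misclassification
    (clean_loss + backdoor_loss).backward()
    opt.step(); opt.zero_grad()
    with torch.no_grad():
        delta.clamp_(-eps, eps)  # imperceptibility constraint on the trigger
```

The key point is that the optimizer only ever touches the prompts and the trigger; gradients never update the backbone.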


[Figure: TrojanWave attack pipeline]

An adversary embeds a backdoor into the learned prompts during few-shot training and publishes the infected prompts online. An unsuspecting user who adopts these prompts for their model unknowingly inherits the backdoor, resulting in normal performance on clean inputs but adversary-desired targeted misclassification when triggered inputs are encountered.




Abstract

Prompt learning has emerged as an efficient alternative to full fine-tuning for adapting large audio-language models (ALMs) to downstream tasks. While this paradigm enables scalable deployment via Prompt-as-a-Service frameworks, it also introduces a critical yet underexplored security risk of backdoor attacks. In this work, we present TrojanWave, the first backdoor attack tailored to the prompt-learning setting in frozen ALMs. Unlike prior audio backdoor methods that require training from scratch on full datasets, TrojanWave injects backdoors solely through learnable prompts, making it highly scalable and effective in few-shot settings. TrojanWave injects imperceptible audio triggers in both time and spectral domains to effectively induce targeted misclassification during inference. To mitigate this threat, we further propose TrojanWave-Defense, a lightweight prompt purification method that neutralizes malicious prompts without hampering the clean performance. Extensive experiments across 11 diverse audio classification benchmarks demonstrate the robustness and practicality of both the attack and defense.

TLDR: The paper presents TrojanWave, a novel backdoor attack on audio-language models that exploits prompt learning instead of model retraining. It exposes the security risks of malicious prompts that inject imperceptible audio triggers causing hidden misclassifications. A lightweight defense, TrojanWave-Defense, is proposed to purify infected prompts while preserving normal model performance.



Goal: The paper aims to introduce and analyze a new type of backdoor attack—called TrojanWave—targeting large audio-language models (ALMs) that use prompt learning. Its main objective is to show that such attacks can be executed solely through learnable prompts, without modifying model parameters, making them highly stealthy and scalable. It also proposes a defense method, TrojanWave-Defense, to purify infected prompts and restore model safety without degrading normal performance.

Motivation: With the rise of prompt learning and “Prompt-as-a-Service” frameworks, users increasingly rely on third-party prompts to adapt models efficiently. However, this creates a serious security risk where adversaries can distribute malicious prompts that appear normal but contain hidden backdoors triggered by imperceptible sounds. Recognizing that such prompt-based attacks were largely unexplored, the paper seeks to expose this vulnerability and highlight the urgent need for protection in real-world ALM deployments.

Main Idea: TrojanWave introduces a stealthy attack that embeds imperceptible audio triggers—crafted in both time and spectral domains—into learnable prompts, which then cause targeted misclassification when triggered inputs are encountered. Unlike prior attacks that retrain models, it keeps the backbone model frozen, making the method lightweight and efficient. To counteract this, the authors propose TrojanWave-Defense, a prompt purification strategy that removes the correlation between malicious prompts and triggers while maintaining clean-task accuracy.
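As a rough intuition for the defense, purification can be pictured as re-tuning the suspect prompts on a small clean set while pushing them away from their infected values. The following is a minimal toy sketch under our own assumptions, not the paper's actual procedure:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
W = torch.randn(8, 10)                     # stand-in frozen classifier head
infected = torch.randn(8)                  # suspect (possibly backdoored) prompt
purified = infected.clone().requires_grad_(True)

feats = torch.randn(64, 8)                 # small clean calibration set
labels = torch.randint(0, 10, (64,))

opt = torch.optim.Adam([purified], lr=1e-2)
for _ in range(100):
    logits = (feats + purified) @ W        # toy prompt-conditioned model
    clean_loss = F.cross_entropy(logits, labels)
    # Bounded repulsion: decays to zero as the prompt moves away from the
    # infected values that may encode the trigger correlation.
    repel = torch.exp(-(purified - infected).pow(2).sum())
    (clean_loss + 0.1 * repel).backward()
    opt.step(); opt.zero_grad()
```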



Updates 🚀

  • Aug 20, 2025 : Accepted in EMNLP (Main) 2025    🎊 🎉
  • Nov 05, 2025 : Released code for TrojanWave-Attack
  • Nov 05, 2025 : Released instructions for preparing datasets
  • Nov 10, 2025 : Released code for TrojanWave-Defense


Installation 🛠️

1. Create a conda environment

conda create --name trojanwave python=3.8
conda activate trojanwave

2. Install PyTorch and other dependencies

git clone https://github.com/asif-hanif/trojanwave
cd trojanwave
pip install -r requirements.txt
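After installation, a quick sanity check (our suggestion, not part of the repository) confirms that the environment sees PyTorch and a GPU:

```python
# Optional environment check (not part of the repo).
import torch

print(torch.__version__)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU visible; experiments will be slow on CPU.")
```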

We report results for TrojanWave and the baseline attacks (NBA, NBA-D, FlowMur) using the PENGI model.

Download the pre-trained PENGI model using the link provided below and place the checkpoint file at the path pengi/configs (after cloning the repo).

| Model | Link | Size |
|:---|:---|:---|
| PENGI | [Download](https://zenodo.org/records/8387083/files/base.pth) | 2.2 GB |

The PENGI checkpoint can also be downloaded with the following command:

wget https://zenodo.org/records/8387083/files/base.pth
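Optionally, verify that the checkpoint downloaded intact and sits where the code expects it. This is a sketch assuming the file keeps its original name base.pth inside pengi/configs:

```python
# Optional checkpoint check (assumed filename/location, not the repo's code).
import os
import torch

ckpt_path = "pengi/configs/base.pth"
assert os.path.isfile(ckpt_path), f"checkpoint not found at {ckpt_path}"
state = torch.load(ckpt_path, map_location="cpu")  # loads on CPU, no GPU needed
print("loaded checkpoint object of type:", type(state).__name__)
```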

We have performed experiments on 11 audio classification datasets. Instructions for downloading and processing the datasets used by our method are provided in DATASETS.md.

| Dataset | Type | Classes | Size | Link |
|:---|:---|:---:|:---:|:---|
| Beijing-Opera | Instrument Classification | 4 | 69 MB | Instructions |
| CREMA-D | Emotion Recognition | 6 | 606 MB | Instructions |
| ESC50 | Sound Event Classification | 50 | 881 MB | Instructions |
| ESC50-Actions | Sound Event Classification | 10 | 881 MB | Instructions |
| GT-Music-Genre | Music Analysis | 10 | 1.3 GB | Instructions |
| NS-Instruments | Instrument Classification | 10 | 18.5 GB | Instructions |
| RAVDESS | Emotion Recognition | 8 | 1.1 GB | Instructions |
| SESA | Surveillance Sound Classification | 4 | 70 MB | Instructions |
| TUT2017 | Acoustic Scene Classification | 15 | 12.3 GB | Instructions |
| UrbanSound8K | Sound Event Classification | 10 | 6.8 GB | Instructions |
| VocalSound | Vocal Sound Classification | 6 | 8.2 GB | Instructions |


All datasets should be placed in a directory named Audio-Datasets and the path of this directory should be specified in the variable DATASET_ROOT in the shell scripts. The directory structure should be as follows:

Audio-Datasets/
    ├── Beijing-Opera/
    ├── CREMA-D/
    ├── ESC50/ 
    ├── ESC50-Actions/
    ├── GT-Music-Genre/
    ├── NS-Instruments/
    ├── RAVDESS/
    ├── SESA/
    ├── TUT2017/
    ├── UrbanSound8K/
    ├── VocalSound/

There are three main folders in this repo: pengi, methods, and utils. Code in the pengi folder is taken from the PENGI repo for model instantiation. Implementations of the baselines (nba, nbad, flowmur) and our method trojanwave are in the methods folder. Class definitions of the audio and text encoders of the PENGI model can be found in methods/encoders.py. Training- and dataset-related code is in the utils folder.


We performed all experiments on an NVIDIA A100-SXM4-40GB GPU. Shell scripts to run the experiments can be found in the scripts folder.

## General Command Structure
bash <SHELL_SCRIPT> <METHOD_NAME> <ATTACK_NAME>

The following attack names (passed as <ATTACK_NAME>) are supported in this repository:

noattack, nba, nbad, flowmur, trojanwave

Examples of running the trojanwave method on different audio classification datasets are provided below:

bash scripts/beijing_opera.sh palm trojanwave
bash scripts/crema_d.sh palm trojanwave
bash scripts/esc50_actions.sh palm trojanwave
bash scripts/esc50.sh palm trojanwave
bash scripts/gt_music_genre.sh palm trojanwave
bash scripts/ns_instruments.sh palm trojanwave
bash scripts/ravdess.sh palm trojanwave
bash scripts/sesa.sh palm trojanwave
bash scripts/tut.sh palm trojanwave
bash scripts/urban_sound.sh palm trojanwave
bash scripts/vocal_sound.sh palm trojanwave

Results are saved in JSON format in the logs directory. To process the results, run the following commands (after running all experiments):

cd logs
bash results.sh
Sample Output

[Figure: sample output of the results script]
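To inspect the raw logs directly in Python instead, here is a minimal sketch (ours, not part of the repository; the accuracy field name is a guess at the JSON schema):

```python
# Illustrative log collector. The "accuracy" key is an assumed field name;
# adjust it to the actual schema written by the experiment runs.
import glob
import json

for path in sorted(glob.glob("logs/**/*.json", recursive=True)):
    with open(path) as f:
        result = json.load(f)
    print(path, result.get("accuracy", "n/a"))
```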



Note: To simplify evaluation, we convert multi-fold datasets into a single train-test split rather than performing cross-validation.
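For concreteness, here is what such a conversion could look like for a five-fold dataset such as ESC50 (an illustrative sketch, not the repository's code; the metadata path and the folds-1-to-4-train / fold-5-test convention are assumptions):

```python
# Illustrative single-split conversion for ESC50 (assumed metadata path and
# fold convention): folds 1-4 become the train set, fold 5 the test set.
import pandas as pd

meta = pd.read_csv("Audio-Datasets/ESC50/meta/esc50.csv")  # assumed path
train_df = meta[meta["fold"] != 5]
test_df = meta[meta["fold"] == 5]
print(len(train_df), "train clips,", len(test_df), "test clips")
```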


Run Defense Experiments ⚡

To run TrojanWave-Defense experiments, please follow the instructions in the README file of the trojanwave-defense branch of this repo.






If you find our work, this repository, or the pretrained models useful, please consider giving a star ⭐ and a citation.

@inproceedings{hanif2025trojanwave,
  title={TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models},
  author={Hanif, Asif and Agro, Maha Tufail and Shamshad, Fahad and Nandakumar, Karthik},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  pages={18628--18644},
  year={2025}
}

Should you have any questions, please create an issue on this repository or contact us at asif.hanif@mbzuai.ac.ae.


We used PENGI for model instantiation and borrowed parts of the code from NBA, NBA-D, and FlowMur to implement the baselines. We thank the respective authors for releasing their code.

