This project aims to detect audio DeepFakes by leveraging the ASVspoof 2019 dataset. The focus is on distinguishing bona fide (real) speech from spoofed audio using various preprocessing techniques and machine learning models. By exploring different architectures and feature extraction methods, we aim to address the growing challenges posed by synthetic and manipulated audio in the field of digital forensics.
Authors: Tomas Lovato, Alisea Bovo
Course: Digital Forensics, Academic Year 2024/2025
GitHub Repository: DF_AudioDeepfakeDetection
The project is based on the ASVspoof 2019 Logical Access (LA) dataset. This dataset includes three partitions:
- Training Set: 25,380 samples (20 speakers: 8 male, 12 female).
- Development Set: 24,844 samples (20 speakers: 8 male, 12 female).
- Evaluation Set: Approximately 72,000 samples (48 speakers: 21 male, 27 female).
- Known Attacks: 6 types (4 TTS, 2 VC).
- Unknown Attacks: 11 types (6 TTS, 2 VC, 3 hybrids).
- The evaluation set contains attacks generated by unseen algorithms, testing the generalization of models.
The project's success hinges on effective preprocessing of audio data. Three main preprocessing strategies were used:
- Mel Spectrograms
  - Purpose: Feature extraction for CNN-based models.
  - Steps:
    - Audio loaded using Librosa with a fixed sample rate of 16 kHz.
    - Temporal normalization and padding/truncation to a standard duration.
    - Mel Spectrogram extraction.
- MFCCs
  - Purpose: Input features for SVM and One-Class SVM (OCSVM) models.
  - Steps:
    - Temporal normalization and scaling.
    - MFCC feature extraction.
- STFT
  - Purpose: Input for SVM models after dimensionality reduction.
  - Steps:
    - Extraction of STFT features.
    - Dimensionality reduction using Autoencoders or PCA.
- Unbalanced Training:
- Trained on the unbalanced dataset.
- Achieved high accuracy but biased predictions (tended to classify most samples as spoofed).
- Balanced Training:
- Classes balanced by oversampling or undersampling.
- Significant improvement in confusion matrix and ROC-AUC scores.
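Class balancing of the kind described above can be done with simple random oversampling; this is a plain-NumPy sketch, and the class counts are illustrative, not the dataset's real proportions.

```python
import numpy as np

rng = np.random.default_rng(42)

def oversample(X: np.ndarray, y: np.ndarray):
    """Randomly duplicate minority-class samples until all classes match the largest."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        # Sample with replacement up to the majority-class count.
        idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

# Illustrative imbalance (far more spoofed than bona fide, as in ASVspoof LA):
X = np.arange(12).reshape(12, 1)
y = np.array([0, 0] + [1] * 10)   # 2 bona fide vs 10 spoofed
Xb, yb = oversample(X, y)
print(np.bincount(yb))  # both classes equally represented
```

Undersampling is the mirror image: draw `counts.min()` samples from each class without replacement.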
- Advanced CNN:
- Larger convolution kernels for capturing temporal dependencies.
- Achieved near-perfect results on both seen and unseen datasets.
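The idea of wider convolution kernels along the time axis can be sketched as a small PyTorch model; the layer sizes, kernel shapes, and input dimensions here are illustrative assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class SpectroCNN(nn.Module):
    """Minimal CNN over log-Mel spectrograms. Kernels are wider along the
    time axis (second spatial dim) to capture longer temporal dependencies."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(3, 9), padding=(1, 4)),  # wide in time
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=(3, 9), padding=(1, 4)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling: input length agnostic
        )
        self.classifier = nn.Linear(32, n_classes)  # bona fide vs spoof

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.features(x).flatten(1)
        return self.classifier(z)

model = SpectroCNN()
batch = torch.randn(4, 1, 64, 251)   # (batch, channel, n_mels, n_frames)
logits = model(batch)
print(logits.shape)
```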
- MFCC Features:
- Moderate ROC-AUC (~0.7), with challenges in generalization.
- Future work: Hyperparameter tuning with grid search.
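The grid search mentioned above could look like the sketch below with scikit-learn; the toy data and the parameter grid are illustrative, since the document does not specify the actual search space.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy features standing in for per-utterance MFCC vectors (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 20)),    # "bona fide"
               rng.normal(1.5, 1.0, (30, 20))])   # "spoofed"
y = np.array([0] * 30 + [1] * 30)

# Grid over the usual RBF-SVM hyperparameters, scored by ROC-AUC as in the report.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
    cv=3,
    scoring="roc_auc",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```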
- STFT Features:
- Dimensionality reduced via Autoencoders or PCA.
- Performance improved with PCA, but computational cost remains high.
Despite the gender imbalance in the dataset:
- The model did not exhibit significant bias.
- Slightly better performance was observed for female voices.
- Successfully implemented CNNs with Mel Spectrograms, achieving robust detection performance.
- Developed insights into feature extraction and dimensionality reduction techniques.
- Difficulty in generalizing to unseen spoofing techniques.
- High computational requirements for feature extraction and dimensionality reduction.
