DeepFake Audio Detection

Overview

This project aims to detect audio DeepFakes by leveraging the ASVspoof 2019 dataset. The focus is on distinguishing bona fide (real) speech from spoofed audio using various preprocessing techniques and machine learning models. By exploring different architectures and feature extraction methods, we aim to address the growing challenges posed by synthetic and manipulated audio in the field of digital forensics.

Authors: Tomas Lovato, Alisea Bovo
Course: Digital Forensics, Academic Year 2024/2025
GitHub Repository: DF_AudioDeepfakeDetection


Dataset Details

The project is based on the ASVspoof 2019 Logical Access (LA) dataset. This dataset includes three partitions:

  • Training Set: 25,380 samples (20 speakers: 8 male, 12 female).
  • Development Set: 24,844 samples (20 speakers: 8 male, 12 female).
  • Evaluation Set: Approximately 72,000 samples (48 speakers: 21 male, 27 female).

Spoofing Techniques

  • Known Attacks: 6 types (4 TTS, 2 VC).
  • Unknown Attacks: 11 types (6 TTS, 2 VC, 3 hybrids).

Key Challenges

  • The evaluation set contains attacks generated by unseen algorithms, testing the generalization of models.

Audio Preprocessing

The project's success hinges on effective preprocessing of audio data. Three main preprocessing strategies were used:

1. Mel Spectrogram

  • Purpose: Feature extraction for CNN-based models.
  • Steps:
    • Audio loaded using Librosa with a fixed sample rate of 16 kHz.
    • Temporal normalization and padding/truncating to a standard duration.
    • Mel Spectrogram extraction.

2. MFCC Features

  • Purpose: Used with SVM and One-Class SVM (OCSVM).
  • Steps:
    • Temporal normalization and scaling.
    • MFCC feature extraction.

3. STFT Features

  • Purpose: Input for SVM models after dimensionality reduction.
  • Steps:
    • Extraction of STFT features.
    • Dimensionality reduction using Autoencoders or PCA.
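
The PCA branch of this pipeline might look like the sketch below. A plain NumPy STFT stands in for Librosa's, and the frame size, hop, and component count are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

SAMPLE_RATE = 16000
N_FFT = 512        # assumed frame size
HOP = 256          # assumed hop length
N_COMPONENTS = 2   # tiny, for illustration only

def stft_magnitude(y):
    """Magnitude STFT via NumPy, flattened into one feature vector per clip."""
    frames = [y[i:i + N_FFT] for i in range(0, len(y) - N_FFT + 1, HOP)]
    window = np.hanning(N_FFT)
    spec = np.abs(np.fft.rfft(np.stack(frames) * window, axis=1))
    return spec.ravel()

rng = np.random.default_rng(0)
clips = [rng.standard_normal(SAMPLE_RATE).astype(np.float32) for _ in range(4)]
X = np.stack([stft_magnitude(y) for y in clips])      # high-dimensional

X_reduced = PCA(n_components=N_COMPONENTS).fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```

The point of the reduction step is visible in the shapes: raw STFT vectors have thousands of dimensions per clip, far too many for an SVM to handle efficiently.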

Machine Learning Models

CNN with Mel Spectrogram

  1. Unbalanced Training:
    • Trained on the unbalanced dataset.
    • Achieved high accuracy but biased predictions (tended to classify most samples as spoofed).
  2. Balanced Training:
    • Classes balanced by oversampling or undersampling.
    • Significant improvement in confusion matrix and ROC-AUC scores.
  3. Advanced CNN:
    • Larger convolution kernels for capturing temporal dependencies.
    • Achieved near-perfect results on both seen and unseen datasets.
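
The repository does not spell out how the classes were balanced; one common approach, resampling the minority class with replacement, can be sketched as:

```python
import numpy as np

def oversample_minority(X, y, seed=0):
    """Balance a dataset by resampling minority classes with replacement."""
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for label, count in zip(labels, counts):
        members = np.flatnonzero(y == label)
        if count < target:
            members = rng.choice(members, size=target, replace=True)
        idx.append(members)
    idx = np.concatenate(idx)
    rng.shuffle(idx)
    return X[idx], y[idx]

# 10 bona fide vs 90 spoofed samples, mirroring the dataset's skew.
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 10 + [1] * 90)
Xb, yb = oversample_minority(X, y)
print(np.bincount(yb))  # both classes now have 90 samples
```

Oversampling keeps every spoofed example (unlike undersampling, which discards data) at the cost of repeating bona fide samples, which is why the two strategies are usually compared as they were here.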

SVM and OCSVM

  1. MFCC Features:
    • Moderate ROC-AUC (~0.7), with difficulty generalizing to unseen attacks.
    • Future work: Hyperparameter tuning with grid search.
  2. STFT Features:
    • Dimensionality reduced via Autoencoders or PCA.
    • Performance improved with PCA, but computational cost remains high.
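
What makes OCSVM a natural fit here is that it trains on bona fide speech only, treating spoofs as anomalies. A sketch on synthetic features (the kernel and nu value are assumptions, and real MFCC/STFT features would replace the random vectors):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-ins: bona fide features cluster near the origin,
# spoofed features are shifted away from it.
bona_fide = rng.standard_normal((200, 20))
spoofed = rng.standard_normal((50, 20)) + 5.0

# The one-class SVM is fit on bona fide data only; at test time,
# anything outside the learned boundary is flagged as spoofed.
ocsvm = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", nu=0.05))
ocsvm.fit(bona_fide)

print("bona fide flagged genuine:", (ocsvm.predict(bona_fide) == 1).mean())
print("spoofed flagged anomalous:", (ocsvm.predict(spoofed) == -1).mean())
```

Training on one class only is also what makes the generalization challenge concrete: the boundary depends entirely on how well bona fide speech is characterized, not on any particular spoofing algorithm.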

Fairness Analysis

Despite the gender imbalance in the dataset:

  • The model did not exhibit significant bias.
  • Slightly better performance was observed for female voices.

Achievements

  • Successfully implemented CNNs with Mel Spectrograms, leading to robust detection performance.
  • Developed insights into feature extraction and dimensionality reduction techniques.

Challenges

  • Difficulty in generalizing to unseen spoofing techniques.
  • High computational requirements for feature extraction and dimensionality reduction.
