
Any suggestions for fine-tuning with a small dataset? #72

@hahunavth

Description


Hi,

I tried fine-tuning with a small, clean dataset of Vietnamese speech, about 100 hours of audio that I collected myself from YouTube. Here are a few audio demos. However, the results did not meet my expectations.

Here is how I prepared the data:

  • Clean dataset: I used the Vietnamese data mentioned above. I filtered out collected audio segments shorter than 3 seconds so that every clip matches sub_sample_length = 3.072 (see the filtering sketch after this list).
  • Noise dataset: I downloaded the DNS Interspeech 2020 noise data from here: DNS-Challenge noise data.
  • RIR dataset: I downloaded the dataset from the release page here: RIR dataset.
  • Test dataset: I used the test set from DNS-Challenge: Test set.
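
For reference, this is roughly how I did the length filtering. It is a minimal sketch, assuming 16 kHz WAV clips and the soundfile package; the directory names and the 3.072 s threshold come from the description above, everything else is illustrative.

```python
# Minimal length-filtering sketch: keep only clips long enough for sub_sample_length.
from pathlib import Path
import shutil

import soundfile as sf

SUB_SAMPLE_LENGTH = 3.072  # seconds, must match sub_sample_length in the training config
SRC_DIR = Path("raw_vietnamese_clips")    # hypothetical input directory
DST_DIR = Path("clean_vietnamese_clips")  # hypothetical output directory
DST_DIR.mkdir(parents=True, exist_ok=True)

kept, dropped = 0, 0
for wav_path in SRC_DIR.glob("*.wav"):
    info = sf.info(str(wav_path))
    duration = info.frames / info.samplerate  # clip length in seconds
    if duration >= SUB_SAMPLE_LENGTH:
        shutil.copy(wav_path, DST_DIR / wav_path.name)
        kept += 1
    else:
        dropped += 1

print(f"kept {kept} clips, dropped {dropped} clips shorter than {SUB_SAMPLE_LENGTH} s")
```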

I used a 3080 GPU with a batch size of 12 and gradient accumulation steps set to 3. The model was initialized from the checkpoint fullsubnet_best_model_58epochs.tar.
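
For completeness, here is a quick sanity-check sketch of how I inspect the released checkpoint before fine-tuning. It assumes the .tar file is a torch-serialized dict (the key names are an assumption; the script just prints whatever is actually there), and it also prints the effective batch size implied by the settings above.

```python
# Sanity-check the pretrained checkpoint and the effective batch size.
import torch

ckpt_path = "fullsubnet_best_model_58epochs.tar"
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Print the top-level keys (e.g. model / optimizer / epoch) to confirm the layout
# before loading the weights into the model for fine-tuning.
if isinstance(checkpoint, dict):
    print("checkpoint keys:", list(checkpoint.keys()))
else:
    print("checkpoint is a raw state dict / object:", type(checkpoint))

# batch_size=12 with gradient_accumulation_steps=3 gives an effective batch of 36.
batch_size, grad_accumulation_steps = 12, 3
print("effective batch size:", batch_size * grad_accumulation_steps)
```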

I trained for 15 epochs, but the loss decreased only in the first few epochs and then started increasing. When I ran inference on a few samples, the fine-tuned model left more residual noise than the original checkpoint.

Am I missing something in the fine-tuning process?
Do you have any advice for me?

Thank you!
