
Any suggestions for fine-tuning with a small dataset? #72

@hahunavth

Description


Hi,

I tried fine-tuning with a small, clean dataset of Vietnamese speech, about 100 hours of audio that I collected myself from YouTube. Here are a few audio demos. However, the results did not meet my expectations.

Here is how I prepared the data:

  • Clean dataset: I used the Vietnamese data mentioned above. I filtered out collected audio segments shorter than 3 seconds so that every clip matches sub_sample_length = 3.072 (see the filtering sketch after this list).
  • Noise dataset: I downloaded the DNS Interspeech 2020 noise data from here: DNS-Challenge noise data.
  • RIR dataset: I downloaded the dataset from the release page here: RIR dataset.
  • Test dataset: I used the test set from DNS-Challenge: Test set.
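
For reference, this is roughly how I did the length filtering. It is a minimal sketch, assuming 16 kHz WAV clips and the soundfile package; the directory names and the 3.072 s threshold come from the description above, everything else is illustrative.

```python
# Minimal length-filtering sketch: keep only clips long enough for sub_sample_length.
from pathlib import Path
import shutil

import soundfile as sf

SUB_SAMPLE_LENGTH = 3.072  # seconds, must match sub_sample_length in the training config
SRC_DIR = Path("raw_vietnamese_clips")    # hypothetical input directory
DST_DIR = Path("clean_vietnamese_clips")  # hypothetical output directory
DST_DIR.mkdir(parents=True, exist_ok=True)

kept, dropped = 0, 0
for wav_path in SRC_DIR.glob("*.wav"):
    info = sf.info(str(wav_path))
    duration = info.frames / info.samplerate  # clip length in seconds
    if duration >= SUB_SAMPLE_LENGTH:
        shutil.copy(wav_path, DST_DIR / wav_path.name)
        kept += 1
    else:
        dropped += 1

print(f"kept {kept} clips, dropped {dropped} clips shorter than {SUB_SAMPLE_LENGTH} s")
```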

I used a 3080 GPU with a batch size of 12 and gradient accumulation steps set to 3. The model was initialized from the checkpoint fullsubnet_best_model_58epochs.tar.
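
For completeness, here is a quick sanity-check sketch of how I inspect the released checkpoint before fine-tuning. It assumes the .tar file is a torch-serialized dict (the key names are an assumption; the script just prints whatever is actually there), and it also prints the effective batch size implied by the settings above.

```python
# Sanity-check the pretrained checkpoint and the effective batch size.
import torch

ckpt_path = "fullsubnet_best_model_58epochs.tar"
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Print the top-level keys (e.g. model / optimizer / epoch) to confirm the layout
# before loading the weights into the model for fine-tuning.
if isinstance(checkpoint, dict):
    print("checkpoint keys:", list(checkpoint.keys()))
else:
    print("checkpoint is a raw state dict / object:", type(checkpoint))

# batch_size=12 with gradient_accumulation_steps=3 gives an effective batch of 36.
batch_size, grad_accumulation_steps = 12, 3
print("effective batch size:", batch_size * grad_accumulation_steps)
```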

I trained for 15 epochs, but the loss decreased only in the first few epochs and then started increasing. When I ran inference on a few samples, the fine-tuned model left more residual noise than the original checkpoint.

Am I missing something in the fine-tuning process?
Do you have any advice for me?

Thank you!
