training problem #8

Open
leo67867 opened this issue Sep 24, 2024 · 3 comments

Comments

@leo67867

Hello, I am processing and training the AISHELL-4 dataset using the command:

python diaper/train.py -c DiaPer/models/10attractors/SC_LibriSpeech_2spk_adapted1-10_finetuneAISHELL4mix/train.yaml

In the config I set init_epochs: 0 and init_model_path: ''. I split the AISHELL-4 dataset into training and validation sets with an 8:2 ratio. After training, I tested my model on the test set, but the results were different from yours. For example, for L_R003S01C02 I got a DER of 72.57, whereas yours was 47.37; for M_R003S01C01 my result was 65.53, but yours was 34.28. Also, I got my best results around the 10th epoch, whereas you got yours around epochs 190-200.
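For readers trying to reproduce the setup, an 8:2 split at the recording level could look like the sketch below. It is only an illustration: the function name `split_recordings`, the seed, and the recording IDs are hypothetical, and the thread does not say how the split was actually made.

```python
import random

def split_recordings(recording_ids, train_fraction=0.8, seed=0):
    """Shuffle recording IDs reproducibly and split them into train/validation lists."""
    ids = sorted(recording_ids)   # sort first so the result does not depend on input order
    rng = random.Random(seed)     # local RNG, leaves the global random state untouched
    rng.shuffle(ids)
    n_train = int(round(len(ids) * train_fraction))
    return ids[:n_train], ids[n_train:]

# Hypothetical recording IDs, for illustration only.
recordings = [f"rec{i:02d}" for i in range(10)]
train_ids, val_ids = split_recordings(recordings)
print(len(train_ids), len(val_ids))  # 8 2
```

Splitting whole recordings, rather than segments cut from them, keeps audio from the same session out of both sets.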

Could you please share the specific details of your data splitting or processing methods? Do you have any suggestions on what might be going wrong with my approach and how I can improve it? Thank you.

@fnlandini
Member

Hi @leo67867
I guess you are choosing the epoch based on the loss on the development set, correct? I am attaching the tensorboard.zip logs. You can see that in my case the dev loss also reached its minimum after just a few epochs. However, if you look at dev_DER, you will see that it can still improve further. I observed this in several of the fine-tuning steps with different sets: the attractor existence loss eventually grows, but the BCE activation loss keeps improving on the dev set.
Have you tried evaluating your model at epoch 200? I expect you will get results closer to mine. You can also try using the whole train set (instead of 80% of it) and running for 200 epochs as I did. In this case, since I used the test set as validation, I did not choose the number of epochs very carefully and just picked 200 as a reasonable guess.
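To make the difference between the two selection criteria concrete, here is a minimal sketch. The numbers are made up for illustration and are not taken from the attached logs; `best_epoch` and the `dev_loss` key are hypothetical names, while `dev_DER` follows the name used above.

```python
def best_epoch(history, key):
    """Return the epoch whose logged value for `key` is lowest."""
    return min(history, key=lambda epoch: history[epoch][key])

# Made-up per-epoch values, for illustration only.
history = {
    10:  {"dev_loss": 0.52, "dev_DER": 41.0},
    100: {"dev_loss": 0.58, "dev_DER": 36.5},
    200: {"dev_loss": 0.63, "dev_DER": 34.0},
}

print(best_epoch(history, "dev_loss"))  # 10
print(best_epoch(history, "dev_DER"))   # 200
```

With curves shaped like these, picking the checkpoint by dev loss stops at an early epoch even though the DER keeps dropping later.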
My other question is whether you are mixing the channels of AISHELL-4 to obtain the waveforms. That is what I did; I did not try using a single channel, for example, so I am not sure whether that might cause a large degradation.
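As a rough sketch of what mixing the channels can mean in practice, the snippet below averages all channels of a multi-channel array into one. Averaging is an assumption here (summing with rescaling is another option), and the thread does not specify the exact method; `downmix_to_mono` is a hypothetical helper, and the array layout `(num_samples, num_channels)` is assumed.

```python
import numpy as np

def downmix_to_mono(wave):
    """Average the channels of a (num_samples, num_channels) array into one channel."""
    wave = np.asarray(wave, dtype=np.float32)
    if wave.ndim == 1:
        return wave  # already mono, nothing to do
    if wave.ndim != 2:
        raise ValueError(f"expected a 1-D or 2-D array, got shape {wave.shape}")
    return wave.mean(axis=1)  # average across the channel axis

# Synthetic 8-channel signal standing in for one second of audio at 16 kHz.
multi = np.random.default_rng(0).standard_normal((16000, 8))
mono = downmix_to_mono(multi)
print(mono.shape)  # (16000,)
```

Averaging, rather than plain summing, keeps the mixed signal in the same amplitude range as the individual channels.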

I hope this helps.
Federico

@leo67867
Author

leo67867 commented Oct 9, 2024

Hello, thank you for your response. I followed your advice and used the entire training set, running it for 200 epochs and using the test set for validation. The results were slightly better than when I used 80% of the training set, but the performance is still significantly lower than yours. As you suggested, I converted the AISHELL-4 audio to mono as input data, because the code failed to run when I tried to use the original 8-channel AISHELL-4 audio. The error details are as follows:
python DiaPer/diaper/train.py -c DiaPer/models/10attractors/SC_LibriSpeech_2spk_adapted1-10_finetuneAISHELL4mix/train.yaml
pre_crossattention 66048
latent_attractors 16384
encoder_attractors 1788672
latents2attractors 1280
counter 129
frame_encoder 2465920
Total trainable parameters: 4338433
miniconda3/envs/DiaPer/lib/python3.7/site-packages/librosa/util/decorators.py:88: UserWarning: n_fft=512 is too small for input signal of length=8
  return f(*args, **kwargs)
Warning: ('20200616_M_R001S01C01', 0, 1500) is empty: (0, 257, 240000)
Traceback (most recent call last):
  File "DiaPer/diaper/train.py", line 521, in <module>
    train_loader, dev_loader = get_training_dataloaders(args)
  File "DiaPer/diaper/train.py", line 284, in get_training_dataloaders
    Y_train, _, _, _, _, _ = train_set.__getitem__(0)
  File "DiaPer/diaper/common_utils/diarization_dataset.py", line 131, in __getitem__
    raise ValueError(f"Encountered an empty sequence at index {i}, and no saved sequence is available.")
ValueError: Encountered an empty sequence at index 0, and no saved sequence is available.

Are you able to handle multi-channel audio data? I’m not sure where the issue might be or how I should proceed. I would greatly appreciate any guidance or suggestions you could provide. Thank you very much for your assistance!

@fnlandini
Member

Hi @leo67867
Unfortunately, the code does not support multi-channel input. If you mixed the channels to obtain a mono file, then that is the same as I did.
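A quick shape check before training can make this kind of failure easier to spot. The `length=8` in the librosa warning matches the 8 channels of the original recordings, so one plausible reading (an assumption, not something confirmed in this thread) is that an array shaped `(samples, channels)` reached the STFT with the channel axis last. The helper `describe_audio_shape` below is hypothetical; `n_fft=512` is the value shown in the warning.

```python
def describe_audio_shape(shape, n_fft=512):
    """Return a short diagnosis for an audio array shape before feature extraction."""
    if len(shape) == 1:
        return "mono"
    if len(shape) == 2 and shape[-1] < n_fft:
        # A very short last axis usually means it holds channels, not time samples.
        return f"multi-channel: last axis has length {shape[-1]}, shorter than n_fft={n_fft}"
    return "multi-channel"

print(describe_audio_shape((240000,)))    # mono
print(describe_audio_shape((240000, 8)))  # multi-channel: last axis has length 8, shorter than n_fft=512
```

Anything other than "mono" would need to be mixed down to a single channel before being passed to the training script.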
My suggestion is that you compare the tensorboard I shared with the one from your training. Perhaps that will give a hint of what could be different. Otherwise, I am sorry, but I am not sure what else could differ. In case it helps you spot any difference, I am attaching the data folders for train and test.
AISHELL4_data.tar.gz
