
How to execute denoising? #35

Open
a897456 opened this issue Oct 26, 2024 · 17 comments

Comments


a897456 commented Oct 26, 2024

@bigpon Hi
I'm trying to reproduce the denoising code.
https://github.com/facebookresearch/AudioDec?tab=readme-ov-file#bonus-track-denoising
In this paragraph you say to "prepare the noisy-clean corpus and follow the usage instructions in submit_denoise.sh to run the training and testing", but the example command below it invokes submit_autoencoder.sh. May I ask what should be done?


a897456 commented Oct 26, 2024

```python
self.model["generator"].quantizer.codebook.eval()
```

Is the denoising process the same as the autoencoder's? Does it require training with the metric loss first and then fixing those weights to continue training?


a897456 commented Oct 27, 2024

Hi @bigpon
I completed 20,000 training steps following stage 0 of submit_denoise.sh. However, when I started to execute stage 1, nothing seemed to happen at all.

AudioDec/bin/train.py, lines 106 to 118 at 5ec3ab9:

```python
def run(self):
    try:
        logging.info(f"The current training step: {self.trainer.steps}")
        self.trainer.train_max_steps = self.config["train_max_steps"]
        if not self.trainer._check_train_finish():
            self.trainer.run()
        if self.config.get("adv_train_max_steps", False) and self.config.get("adv_batch_length", False):
            self.batch_length = self.config['adv_batch_length']
            logging.info(f"Reload dataloader for adversarial training.")
            self.initialize_data_loader()
            self.trainer.data_loader = self.data_loader
            self.trainer.train_max_steps = self.config["adv_train_max_steps"]
            self.trainer.run()
```

I suspect the denoising process doesn't use the adversarial parameters such as adv_train_max_steps or adv_batch_length, because I couldn't find them in the configuration file config/denoise/symAD_vctk_48000_hop300.yaml:
```yaml
start_steps:                # Number of steps to start training
    generator: 0
    discriminator: 200000
train_max_steps: 200000     # Number of training steps.
save_interval_steps: 100000 # Interval steps to save checkpoint.
eval_interval_steps: 1000   # Interval steps to evaluate the network.
log_interval_steps: 100     # Interval steps to record the training log.
```
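The behavior in `run()` can be reduced to a few lines: the second call to `trainer.run()` is silently skipped whenever either adversarial key is missing from the config, which matches the "no response" symptom. A minimal sketch (the numeric values below are illustrative, not the repo's defaults):

```python
# Minimal reproduction of the stage-1 gate in bin/train.py run():
# the adversarial stage only starts when BOTH keys exist in the config.
def should_run_adv_stage(config):
    return bool(config.get("adv_train_max_steps", False)) and \
           bool(config.get("adv_batch_length", False))

config = {"train_max_steps": 200000}           # no adv_* keys, as in the yaml above
print(should_run_adv_stage(config))            # False -> stage 1 is skipped

config.update({"adv_train_max_steps": 700000,  # illustrative values
               "adv_batch_length": 9600})
print(should_run_adv_stage(config))            # True -> stage 1 would run
```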


bigpon commented Oct 28, 2024

Hi,
there is a typo.
To run the denoising process, you first have to update the encoder while keeping the codebook and decoder fixed.
I have updated the README; please follow the steps there.
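A minimal PyTorch sketch of what "update the encoder while fixing the codebook and decoder" can look like; the module names here are illustrative stand-ins, not the repo's actual class layout:

```python
import torch.nn as nn

# Toy stand-in for the AudioDec generator (illustrative module names).
class TinyCodec(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 4)
        self.quantizer = nn.Linear(4, 4)  # stands in for projector + codebook
        self.decoder = nn.Linear(4, 8)

model = TinyCodec()

# Freeze the codebook and decoder; only the encoder receives gradients.
for frozen in (model.quantizer, model.decoder):
    frozen.eval()                    # also stops e.g. codebook EMA updates
    for p in frozen.parameters():
        p.requires_grad = False

trainable = sorted({n.split(".")[0] for n, p in model.named_parameters()
                    if p.requires_grad})
print(trainable)  # ['encoder']
```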


a897456 commented Oct 29, 2024

```sh
# stage 0
if echo ${stage} | grep -q 0; then
    echo "Denoising Training"
    config_name="config/${encoder}.yaml"
    echo "Configuration file="$config_name
    python codecTrain.py \
        -c ${config_name} \
        --tag ${encoder} \
        --exp_root ${exp} \
        --disable_cudnn ${disable_cudnn}
fi
```

```yaml
model_type: symAudioDec
train_mode: denoise
initial: exp/autoencoder/symAD_vctk_48000_hop300/checkpoint-200000steps.pkl # for model initialization
```

AudioDec/codecTrain.py, lines 239 to 255 at 9cc4e58:

```python
# MODEL INITIALIZATION
def initialize_model(self):
    initial = self.config.get("initial", "")
    if os.path.exists(self.resume):  # resume from trained model
        self.trainer.load_checkpoint(self.resume)
        logging.info(f"Successfully resumed from {self.resume}.")
    elif os.path.exists(initial):  # initial new model with the pre-trained model
        self.trainer.load_checkpoint(initial, load_only_params=True)
        logging.info(f"Successfully initialize parameters from {initial}.")
    else:
        logging.info("Train from scrach")
    # load the pre-trained encoder for vocoder training
    if self.train_mode in ['vocoder']:
        analyzer_checkpoint = self.config.get("analyzer", "")
        assert os.path.exists(analyzer_checkpoint), f"Analyzer {analyzer_checkpoint} does not exist!"
        analyzer_config = self._load_config(analyzer_checkpoint)
        self._initialize_analyzer(analyzer_config, analyzer_checkpoint)
```

I executed stage 0 according to submit_denoise.sh, but I found that during stage 0 the configuration file loads exp/autoencoder/symAD_vctk_48000_hop300/checkpoint-200000steps.pkl as initial. Do I need to train this checkpoint in advance (for the new dataset)?


a897456 commented Oct 29, 2024

Hi @bigpon
Could you help me check whether my understanding is correct? Thanks.

  1. First, following config/autoencoder/symAD_vctk_48000_hop300.yaml, train the autoencoder on clean speech for 200k steps to obtain exp/autoencoder/symAD_vctk_48000_hop300/checkpoint-200000steps.pkl.
  2. Then, following config/denoise/symAD_vctk_48000_hop300.yaml, use the checkpoint from step 1 as initial and run 200k steps of denoise training on the paired clean and noisy speech to obtain exp/denoise/symAD_vctk_48000_hop300/checkpoint-200000steps.pkl.

At this point the denoising training is complete. The testing process is then:
  1. In codecTest.py, set encoder = decoder = exp/denoise/symAD_vctk_48000_hop300/checkpoint-200000steps.pkl to run the test.


bigpon commented Oct 29, 2024

Hi, in the first step, you have to train the decoder for another 500k iterations with the GAN.

In the final step, you should take the decoder from the one trained with the GAN.


a897456 commented Nov 3, 2024

Hi @bigpon
I carried out the denoising process as you suggested, but when I tested the PESQ score of the output audio it was only 1.6. Listening to it, I also subjectively felt it was just so-so. The screenshot below shows the denoising process. Do you have any suggestions for improving the result? Thank you.
[screenshot attached]


a897456 commented Nov 3, 2024

Hi bigpon,
My idea was to add discriminator training in denoise.py, imitating the method in autoencoder.py. I actually did it this way, but the results still didn't improve.
[screenshot attached]


bigpon commented Nov 4, 2024

Because of the phase misalignment issue (you can check our paper ScoreDec), AudioDec usually achieves a low PESQ score even when the input is clean speech. Using a multi-resolution mel loss can improve the PESQ, but it still cannot reach a very high score.

For perceptual quality, although the PESQ score is low, the quality should be OK.

However, since it is just a simple approach to update only the encoder, it only achieves an OK performance, which still falls behind the SOTA speech enhancement methods.
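For reference, a generic sketch of a multi-resolution spectral loss of the kind mentioned above; the FFT sizes and hop factor here are illustrative choices, not AudioDec's actual training settings:

```python
import torch

def multi_res_stft_loss(x, y, fft_sizes=(512, 1024, 2048)):
    """L1 distance between log-magnitude spectrograms at several resolutions.

    Illustrative sketch only: FFT sizes and the n_fft // 4 hop are assumptions.
    """
    loss = 0.0
    for n_fft in fft_sizes:
        window = torch.hann_window(n_fft)
        X = torch.stft(x, n_fft, hop_length=n_fft // 4,
                       window=window, return_complex=True).abs()
        Y = torch.stft(y, n_fft, hop_length=n_fft // 4,
                       window=window, return_complex=True).abs()
        # Small epsilon keeps the log finite for silent bins.
        loss = loss + (torch.log(X + 1e-7) - torch.log(Y + 1e-7)).abs().mean()
    return loss / len(fft_sizes)

x = torch.randn(1, 48000)  # one second of audio at 48 kHz
print(multi_res_stft_loss(x, x).item())  # identical signals -> 0.0
```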


a897456 commented Nov 5, 2024

> Because of the phase misaligned issue (you can check our paper ScoreDec), AudioDec usually achieves low PESQ even when the input is clean speech. Using multi-resolution mel-loss can improve the PESQ but it still cannot achieve a very high PESQ score.

Hi @bigpon
1. When is ScoreDec expected to be open-sourced?
2. Can the phase problem be compensated by setting use_shape_loss=true? I see that this value is always false in the configuration files.


bigpon commented Nov 5, 2024

Hi,

  1. We don't have any plan to release ScoreDec, since people can easily train the post-filter model from this repo: https://github.com/sp-uhh/sgmse . That is, once you prepare AudioDec-coded and natural speech pairs as the noisy and clean pairs, you can train an sgmse-based postfilter. I have also used sgmse for denoising, and it works well. Therefore, I recommend you use your currently trained AudioDec (w/o the GAN training part, i.e. only the 1st stage) to prepare noisy-clean speech pairs, and then train an sgmse model on these pairs. After that, you get a high-quality denoising codec (the phase is also aligned well). The only problem is that inference is very slow because of the sgmse model.

  2. No. The shape loss mostly improves loudness modeling; it cannot improve phase modeling.


a897456 commented Nov 7, 2024

> Therefore, I recommend you use your current trained AudioDec (w/o the GAN training part, i.e. only the 1st stage) to prepare noisy-clean speech pairs, and then train a sgmse model with these pairs.

Hi @bigpon
When preparing the noisy-clean speech pairs, should the new noisy speech obtained by passing the original noisy speech through AudioDec (w/o the GAN training part, i.e. only the 1st stage) be paired with the original clean speech? Or do both the original noisy speech and the original clean speech need to go through AudioDec?


bigpon commented Nov 7, 2024

Hi, in this case, we want the postfilter to do two things.

  1. remove the noise
  2. compensate the codec distortion

Therefore, the target speech is the clean speech without any processing (i.e. the ground truth).
The noisy/input speech can be:
Type I: noisy speech processed by the 1st-stage AudioDec (suffering from both noise and codec distortion)
Type II: clean speech processed by the 1st-stage AudioDec (suffering from only codec distortion)

I have tried training the postfilter with only Type I, and with Type I + II.
On noisy speech, their performances are similar.
On clean speech, the model trained with I + II is better.

Therefore, I suggest you prepare both (Type-I, clean_speech) and (Type-II, clean_speech) pairs to train the postfilter.
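The suggested pairing can be sketched as follows; the directory names are hypothetical placeholders for wherever the ground truth and the 1st-stage AudioDec outputs are stored:

```python
from pathlib import Path

# Hypothetical directory layout (adjust to your corpus):
clean_dir = Path("corpus/clean")           # ground-truth targets
type1_dir = Path("corpus/noisy_audiodec")  # noisy speech -> 1st-stage AudioDec
type2_dir = Path("corpus/clean_audiodec")  # clean speech -> 1st-stage AudioDec

def make_pairs(input_files, target_dir):
    """Pair each processed file with the clean target of the same name."""
    return [(f, target_dir / f.name) for f in input_files]

# Both Type-I and Type-II inputs map to the same unprocessed clean targets.
pairs = (make_pairs(sorted(type1_dir.glob("*.wav")), clean_dir) +
         make_pairs(sorted(type2_dir.glob("*.wav")), clean_dir))
```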


a897456 commented Nov 10, 2024

[screenshot attached]
Hi @bigpon
I reorganized the dataset according to your suggestions and then trained SGMSE with all default settings. The purple PESQ curve is the unprocessed dataset; the green PESQ curve is the dataset processed by AudioDec (both clean and noisy speech). However, the upward trend of the PESQ curve seems to have stalled.
Perhaps SGMSE requires some specific settings, but I have been using the defaults throughout. I will post updated results here; in the meantime, if you can see where the problem lies, please let me know.


a897456 commented Nov 11, 2024

Hi @bigpon
Is SGMSE already obsolete? I see that the PESQ scores of many speech enhancement models have already reached 3.6.
[screenshots attached]


a897456 commented Nov 12, 2024

Hi @bigpon
[screenshot attached]
The PESQ curve is still quite poor. I think there is some problem with my settings, but I haven't managed to find the right ones. Could you please share the parameters you used at the time, including the backbone and SDE settings? I would be extremely grateful.


a897456 commented Nov 12, 2024

Hi @bigpon
[screenshot attached]
Are you using the M6 settings, or something else? Could you share them?
