Hello, thanks very much for sharing your code and pretrained models! I was able to successfully run prediction with the nisqa_tts.tar pretrained model on several datasets, but one dataset gave this error (here is the full error text):
/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/librosa/core/spectrum.py:222: UserWarning: n_fft=4096 is too small for input signal of length=743
warnings.warn(
Traceback (most recent call last):
File "/share02/SLC/users/ecooper/workspace/NISQA/run_predict.py", line 43, in <module>
nisqa.predict()
File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_model.py", line 67, in predict
y_val_hat, y_val = NL.predict_mos(
File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_lib.py", line 1434, in predict_mos
y_hat_list = [ [model(xb.to(dev), n_wins.to(dev)).cpu().numpy(), yb.cpu().numpy()] for xb, yb, (idx, n_wins) in dl]
File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_lib.py", line 1434, in <listcomp>
y_hat_list = [ [model(xb.to(dev), n_wins.to(dev)).cpu().numpy(), yb.cpu().numpy()] for xb, yb, (idx, n_wins) in dl]
File "/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_lib.py", line 2180, in __getitem__
x_spec_seg, n_wins = segment_specs(file_path,
File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_lib.py", line 2262, in segment_specs
idx2 = torch.arange(n_wins)
RuntimeError: upper bound and larger bound inconsistent with step sign
This appears to be an issue with too-short audio samples, so I removed some of the shortest samples, but I still get this error. Checking my other datasets, they have samples that are equally short or shorter (I am guessing that maybe some batch-wise padding is happening that avoids the problem?). In any case, I just wanted to ask whether there is some minimum audio sample length required for prediction. Thanks very much in advance for any advice!
Thanks for bringing it up, this seems to happen if there are fewer segments/windows available than required. For the TTS model 15 segments are required, and each segment is 0.02 s long with a 0.01 s hop size, which should work out to 160 ms, but in my tests 140 ms seems to be sufficient. That might be due to the actual FFT window size being larger and some padding being applied.
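Rough arithmetic behind the 160 ms figure (my own back-of-the-envelope sketch; the exact constants in the code may differ):

import numpy as np  # not strictly needed, just keeping the sketch self-contained

# Minimum span covered by n_segments overlapping windows:
# (n_segments - 1) * hop + window
n_segments = 15
window_s = 0.02
hop_s = 0.01

min_duration_s = (n_segments - 1) * hop_s + window_s
print(f"minimum duration: {min_duration_s * 1000:.0f} ms")  # -> 160 ms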
Overall I don't know how reliable the model would be for such short samples. It was mostly trained on the Blizzard challenge datasets and they did not contain such short samples.
A workaround could be to add zero padding to the audio sample directly. This could also be done within the code after the sample is loaded here:
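For reference, a minimal sketch of what that zero padding could look like (the helper name, sample rate, and 160 ms threshold are assumptions on my part, not values taken from the NISQA code):

import numpy as np
import librosa

def load_with_min_length(file_path, sr=48000, min_duration_s=0.16):
    """Load an audio file and zero-pad it to a minimum duration (hypothetical helper)."""
    y, sr = librosa.load(file_path, sr=sr)
    min_len = int(min_duration_s * sr)
    if len(y) < min_len:
        # pad with trailing zeros up to the minimum length
        y = np.pad(y, (0, min_len - len(y)))
    return y, sr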
BTW, I have now added a check that raises an error in this case to make it clearer what the issue is. I am not sure why it wouldn't happen for some samples that are shorter. Do you have samples that are longer than 160 ms that still fail?
Hi, thanks so much for the helpful information. It turns out I had one very short audio sample (50 ms!) that I had somehow missed earlier, and it was causing the failure. After removing it, prediction runs successfully.
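In case it helps anyone else hitting this, here is the kind of quick scan I ran to find too-short files before prediction (the folder name and 160 ms threshold are just placeholders):

import glob
import soundfile as sf

# Print every file in the dataset that is shorter than the assumed minimum length.
for path in glob.glob("my_dataset/*.wav"):
    info = sf.info(path)
    duration_ms = 1000 * info.frames / info.samplerate
    if duration_ms < 160:
        print(f"{path}: {duration_ms:.0f} ms")

Filtering those files out of the input CSV (or padding them as suggested above) avoids the error.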