RuntimeError: upper bound and larger bound inconsistent with step sign #54

Closed
ecooper7 opened this issue Nov 28, 2024 · 2 comments


@ecooper7

Hello, thanks very much for sharing your code and pretrained models! I have already run prediction successfully with the nisqa_tts.tar pretrained model on several datasets, but one dataset gave this error (full error text below):

/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/librosa/core/spectrum.py:222: UserWarning: n_fft=4096 is too small for input signal of length=743
  warnings.warn(
Traceback (most recent call last):
  File "/share02/SLC/users/ecooper/workspace/NISQA/run_predict.py", line 43, in <module>
    nisqa.predict()
  File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_model.py", line 67, in predict
    y_val_hat, y_val = NL.predict_mos(
  File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_lib.py", line 1434, in predict_mos
    y_hat_list = [ [model(xb.to(dev), n_wins.to(dev)).cpu().numpy(), yb.cpu().numpy()] for xb, yb, (idx, n_wins) in dl]
  File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_lib.py", line 1434, in <listcomp>
    y_hat_list = [ [model(xb.to(dev), n_wins.to(dev)).cpu().numpy(), yb.cpu().numpy()] for xb, yb, (idx, n_wins) in dl]
  File "/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/SLC/users/ecooper/miniconda3/envs/nisqa/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_lib.py", line 2180, in __getitem__
    x_spec_seg, n_wins = segment_specs(file_path,
  File "/share02/SLC/users/ecooper/workspace/NISQA/nisqa/NISQA_lib.py", line 2262, in segment_specs
    idx2 = torch.arange(n_wins)
RuntimeError: upper bound and larger bound inconsistent with step sign

This appears to be caused by audio samples that are too short, so I removed some of the shortest samples, but I still get this error. My other datasets contain samples that are equally short or shorter (I am guessing that some batch-wise padding is happening which solves it?). In any case, I just wanted to ask whether there is a minimum audio sample length required for prediction. Thanks very much in advance for any advice!

@gabrielmittag
Owner

Hi,

Thanks for bringing it up. This seems to happen if fewer segments / windows are available than required. The TTS model requires 15 segments, and each segment is 0.02 seconds long with a 0.01-second hop size, which should add up to 160 ms, but in my tests 140 ms seems to be sufficient. That might be due to the actual FFT window size being larger and some padding being applied.
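
One way to arrive at the 160 ms figure, assuming the segments behave like standard overlapping sliding windows with no padding: the first window covers 20 ms and each further window adds one 10 ms hop. The helper below is a hypothetical illustration of that arithmetic only; it is not part of NISQA, and it does not account for the FFT padding that may explain the observed 140 ms.

```python
def min_duration_ms(n_segments: int, win_ms: float, hop_ms: float) -> float:
    """Duration spanned by n_segments overlapping windows, ignoring any padding.

    Hypothetical helper for illustration; not a NISQA function.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    # First window contributes its full length, every later one adds a hop.
    return (n_segments - 1) * hop_ms + win_ms


print(min_duration_ms(15, 20.0, 10.0))  # 160.0
```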

Overall I don't know how reliable the model would be for such short samples. It was mostly trained on the Blizzard challenge datasets and they did not contain such short samples.

A workaround could be to add zero padding to the audio sample directly. This could also be done within the code after the sample is loaded here:

y, sr = lb.load(file_path, sr=sr, mono=False)
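
A minimal sketch of such zero padding, applied to the `y` and `sr` returned by the line above. The function name `pad_to_min_length` and the 0.16 s default are assumptions for illustration (the default simply mirrors the 160 ms figure mentioned earlier); this is not code from the repository. Because the audio is loaded with `mono=False`, `y` may be 1-D or have shape `(channels, samples)`, so only the last axis is padded.

```python
import numpy as np


def pad_to_min_length(y: np.ndarray, sr: int, min_dur_s: float = 0.16) -> np.ndarray:
    """Append trailing zeros so the signal lasts at least min_dur_s seconds.

    Hypothetical helper; the 0.16 s default is an assumption, not a NISQA setting.
    """
    min_samples = int(round(min_dur_s * sr))
    missing = min_samples - y.shape[-1]
    if missing <= 0:
        return y  # already long enough, leave untouched
    # Pad only the time axis (last axis); leave any channel axis as-is.
    pad_width = [(0, 0)] * (y.ndim - 1) + [(0, missing)]
    return np.pad(y, pad_width, mode="constant")


# Possible usage right after loading:
# y = pad_to_min_length(y, sr)
```

Whether predictions on heavily padded clips are meaningful is a separate question, since padding only avoids the error.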

BTW, I have now added a check that raises an error in this case to make it clearer what the issue is. I am not sure why it wouldn't happen for some samples that are shorter. Do you have samples that are longer than 160 ms that still fail?

@ecooper7
Author

ecooper7 commented Dec 3, 2024

Hi, thanks so much for the helpful information. I actually did have one super short audio sample (50 ms!) that I had somehow missed earlier, and it was causing the failure. After removing it, I was able to run prediction successfully.
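
For anyone hitting the same problem, a quick scan for overly short files can help find a stray clip like this one. The sketch below assumes the dataset consists of PCM WAV files readable by Python's standard `wave` module; the function name `find_short_wavs` and the 0.16 s threshold are hypothetical choices, not something NISQA provides.

```python
import wave
from pathlib import Path


def find_short_wavs(root, min_dur_s=0.16):
    """Return (path, duration_s) for every .wav under root shorter than min_dur_s.

    Hypothetical helper; assumes PCM WAV files that the stdlib wave module can open.
    """
    short = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            duration = wf.getnframes() / wf.getframerate()
        if duration < min_dur_s:
            short.append((path, duration))
    return short


# Possible usage (the directory name is a placeholder):
# for path, dur in find_short_wavs("my_dataset"):
#     print(f"{path}: {dur * 1000:.0f} ms")
```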

@ecooper7 ecooper7 closed this as completed Dec 3, 2024
2 participants