I am trying to use the feature extractor on my audio files.
My audio files are all sampled at 16000 Hz and are 5 seconds long.
`waveform.shape[1]` is 80000. When I call
`input_values = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt").input_values`
I get the error:
`AssertionError: choose a window size 400 that is [2, 1]`
and I don't really know what to do with it.
Here is the whole thing:
```python
import torch
import torchaudio
from tqdm import tqdm


def preprocess_function(examples):
    audio_files = examples['file_path']
    inputs = {'input_values': []}
    for audio_file in tqdm(audio_files, desc="Preprocessing dataset"):
        waveform, sample_rate = torchaudio.load(audio_file)
        # Ensure sample rate is 16000 Hz
        assert sample_rate == 16000, f"Expected sample rate of 16000 Hz, but got {sample_rate} Hz"
        # Assuming all audio files are 5 seconds long
        max_len = 16000 * 5  # 5 seconds at 16000 Hz
        # Pad or truncate to the maximum length
        if waveform.shape[1] > max_len:
            waveform = waveform[:, :max_len]
        waveform = torch.nn.functional.pad(waveform, (0, max_len - waveform.shape[1]), "constant", 0)
        # This is the call that raises the AssertionError
        input_values = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt").input_values
        # Collect the features so the returned dict is not empty
        inputs['input_values'].append(input_values)
    return inputs

processed_dataset =, batched=True, remove_columns=['file_path'])
```
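
To narrow this down, here is a minimal sketch of the shapes involved. It assumes the files are mono, so the loaded waveform is channels-first with shape `(1, 80000)`, and it uses a NumPy array of zeros as a stand-in for the real audio. My guess, which I have not confirmed, is that the `1` in `[2, 1]` is the length of the first axis rather than the number of samples, which would mean the extractor expects a 1-D waveform.

```python
import numpy as np

SAMPLE_RATE = 16000
DURATION_S = 5
WINDOW_SIZE = 400  # taken from the error message; 400 / 16000 = 25 ms

# Stand-in for a mono waveform loaded channels-first: (channels, samples)
waveform = np.zeros((1, SAMPLE_RATE * DURATION_S), dtype=np.float32)

print(waveform.shape)  # (1, 80000)
print(len(waveform))   # 1, the length of the channel axis

# Dropping the channel axis gives a 1-D array whose length is the sample count
mono = waveform.squeeze(0)
print(mono.shape)      # (80000,)

# A 400-sample window does not fit in a length of 1, but fits in 80000
print(WINDOW_SIZE <= len(waveform))  # False
print(WINDOW_SIZE <= len(mono))      # True
```

If that guess holds, passing `waveform.squeeze(0).numpy()` to the feature extractor instead of the 2-D tensor might be the thing to try.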