Skip to content

Empty Recordings File in utils/fix_data_dir.sh Script #4920

Open
@pri1712

Description

I am using the WeSpeaker pipeline and Kaldi toolkit for a speaker diarization task, employing ResNet as the feature extractor. During the filtering of my segments file using the script utils/fix_data_dir.sh, I ran into an issue where the script filters my segments file to zero lines due to the temporary file /tmp/kaldi.XXXX/recordings having no entries

The following is the link to the script : https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/fix_data_dir.sh .

This is a part of the error output:
utils/fix_data_dir.sh: filtered /data1/XYZ/ABC/speaker_diarization/SHARC_check/tools_wespk/data/ABC_dev_fbank_seg/old_dir/segments from 8310 to 0 lines based on filter /tmp/kaldi.3oKA/recordings.
I found that the file /tmp/kaldi.XXXX/recordings generated by the script is empty which causes the script to filter out all lines from the segments file.

  1. What might be causing the /tmp/kaldi.XXXX/recordings file to be empty?

  2. Are there any known issues or additional steps required to ensure the recordings file is correctly populated?

If required I can provide the formats of the necessary files to check for any formatting errors between segments and wav.scp which is being used to generate the /tmp/kaldi.XXXX/recordings file

Thanks

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions