Persephone entry #137


Merged: 2 commits, Sep 11, 2023

15 changes: 11 additions & 4 deletions README.md
@@ -26,13 +26,20 @@ The full API is described in the documentation page [https://hyperion-ml.readthe
### Prerequisites

We use Anaconda or Miniconda, though you should be able to make it work with other Python distributions.
To start, you should create a new environment and install PyTorch>=1.9 (older versions are no longer supported), e.g.:
To start, you should create a new environment and install PyTorch:
```
conda create --name ${your_env} python=3.8
conda create --name ${your_env} python=3.11
conda activate ${your_env}
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=10.2 -c pytorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```
In upcoming Hyperion versions, we will upgrade to PyTorch>=1.9 and drop compatibility with older PyTorch versions.

For systems with a CUDA 10.2 driver:
```
conda create --name ${your_env} python=3.10
conda activate ${your_env}
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
```
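
Either way, a quick sanity check (a minimal Python sketch, assuming the new environment is active) confirms that the installed PyTorch build can see the GPU:
```
# Minimal check that the installed PyTorch build matches the CUDA driver.
import torch

print(torch.__version__)           # version of the installed build
print(torch.cuda.is_available())   # True if the CUDA runtime and driver are usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```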


### Installing Hyperion

34 changes: 34 additions & 0 deletions egs/voxceleb/v1.2/conf/reverb_noise_aug.yaml
@@ -0,0 +1,34 @@
reverb_aug:
  reverb_prob: 0.45
  max_reverb_context: 0.5
  rir_types:
    smallroom:
      weight: 1
      rir_path: csv:data/rirs_smallroom/rirs.csv
      rir_norm: max
    mediumroom:
      weight: 1
      rir_path: csv:data/rirs_mediumroom/rirs.csv
      rir_norm: max
    realroom:
      weight: 1
      rir_path: csv:data/rirs_real/rirs.csv
      rir_norm: max
noise_aug:
  noise_prob: 0.7
  noise_types:
    noise:
      weight: 1
      noise_path: data/musan_noise_proc_audio/recordings.csv
      min_snr: 0
      max_snr: 18
    music:
      weight: 1
      noise_path: data/musan_music_proc_audio/recordings.csv
      min_snr: 3
      max_snr: 18
    babble:
      weight: 1
      noise_path: data/musan_speech_babble/recordings.csv
      min_snr: 3
      max_snr: 18
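
To make the weights and probabilities above concrete, here is an illustrative Python sketch (not Hyperion's actual augmentation API) of how a config like this could drive sampling: with probability `noise_prob` one noise type is picked in proportion to its `weight`, and an SNR is drawn uniformly between `min_snr` and `max_snr`:
```
# Illustrative only: weight-based sampling over the noise types in a config like
# the one above. Hyperion's real augmentation classes may work differently.
import random
import yaml  # assumes PyYAML is installed

with open("egs/voxceleb/v1.2/conf/reverb_noise_aug.yaml") as f:
    cfg = yaml.safe_load(f)

noise_cfg = cfg["noise_aug"]
if random.random() < noise_cfg["noise_prob"]:  # add noise with probability 0.7
    names = list(noise_cfg["noise_types"])
    weights = [noise_cfg["noise_types"][n]["weight"] for n in names]
    name = random.choices(names, weights=weights, k=1)[0]
    opts = noise_cfg["noise_types"][name]
    snr = random.uniform(opts["min_snr"], opts["max_snr"])
    print(f"add {name} from {opts['noise_path']} at {snr:.1f} dB SNR")
```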
26 changes: 13 additions & 13 deletions egs/voxceleb/v1.2/run_001_prepare_data.sh
@@ -16,31 +16,31 @@ config_file=default_config.sh

if [ $stage -le 1 ];then
# Prepare the VoxCeleb2 dataset for training.
prepare_data.py voxceleb2 --subset dev --corpus-dir $voxceleb2_root \
--cat-videos --use-kaldi-ids \
--output-dir data/voxceleb2cat_train
hyperion-prepare-data voxceleb2 --subset dev --corpus-dir $voxceleb2_root \
--cat-videos --use-kaldi-ids \
--output-dir data/voxceleb2cat_train
fi

if [ $stage -le 2 ];then
# prepare voxceleb1 for test
prepare_data.py voxceleb1 --task test --corpus-dir $voxceleb1_root \
--use-kaldi-ids \
--output-dir data/voxceleb1_test
hyperion-prepare-data voxceleb1 --task test --corpus-dir $voxceleb1_root \
--use-kaldi-ids \
--output-dir data/voxceleb1_test
fi

if [ $stage -le 3 ] && [ "$do_voxsrc22" == "true" ];then
prepare_data.py voxsrc22 --subset dev --corpus-dir $voxsrc22_root \
--vox1-corpus-dir $voxceleb1_root \
--output-dir data/voxsrc22_dev
hyperion-prepare-data voxsrc22 --subset dev --corpus-dir $voxsrc22_root \
--vox1-corpus-dir $voxceleb1_root \
--output-dir data/voxsrc22_dev
fi

# if [ $stage -le 4 ] && [ "$do_voxsrc22" == "true" ];then
# prepare_data.py voxsrc22 --subset test --corpus-dir $voxsrc22_root \
# --vox1-corpus-dir $voxceleb1_root \
# --output-dir data/voxsrc22_test
# hyperion-prepare-data voxsrc22 --subset test --corpus-dir $voxsrc22_root \
# --vox1-corpus-dir $voxceleb1_root \
# --output-dir data/voxsrc22_test
# fi

if [ $stage -le 5 ] && [ "$do_qmf" == "true" ];then
# split vox2 into 2 parts, for cohort and qmf training
split_dataset_into_trials_and_cohort.py --data-dir data/voxceleb2cat_train
hyperion-split-dataset-into-trials-and-cohort --data-dir data/voxceleb2cat_train
fi
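
The pattern in this script, replacing direct `prepare_data.py`-style calls with installed `hyperion-*` commands, repeats across the remaining scripts in this PR. As a rough sketch (the module paths are placeholders, not Hyperion's actual packaging layout), hyphenated commands like these are typically exposed as setuptools console-script entry points:
```
# Hypothetical example of declaring console-script entry points with setuptools;
# the module paths below are placeholders, not Hyperion's real package layout.
from setuptools import setup

setup(
    name="hyperion-ml",
    entry_points={
        "console_scripts": [
            "hyperion-prepare-data=hyperion.bin.prepare_data:main",
            "hyperion-dataset=hyperion.bin.hyperion_dataset:main",
            "hyperion-tables=hyperion.bin.hyperion_tables:main",
        ],
    },
)
```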
16 changes: 8 additions & 8 deletions egs/voxceleb/v1.2/run_002_compute_evad.sh
@@ -48,18 +48,18 @@ if [ $stage -le 2 ];then
echo "compute vad for $name"
$train_cmd JOB=1:$nj $vad_dir/$name/log/vad.JOB.log \
hyp_utils/conda_env.sh \
compute_energy_vad.py --cfg $vad_config \
hyperion-compute-energy-vad --cfg $vad_config \
--recordings-file data/$name/recordings.csv \
--output-spec ark,csv:$vad_dir/$name/vad.JOB.ark,$vad_dir/$name/vad.JOB.csv \
--part-idx JOB --num-parts $nj || exit 1

hyperion_tables.py cat \
--table-type features \
--output-file $vad_dir/$name/vad.csv --num-tables $nj
hyperion_dataset.py add_features \
--dataset data/$name \
--features-name vad \
--features-file $vad_dir/$name/vad.csv
hyperion-tables cat \
--table-type features \
--output-file $vad_dir/$name/vad.csv --num-tables $nj
hyperion-dataset add_features \
--dataset data/$name \
--features-name vad \
--features-file $vad_dir/$name/vad.csv
done
fi

102 changes: 51 additions & 51 deletions egs/voxceleb/v1.2/run_003_prepare_noises_rirs.sh
@@ -18,10 +18,10 @@ config_file=default_config.sh
if [ $stage -le 1 ]; then
for name in noise music speech
do
prepare_data.py musan \
--corpus-dir $musan_root \
--subset $name \
--output-dir data/musan_$name
hyperion-prepare-data musan \
--corpus-dir $musan_root \
--subset $name \
--output-dir data/musan_$name
done
fi

@@ -37,66 +37,66 @@ if [ $stage -le 2 ]; then
output_dir=exp/proc_audio/$name
$train_cmd JOB=1:$nj $output_dir/log/preproc_audios_${name}.JOB.log \
hyp_utils/conda_env.sh \
preprocess_audio_files.py \
hyperion-preprocess-audio-files \
--audio-format flac \
--part-idx JOB --num-parts $nj \
--recordings-file $input_data_dir/recordings.csv \
--output-path $output_dir \
--output-recordings-file $output_dir/recordings.JOB.csv

hyperion_tables.py cat \
--table-type recordings \
--output-file $output_dir/recordings.csv --num-tables $nj
hyperion_dataset.py set_recordings \
--dataset $input_data_dir \
--recordings-file $output_dir/recordings.csv \
--output-dataset $output_data_dir
hyperion-tables cat \
--table-type recordings \
--output-file $output_dir/recordings.csv --num-tables $nj
hyperion-dataset set_recordings \
--dataset $input_data_dir \
--recordings-file $output_dir/recordings.csv \
--output-dataset $output_data_dir


done
fi

if [ $stage -le 3 ]; then
# Create Babble noise from MUSAN speech files
for name in musan_speech
do
input_data_dir=data/$name
output_data_dir=data/${name}_babble
output_dir=exp/proc_audio/${name}_babble
$train_cmd $output_dir/log/make_babble_noise_${name}.log \
hyp_utils/conda_env.sh \
make_babble_noise_audio_files.py \
--audio-format flac \
--min-spks 3 --max-spks 10 --num-reuses 5 \
--recordings-file $input_data_dir/recordings.csv \
--output-path $output_dir \
--output-recordings-file $output_data_dir/recordings.csv
hyperion_dataset.py make_from_recordings \
--dataset $output_data_dir \
--recordings-file $output_data_dir/recordings.csv
done
# Create Babble noise from MUSAN speech files
for name in musan_speech
do
input_data_dir=data/$name
output_data_dir=data/${name}_babble
output_dir=exp/proc_audio/${name}_babble
$train_cmd $output_dir/log/make_babble_noise_${name}.log \
hyp_utils/conda_env.sh \
hyperion-make-babble-noise-audio-files \
--audio-format flac \
--min-spks 3 --max-spks 10 --num-reuses 5 \
--recordings-file $input_data_dir/recordings.csv \
--output-path $output_dir \
--output-recordings-file $output_data_dir/recordings.csv
hyperion-dataset make_from_recordings \
--dataset $output_data_dir \
--recordings-file $output_data_dir/recordings.csv
done
fi

if [ $stage -le 4 ]; then
if [ ! -d "RIRS_NOISES" ]; then
# Download the package that includes the real RIRs, simulated RIRs, isotropic noises and point-source noises
wget --no-check-certificate http://www.openslr.org/resources/28/rirs_noises.zip
unzip rirs_noises.zip
fi
prepare_data.py rirs --corpus-dir RIRS_NOISES/simulated_rirs/smallroom --output-dir data/rirs_smallroom
prepare_data.py rirs --corpus-dir RIRS_NOISES/simulated_rirs/mediumroom --output-dir data/rirs_mediumroom
prepare_data.py rirs --corpus-dir RIRS_NOISES/real_rirs_isotropic_noises --output-dir data/rirs_real
for rirs in rirs_smallroom rirs_mediumroom rirs_real
do
output_dir=exp/rirs/$rirs
data_dir=data/$rirs
$train_cmd $output_dir/log/pack_rirs_${name}.log \
hyp_utils/conda_env.sh \
pack_wav_rirs.py ${args} --input $data_dir/recordings.csv \
--output h5,csv:$output_dir/rirs.h5,$output_dir/rirs.csv || exit 1;
hyperion_dataset.py add_features --dataset $data_dir \
--features-name rirs --features-file $output_dir/rirs.csv
if [ ! -d "RIRS_NOISES" ]; then
# Download the package that includes the real RIRs, simulated RIRs, isotropic noises and point-source noises
wget --no-check-certificate http://www.openslr.org/resources/28/rirs_noises.zip
unzip rirs_noises.zip
fi
hyperion-prepare-data rirs --corpus-dir RIRS_NOISES/simulated_rirs/smallroom --output-dir data/rirs_smallroom
hyperion-prepare-data rirs --corpus-dir RIRS_NOISES/simulated_rirs/mediumroom --output-dir data/rirs_mediumroom
hyperion-prepare-data rirs --corpus-dir RIRS_NOISES/real_rirs_isotropic_noises --output-dir data/rirs_real
for rirs in rirs_smallroom rirs_mediumroom rirs_real
do
output_dir=exp/rirs/$rirs
data_dir=data/$rirs
$train_cmd $output_dir/log/pack_rirs_${name}.log \
hyp_utils/conda_env.sh \
hyperion-pack-wav-rirs ${args} --input $data_dir/recordings.csv \
--output h5,csv:$output_dir/rirs.h5,$output_dir/rirs.csv || exit 1;
hyperion-dataset add_features --dataset $data_dir \
--features-name rirs --features-file $output_dir/rirs.csv

done
done
fi

46 changes: 23 additions & 23 deletions egs/voxceleb/v1.2/run_004_prepare_xvec_train_data.sh
@@ -35,42 +35,42 @@ if [ $stage -le 2 ];then

$train_cmd JOB=1:$nj $output_dir/log/preproc_audios_${nnet_data}.JOB.log \
hyp_utils/conda_env.sh \
preprocess_audio_files.py \
hyperion-preprocess-audio-files \
--audio-format flac --remove-dc-offset $vad_args \
--part-idx JOB --num-parts $nj \
--recordings-file data/$nnet_data/recordings.csv \
--output-path $output_dir \
--output-recordings-file $output_dir/recordings.JOB.csv

hyperion_tables.py cat \
--table-type recordings \
--output-file $output_dir/recordings.csv --num-tables $nj
hyperion-tables cat \
--table-type recordings \
--output-file $output_dir/recordings.csv --num-tables $nj

hyperion_dataset.py set_recordings $update_durs \
--dataset data/$nnet_data \
--recordings-file $output_dir/recordings.csv \
--output-dataset data/${nnet_data}_proc_audio \
--remove-features vad
hyperion-dataset set_recordings $update_durs \
--dataset data/$nnet_data \
--recordings-file $output_dir/recordings.csv \
--output-dataset data/${nnet_data}_proc_audio \
--remove-features vad
fi

if [ $stage -le 3 ];then
hyperion_dataset.py remove_short_segments \
--dataset data/${nnet_data}_proc_audio \
--output-dataset data/${nnet_data}_filtered \
--length-name duration --min-length 2.0
hyperion-dataset remove_short_segments \
--dataset data/${nnet_data}_proc_audio \
--output-dataset data/${nnet_data}_filtered \
--length-name duration --min-length 2.0

hyperion_dataset.py remove_classes_few_segments \
--dataset data/${nnet_data}_filtered \
--class-name speaker --min-segs 4
hyperion-dataset remove_classes_few_segments \
--dataset data/${nnet_data}_filtered \
--class-name speaker --min-segs 4
fi

if [ $stage -le 4 ];then
hyperion_dataset.py split_train_val \
--dataset data/${nnet_data}_filtered \
--val-prob 0.03 \
--joint-classes speaker --min-train-samples 1 \
--seed 1123581321 \
--train-dataset data/${nnet_data}_xvector_train \
--val-dataset data/${nnet_data}_xvector_val
hyperion-dataset split_train_val \
--dataset data/${nnet_data}_filtered \
--val-prob 0.03 \
--joint-classes speaker --min-train-samples 1 \
--seed 1123581321 \
--train-dataset data/${nnet_data}_xvector_train \
--val-dataset data/${nnet_data}_xvector_val
fi

4 changes: 2 additions & 2 deletions egs/voxceleb/v1.2/run_005_train_xvector.sh
@@ -44,7 +44,7 @@ if [ $stage -le 1 ]; then
$cuda_cmd \
--gpu $ngpu $nnet_s1_dir/log/train.log \
hyp_utils/conda_env.sh --conda-env $HYP_ENV --num-gpus $ngpu \
train_wav2xvector.py $nnet_type --cfg $nnet_s1_base_cfg $nnet_s1_args $extra_args \
hyperion-train-wav2xvector $nnet_type --cfg $nnet_s1_base_cfg $nnet_s1_args $extra_args \
--data.train.dataset.recordings-file $train_data_dir/recordings.csv \
--data.train.dataset.segments-file $train_data_dir/segments.csv \
--data.train.dataset.class-files $train_data_dir/speaker.csv \
@@ -65,7 +65,7 @@ if [ $stage -le 2 ]; then
$cuda_cmd \
--gpu $ngpu $nnet_s2_dir/log/train.log \
hyp_utils/conda_env.sh --conda-env $HYP_ENV --num-gpus $ngpu \
finetune_wav2xvector.py $nnet_type --cfg $nnet_s2_base_cfg $nnet_s2_args $extra_args \
hyperion-finetune-wav2xvector $nnet_type --cfg $nnet_s2_base_cfg $nnet_s2_args $extra_args \
--data.train.dataset.recordings-file $train_data_dir/recordings.csv \
--data.train.dataset.segments-file $train_data_dir/segments.csv \
--data.train.dataset.class-files $train_data_dir/speaker.csv \
16 changes: 8 additions & 8 deletions egs/voxceleb/v1.2/run_006_extract_xvectors.sh
@@ -58,15 +58,15 @@ if [[ $stage -le 1 && ( "$do_plda" == "true" || "$do_snorm" == "true" || "$do_qm
echo "Extracting x-vectors for $name"
$xvec_cmd JOB=1:$nj $output_dir/log/extract_xvectors.JOB.log \
hyp_utils/conda_env.sh --num-gpus $num_gpus \
extract_wav2xvectors.py ${xvec_args} ${vad_args} \
hyperion-extract-wav2xvectors ${xvec_args} ${vad_args} \
--part-idx JOB --num-parts $nj \
--recordings-file data/$name/recordings.csv \
--random-utt-length --min-utt-length 2 --max-utt-length 30 \
--model-path $nnet \
--output-spec ark,csv:$output_dir/xvector.JOB.ark,$output_dir/xvector.JOB.csv
hyperion_tables.py cat \
--table-type features \
--output-file $output_dir/xvector.csv --num-tables $nj
hyperion-tables cat \
--table-type features \
--output-file $output_dir/xvector.csv --num-tables $nj

done
fi
@@ -88,14 +88,14 @@ if [ $stage -le 2 ]; then
echo "Extracting x-vectors for $name"
$xvec_cmd JOB=1:$nj $output_dir/log/extract_xvectors.JOB.log \
hyp_utils/conda_env.sh --num-gpus $num_gpus \
extract_wav2xvectors.py ${xvec_args} ${vad_args} \
hyperion-extract-wav2xvectors ${xvec_args} ${vad_args} \
--part-idx JOB --num-parts $nj \
--recordings-file data/$name/recordings.csv \
--model-path $nnet \
--output-spec ark,csv:$output_dir/xvector.JOB.ark,$output_dir/xvector.JOB.csv
hyperion_tables.py cat \
--table-type features \
--output-file $output_dir/xvector.csv --num-tables $nj
hyperion-tables cat \
--table-type features \
--output-file $output_dir/xvector.csv --num-tables $nj

done
fi