This tool lets you create a small dataset from VoxCeleb data. You can define how many speakers or utterances you want in the train set and in the test sets, and you can choose the number of test sets.
The main goal of the tool is to visualize vectors, for example those extracted with STKLIA. It also finds the prototypes (from Interpretable Machine Learning: the utterances that best represent the data) and the criticisms (the utterances that are either underrepresented or overrepresented). It reduces the vectors to the number of dimensions you want using UMAP, LDA, or both. The final plot shows one color per speaker. Clicking on an utterance either plays that utterance or opens a new plot with the utterances of its speaker. Many things can be configured.
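To make the idea of prototypes and criticisms concrete, here is a minimal sketch in the spirit of MMD-critic, the method described in Interpretable Machine Learning. It is not the tool's own implementation: the RBF kernel, the `gamma` value, the function names and the toy 2-D vectors are all assumptions chosen for illustration.

```python
import math


def rbf(a, b, gamma=1.0):
    """RBF kernel between two vectors (kernel choice is an assumption)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_matrix(points, gamma=1.0):
    """Pairwise kernel values between all points."""
    return [[rbf(p, q, gamma) for q in points] for p in points]


def mmd_objective(K, selected):
    """Higher is better: the selected points match the data distribution
    (cross term) without being redundant with each other (within term)."""
    n, m = len(K), len(selected)
    cross = sum(K[i][j] for i in range(n) for j in selected)
    within = sum(K[i][j] for i in selected for j in selected)
    return 2.0 * cross / (n * m) - within / (m * m)


def select_prototypes(K, count):
    """Greedily add the point that most improves the objective."""
    selected = []
    for _ in range(count):
        candidates = [i for i in range(len(K)) if i not in selected]
        best = max(candidates, key=lambda i: mmd_objective(K, selected + [i]))
        selected.append(best)
    return selected


def select_criticisms(K, prototypes, count):
    """Pick the non-prototype points where the witness function is largest
    in absolute value, i.e. where prototypes over- or under-represent the data."""
    n, m = len(K), len(prototypes)

    def witness(i):
        return sum(K[i]) / n - sum(K[i][j] for j in prototypes) / m

    candidates = [i for i in range(n) if i not in prototypes]
    return sorted(candidates, key=lambda i: abs(witness(i)), reverse=True)[:count]


# Toy data: two tight clusters (think "two speakers") and one isolated vector.
vectors = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05),
    (2.5, 9.0),
]
K = kernel_matrix(vectors)
prototypes = select_prototypes(K, count=2)
criticisms = select_criticisms(K, prototypes, count=1)
print("prototype indices:", prototypes)
print("criticism indices:", criticisms)
```

On this toy data the two cluster centers are picked as prototypes and the isolated vector as the criticism. In practice the input would be the extracted utterance vectors rather than hand-written points.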
Go to src/smallDatasetCreator and take a look at the README.md.
Go to src/ and take a look at the README.md.
Just follow these steps:
pip install matplotlib
pip install PyAudio
pip install umap-learn
pip install tqdm
pip install pyaml
Or install everything at once:
pip install -r requirements.txt
You may also need STKLIA, and therefore Kaldi and PyTorch, as well as the VoxCeleb data.
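If you are unsure whether the installation worked, a small check like the following can help. It is only a sketch: the mapping from pip package names to importable module names reflects how these packages are usually imported, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# pip package name -> module name used at import time
REQUIRED = {
    "matplotlib": "matplotlib",
    "PyAudio": "pyaudio",
    "umap-learn": "umap",
    "tqdm": "tqdm",
    "pyaml": "pyaml",
}


def missing_packages(required):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

This only checks that the modules can be located. It does not verify versions, and it does not cover STKLIA, Kaldi or PyTorch.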
Launch one of the two shell scripts in src/smallDatasetCreator to create a small dataset. For example:
bash speakersSelector.sh
You can check some information about the new train set or a test set (replace /Train with /TestX, where X is the number of the test) by running:
python3 datasetInfo ../../toy_dataset/NewSet/Train/feats.scp
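For a rough idea of what such a summary can look like, the sketch below counts utterances per speaker from the lines of a Kaldi-style feats.scp file. It does not reproduce datasetInfo. It assumes each line has the form `<utterance-id> <path>` and that the utterance ID starts with the speaker ID followed by a hyphen, which may not match your data. The example IDs and paths are made up.

```python
from collections import Counter


def utterances_per_speaker(scp_lines):
    """Count utterances per speaker from Kaldi-style scp lines.

    Assumption: the speaker ID is the part of the utterance ID
    before the first hyphen.
    """
    counts = Counter()
    for line in scp_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utterance_id = line.split(maxsplit=1)[0]
        speaker_id = utterance_id.split("-", 1)[0]
        counts[speaker_id] += 1
    return counts


# Made-up lines, for illustration only.
example_lines = [
    "id10001-abc-00001 /path/to/feats.1.ark:42",
    "id10001-abc-00002 /path/to/feats.1.ark:9001",
    "id10002-def-00001 /path/to/feats.2.ark:42",
    "",
]
counts = utterances_per_speaker(example_lines)
print("speakers:", len(counts))
print("utterances:", sum(counts.values()))
for speaker, n in sorted(counts.items()):
    print(speaker, n)
```

To run it on a real file, pass the open file object instead of `example_lines`, for example `utterances_per_speaker(open(path))`.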
Train STKLIA on the train set of this small dataset (an example config file is given in /toy_dataset/Exemple_speaker.cfg). Then extract the vectors of the train set, followed by the vectors of each test set. The result should look like the contents of "pretrained_models".
Now you can configure /configs/config.yaml to use the vectors you have extracted. Finally, launch from src/:
python3 run.py --conf ../configs/config.yaml --mode reduction
If you want to compare two tests, or one test and the train set, configure /configs/compare.yaml and then run:
python3 run.py --conf ../configs/compare.yaml
For this, you need to have saved the prototypes, criticisms and utterances with run.py beforehand.