TCPGen in Conformer RNN-T #2890
Closed
Commits (120)
c686251
first commit BrianSun
8a6ae5c
second commit
b52cda0
add specific paths to cudnn
657d8c6
constructing tree done
4084d5f
Add biasing LIBRISPEECH data processor
96eeb6f
implemented tree search and DBRNNT training procedure
1a00344
add fused log smax option
b8621e8
fixes
68664dc
move out log softmax
e063b33
code for inference with TCPGen
df4a213
Merge branch 'rnntl-log-probs' into tcpgen
5e34548
changed debug training
f163b84
resolve version issue with pl
baeac25
Merge branch 'tcpgen' of https://github.com/BriansIDP/audio into tcpgen
6157143
updated train.sh and eval.sh scripts
ae1abc0
changed from 1024 bpe to 600 bpe
38506a1
Added biasing option
aca8118
Implemented loss calculation for TCPGen and added biasing option
9195e22
Added biasing option
bb65b04
Making it deterministic as set union is not
f494252
Added biasing option to toggle biasing
a048f4b
rare word f=15
52c1970
Newfiles: global_stats for clean 100 and rnnt decoding prototype with…
abba122
README
6c68eee
Add subset option
e5b2c8e
Added documentation
4a37816
Added prefix-based wordpiece search algorithm, and added documentation
3c534d0
Added documentation
ab2a553
Update README.md
BriansIDP 303977d
Add scoring pipeline
12f6d66
Merge branch 'tcpgen' of https://github.com/BriansIDP/audio into tcpgen
c00678a
Update README.md
BriansIDP 3c57570
Update README.md
BriansIDP a779026
formatting
a009973
removed newline
d7e7172
formatting
e4f8b1c
formatting
e0da63e
formatting
ec150b2
formatting
d9fb233
Upgrade nightly wheels to ROCm5.3 (#2853)
jithunnair-amd 21a36d5
Introduce MUSAN dataset (#2888)
hwangjeff bc9702f
Add additive noise transform (#2889)
hwangjeff 65dfe03
Add feature badges to preemphasis and deemphasis functions (#2892)
hwangjeff d653db6
Add HiFi GAN Generator to prototypes (#2860)
sgrigory f78d6c4
Fix docs warnings for conformer w2v2 (#2900)
cbcde5c
Follow up on WavLM bundles (#2895)
sgrigory 2aa06d3
Toggle on/off ffmpeg test if needed (#2901)
atalman 6149b3a
Fix wrong frame allocation in StreamWriter (#2905)
mthrok c56126a
Fix duplicated memory allocation in StreamWriter (#2906)
mthrok f420995
Update author and maintainer info (#2911)
mthrok b3c6a4d
Fix integration test for WAV2VEC2_ASR_LARGE_LV60K_10M (#2910)
nateanl 98e51d8
Update model documentation structure (#2902)
mthrok 25a6692
Fix type of arguments in torchaudio.io classes (#2913)
nateanl 04aa82e
Update PR labels (#2912)
4ad94df
Addressing comments about PR
89f0f8c
Addressed Comments on the PR
199ff86
Update README.md
BriansIDP 3ee4558
Update README.md
BriansIDP 899ada7
Merge remote-tracking branch 'upstream/main' into tcpgen
b8c3bfa
Merge branch 'tcpgen' of https://github.com/BriansIDP/audio into tcpgen
29c6999
change the name of the directory to be more clear
ea02d48
change the name of the directory to be more clear
8b3ea02
Solve bugs due to new updates from the main branch
4e132b7
temporarily commit for debugging
bd9ada9
current train.sh
c239488
Use hptr as input as a default
92a94a4
Addressing nateanl's comments
d12c3f2
changed the name of LIBRISPEECHBIASING to LibriSpeechBiasing
15e1276
removed train.sh and eval.sh files
f4139b1
addressing comments from @nateanl
51d5866
Added comments for arguments
28b2abc
formatted files
# Contextual Conformer RNN-T with TCPGen Example

This directory contains sample implementations of training and evaluation pipelines for the Conformer RNN-T model with a tree-constrained pointer generator (TCPGen) for contextual biasing, as described in the paper [Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition](https://ieeexplore.ieee.org/abstract/document/9687915).
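TCPGen constrains the pointer-generator distribution with a prefix tree built over the wordpiece sequences of the biasing words: at each decoding step only wordpieces reachable from the current tree node receive probability mass. The repository's tree implementation is not reproduced here; the following is a minimal Python sketch of the idea, assuming each biasing word has already been encoded into wordpiece IDs (the IDs below are made up for illustration).

```python
# Minimal sketch of a wordpiece prefix tree (trie) over a biasing list.
# In TCPGen, only wordpieces reachable from the current trie node are
# candidates for the pointer-generator distribution at each step.

def build_trie(biasing_wordpieces):
    """biasing_wordpieces: list of wordpiece-ID sequences, one per biasing word."""
    root = {}
    for pieces in biasing_wordpieces:
        node = root
        for p in pieces:
            node = node.setdefault(p, {})
        node["<end>"] = {}  # marks a complete biasing word
    return root

def valid_next(node):
    """Wordpieces the trie allows at the current step."""
    return [p for p in node if p != "<end>"]

# Hypothetical wordpiece IDs for two biasing words sharing the prefix 5.
trie = build_trie([[5, 7, 2], [5, 9]])
print(sorted(valid_next(trie[5])))  # → [7, 9]
```

Both branches remain reachable after consuming the shared prefix; once a word completes, only the `<end>` marker is left and no further wordpieces are constrained.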

## Setup

### Install PyTorch and TorchAudio nightly or from source

Because Conformer RNN-T is currently a prototype feature, you will need to either use the TorchAudio nightly build or build TorchAudio from source. Note also that GPU support is required for training.

To install the nightly, follow the directions at <https://pytorch.org/>.

To build TorchAudio from source, refer to the [contributing guidelines](https://github.com/pytorch/audio/blob/main/CONTRIBUTING.md).

### Install additional dependencies

```bash
pip install pytorch-lightning sentencepiece
```

## Usage

### Training

[`train.py`](./train.py) trains a Conformer RNN-T model with TCPGen on LibriSpeech using PyTorch Lightning. Note that the script expects users to have the following:
- Access to GPU nodes for training.
- The full LibriSpeech dataset.
- A SentencePiece model for encoding targets; the model can be generated using [`train_spm.py`](./train_spm.py). **Note that suffix-based wordpieces are used in this example.** [`run_spm.sh`](./run_spm.sh) will generate the 600 suffix-based wordpieces used in the paper.
- A file (`--global_stats_path`) containing training set feature statistics; this file can be generated using [`global_stats.py`](../emformer_rnnt/global_stats.py). [`global_stats_100.json`](./global_stats_100.json) has been generated for train-clean-100.
- A biasing list: see [`rareword_f15.txt`](./blists/rareword_f15.txt) for an example of the biasing list used for training on the clean-100 data, where words appearing fewer than 15 times are treated as rare words. For inference, use [`all_rare_words.txt`](blists/all_rare_words.txt), the same list used in [https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias](https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias).
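The script that produced `rareword_f15.txt` is not shown in this directory listing, but the rule it encodes (words with fewer than 15 training-set occurrences are rare) can be sketched as follows; the function name, input format, and toy data are illustrative, not the repository's actual code.

```python
from collections import Counter

def build_rare_word_list(transcripts, threshold=15):
    """transcripts: iterable of reference sentences.
    Returns words seen fewer than `threshold` times, i.e. the
    rare-word biasing list."""
    counts = Counter(w for line in transcripts for w in line.split())
    return sorted(w for w, c in counts.items() if c < threshold)

# Toy example: "the" is frequent, so it is excluded from the list.
lines = ["the cat sat", "the dog ran", "GHILARDUCCI spoke"]
print(build_rare_word_list(lines, threshold=2))
# → ['GHILARDUCCI', 'cat', 'dog', 'ran', 'sat', 'spoke']
```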

Additional training options:
- `--droprate` is the probability of dropping biasing words that appear in the reference text, to avoid over-confidence in the biasing component.
- `--maxsize` is the size of the biasing list used for training, i.e. the number of biasing words from the reference plus distractors.
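The interaction of `--droprate` and `--maxsize` described above can be sketched as follows. This is an illustration of the described behaviour under stated assumptions, not the repository's implementation; the function and its signature are hypothetical.

```python
import random

def make_training_biasing_list(ref_words, rare_words, droprate, maxsize, rng):
    """Keep each rare word from the reference with probability 1 - droprate,
    then pad with distractors sampled from the full rare-word list
    until the list reaches maxsize."""
    kept = [w for w in ref_words if w in rare_words and rng.random() >= droprate]
    pool = [w for w in rare_words if w not in kept]
    n_distractors = max(0, maxsize - len(kept))
    return kept + rng.sample(pool, min(n_distractors, len(pool)))

rng = random.Random(0)
rare = {"GHILARDUCCI", "QUIXOTIC", "ZYMURGY", "ABAFT"}
blist = make_training_biasing_list(
    ["he", "spoke", "GHILARDUCCI"], rare, droprate=0.1, maxsize=3, rng=rng)
print(len(blist))  # → 3
```

At evaluation time the same assembly would run with `droprate=0.0`, so every reference biasing word stays in the list.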

Sample SLURM command:
```
srun --cpus-per-task=16 --gpus-per-node=1 -N 1 --ntasks-per-node=1 python train.py --exp-dir <Path_to_exp> --librispeech-path <Path_to_librispeech_data> --sp-model-path ./spm_unigram_600_100suffix.model --biasing --biasing-list ./blists/rareword_f15.txt --droprate 0.1 --maxsize 200 --epochs 90
```

### Evaluation

[`eval.py`](./eval.py) evaluates a trained Conformer RNN-T model with TCPGen on LibriSpeech test-clean.

Additional decoding options:
- `--biasing-list` should be [`all_rare_words.txt`](blists/all_rare_words.txt) for LibriSpeech experiments.
- `--droprate` should normally be 0, since the reference biasing words are assumed to be included in the list.
- `--maxsize` is the size of the biasing list used for decoding; 1000 was used in the paper.

Sample SLURM command:
```
srun --cpus-per-task=16 --gpus-per-node=1 -N 1 --ntasks-per-node=1 python eval.py --checkpoint-path <Path_to_model_checkpoint> --librispeech-path <Path_to_librispeech_data> --sp-model-path ./spm_unigram_600_100suffix.model --expdir <Path_to_exp> --use-cuda --biasing --biasing-list ./blists/all_rare_words.txt --droprate 0.0 --maxsize 1000
```

### Scoring

First install SCTK, the NIST Scoring Toolkit, following the instructions at [https://github.com/usnistgov/SCTK/blob/master/README.md](https://github.com/usnistgov/SCTK/blob/master/README.md).

An example scoring script using sclite is provided in [`score.sh`](./score.sh):

```
./score.sh <path_to_decoding_dir>
```

Scoring generates a file named `results.wrd.txt` in the format expected by the rare word error rate script. To calculate the rare word error rate:

```bash
cd error_analysis
python get_error_word_count.py <path_to_results.wrd.txt>
```
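`get_error_word_count.py` is not reproduced here, but the quantity it reports can be sketched: R-WER counts errors only on reference words that appear in the biasing list. The sketch below scores a rare reference word as an error when it is absent from the hypothesis, which is a deletion/substitution approximation; the real script works on sclite word alignments, and all names and data here are illustrative.

```python
def rare_word_error_rate(pairs, biasing_list):
    """pairs: (reference, hypothesis) sentence pairs.
    Counts a rare reference word as an error if it is missing
    from the hypothesis (approximation of an aligned comparison)."""
    errors = total = 0
    for ref, hyp in pairs:
        hyp_words = hyp.split()
        for w in ref.split():
            if w in biasing_list:
                total += 1
                if w not in hyp_words:
                    errors += 1
    return errors / total if total else 0.0

pairs = [("he met GHILARDUCCI", "he met GILARDUCHI"),
         ("the ZYMURGY talk", "the ZYMURGY talk")]
print(rare_word_error_rate(pairs, {"GHILARDUCCI", "ZYMURGY"}))  # → 0.5
```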

Note that the `word_freq.txt` file contains word frequencies for train-clean-100 only. For the full training set it should be regenerated, although in this case that only slightly affects the OOV word error rate calculation.

The table below contains WER results for the test-clean set using clean-100 training data. R-WER is the rare word error rate, computed over the words in the biasing list.

|            | WER    | R-WER  |
|:----------:|-------:|-------:|
| test-clean | 0.0836 | 0.2366 |