TCPGen in Conformer RNN-T #2890

Closed
wants to merge 120 commits into from

Conversation

BriansIDP
Contributor

This pull request contains the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing. An example for Librispeech can be found in audio/examples/asr/librispeech_biasing.

)
parser.add_argument(
"--global-stats-path",
default=pathlib.Path("global_stats.json"),
Member

The default value here can be global_stats_100.json so that users don't need to pass --global-stats-path in the SLURM script.

@@ -0,0 +1,2 @@
dir=experiments/librispeech_clean100_suffix600_tcpgen500_sche30_nodrop/decode_test_clean_b10_KB1000/
Member

The path could be made configurable via an input argument such as $1. Could you also add a comment on how to use this script?

)

model = ConformerRNNTModule(str(args.sp_model_path), args.biasing)
data_module = get_data_module(str(args.librispeech_path), str(args.global_stats_path), str(args.sp_model_path),
Member

I found it hard to tune batch_size or max_token in the dataloader when GPU memory is limited, as in my use case. @hwangjeff, would it be better to provide an API for tuning those?

Contributor

You mean exposing max_token externally, right? I agree.
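
For context, a minimal sketch of what exposing max_token as a command-line option could look like; the argument name, default value, and the get_data_module keyword below are illustrative assumptions, not the actual API in this PR:

```python
# Hypothetical sketch: expose max_token so users with limited GPU memory can tune it.
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument(
    "--max-token",
    default=700,
    type=int,
    help="Maximum number of tokens per batch in the training dataloader. (Default: 700)",
)
args = parser.parse_args()

# The parsed value would then be forwarded to the data module, for example:
# data_module = get_data_module(..., max_token=args.max_token)
```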


Sample SLURM command:
```
srun --cpus-per-task=16 --gpus-per-node=1 -N 1 --ntasks-per-node=1 python train.py --exp-dir <Path_to_exp> --librispeech-path <Path_to_librispeech_data> --global-stats-path ./global_stats_100.json --sp-model-path ./spm_unigram_600_100suffix.model --biasing --biasing-list ./blists/rareword_f15.txt --droprate 0.1 --maxsize 200 --epochs 90
```
Member

Could you change spm_unigram_600_100suffix.model to ./spm_unigram_1023.model, which is the default output filename from train_spm.py?

Contributor Author

The current example uses 600 wordpiece tokens to replicate what I had in the paper, so maybe we should keep it like this to stay consistent with the paper? I will also change train_spm.py to match.

Contributor Author

Otherwise all addressed and pushed. Thank you!

Member

cool, thanks!


Sample SLURM command:
```
srun --cpus-per-task=16 --gpus-per-node=1 -N 1 --ntasks-per-node=1 python eval.py --checkpoint-path <Path_to_model_checkpoint> --librispeech-path <Path_to_librispeech_data> --sp-model-path ./spm_unigram_600_100suffix.model --expdir <Path_to_exp> --use-cuda --biasing --biasing-list ./blists/all_rare_words.txt --droprate 0.0 --maxsize 1000
```
Member

Same here.

help="Run using CUDA.",
)
parser.add_argument(
"--biasinglist",
Member

Suggested change
"--biasinglist",
"--biasing-list",

Comment on lines 108 to 113
parser.add_argument(
"--biasing",
type=str,
help="Use biasing",
required=True,
)
Member

Suggested change
parser.add_argument(
"--biasing",
type=str,
help="Use biasing",
required=True,
)
parser.add_argument(
"--biasing",
action="store_true",
help="Use biasing",
)



def run_eval(args):
usebiasing = True if args.biasing == 'true' else False
Member

Suggested change
usebiasing = True if args.biasing == 'true' else False
usebiasing = args.biasing

model = ConformerRNNTModule.load_from_checkpoint(
args.checkpoint_path, sp_model=str(args.sp_model_path), biasing=usebiasing).eval()
data_module = get_data_module(str(args.librispeech_path), str(args.global_stats_path), str(args.sp_model_path),
biasinglist=args.biasinglist, droprate=args.droprate, maxsize=args.maxsize)
Member

Suggested change
biasinglist=args.biasinglist, droprate=args.droprate, maxsize=args.maxsize)
biasinglist=args.biasing_list, droprate=args.droprate, maxsize=args.maxsize)

)
parser.add_argument(
"--global-stats-path",
default=pathlib.Path("global_stats.json"),
Member

Suggested change
default=pathlib.Path("global_stats.json"),
default=pathlib.Path("global_stats_100.json"),

mthrok added a commit to mthrok/audio that referenced this pull request Feb 9, 2023
This is the cleaned up version of pytorch#2890

> This pull request contains the implementation of
> the tree-constrained pointer generator (TCPGen) for contextual biasing.
> An example for Librispeech can be found in
> audio/examples/asr/librispeech_biasing.
@facebook-github-bot
Contributor

@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mthrok
Collaborator

mthrok commented Feb 13, 2023

Hi @BriansIDP

We found a way to land this PR without rebasing, so we are almost good to go. A couple of requests before we merge:

  1. Please run the lint tools:
    Please run pre-commit run -a at the root directory.
    Please also run flake8.
  2. We cannot check in the text files in the blists directory as they are huge. We can put them in our S3, if the license permits. Can you tell us how the files (all_rare_words.txt, rareword_f15.txt and rareword_f30.txt) were obtained?

Thanks,

@BriansIDP
Contributor Author

> Hi @BriansIDP
>
> We found a way to land this PR without rebasing, so we are almost good to go. A couple of requests before we merge:
>
> 1. Please run the lint tools: run pre-commit run -a at the root directory, and also run flake8.
> 2. We cannot check in the text files in the blists directory as they are huge. We can put them in our S3, if the license permits. Can you tell us how the files (all_rare_words.txt, rareword_f15.txt and rareword_f30.txt) were obtained?
>
> Thanks,

Hi @mthrok. Thank you for the instructions. I have now run: (1) pre-commit and then pre-commit run -a, and all checks passed; (2) flake8, which did not find errors in any of my modified files.

For the biasing lists, rareword_f15.txt and rareword_f30.txt are generated by thresholding the train-clean-100 set word frequencies at 15 and 30 respectively (so including any words appearing fewer than 15/30 times). The all_rare_words.txt is obtained from here: https://github.com/facebookresearch/fbai-speech/tree/main/is21_deep_bias. Please let me know what I should do with those. Thank you!
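
For reference, a minimal sketch of how such a frequency-thresholded list could be produced; the transcript path and file format below are assumptions for illustration, not the actual script used to build rareword_f15.txt:

```python
# Hypothetical sketch: build a rare-word biasing list by thresholding word
# frequencies computed over the train-clean-100 transcripts.
from collections import Counter

THRESHOLD = 15  # use 30 for a rareword_f30-style list

counts = Counter()
with open("train_clean_100_transcripts.txt") as f:  # assumed: one transcript per line
    for line in f:
        counts.update(line.strip().upper().split())

rare_words = sorted(word for word, count in counts.items() if count < THRESHOLD)
with open("rareword_f15.txt", "w") as f:
    f.write("\n".join(rare_words) + "\n")
```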

@xiaohui-zhang
Contributor

Thanks. @mthrok, according to @BriansIDP's reply, I've confirmed those files only contain word statistics from LibriSpeech data, so it's safe to put them on S3. Thanks again.

@facebook-github-bot
Contributor

@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mthrok
Collaborator

mthrok commented Feb 14, 2023

@xiaohui-zhang @BriansIDP What about error_analysis/word_freq.txt? It seems this is used during the generation of the blists, but I cannot quite figure out what word_freq.txt was generated from.

@BriansIDP
Contributor Author

> @xiaohui-zhang @BriansIDP What about error_analysis/word_freq.txt? It seems this is used during the generation of the blists, but I cannot quite figure out what word_freq.txt was generated from.

Hi @mthrok, this is a word count file containing all training set word frequencies for train_clean_100. It is only needed when calculating OOV word error rates (anything not in this file is counted as an OOV word). Should I add a line explaining what it is in get_error_word_count.py, and then you can move the file if needed?
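
As an illustration, a hedged sketch of how word_freq.txt could be used to flag OOV words when computing OOV word error rates; the file format (one word and its count per line) and the snippet below are assumptions, not the actual logic in get_error_word_count.py:

```python
# Hypothetical sketch: any hypothesis word absent from the training vocabulary
# recorded in word_freq.txt is treated as an OOV word.
def load_train_vocab(path="error_analysis/word_freq.txt"):
    vocab = set()
    with open(path) as f:
        for line in f:
            word = line.split()[0]  # assumed format: "<word> <count>"
            vocab.add(word.upper())
    return vocab

train_vocab = load_train_vocab()
hypothesis = "THE QUOKKA SAT ON THE MAT".split()
oov_words = [word for word in hypothesis if word not in train_vocab]
print(f"{len(oov_words)} OOV word(s): {oov_words}")
```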

@xiaohui-zhang
Contributor

In this case I think it's OK to simply keep this file in S3 as well. cc @mthrok

@facebook-github-bot
Contributor

@mthrok merged this pull request in 1ed330b.

@github-actions

Hey @mthrok.
You merged this PR, but labels were not properly added. Please add a primary and secondary label (See https://github.com/pytorch/audio/blob/main/.github/process_commit.py)

@mthrok
Collaborator

mthrok commented Feb 23, 2023

@BriansIDP I merged this PR in 1ed330b. Thank you for the contribution and congrats!

Regarding the txt files: I removed the rare-word lists from the repository and uploaded them to torchaudio's CDN.

1ed330b#diff-a464de23e8e5d28a210663f87eff1a7fb55b6fcfcb6df611ef5013515b59c554

I did not add error_analysis/word_freq.txt to the CDN, as I was not sure whether it should be accessible. Let me know if it would be better to mention it in the README.

@BriansIDP
Contributor Author

> @BriansIDP I merged this PR in 1ed330b. Thank you for the contribution and congrats!
>
> Regarding the txt files: I removed the rare-word lists from the repository and uploaded them to torchaudio's CDN.
>
> 1ed330b#diff-a464de23e8e5d28a210663f87eff1a7fb55b6fcfcb6df611ef5013515b59c554
>
> I did not add error_analysis/word_freq.txt to the CDN, as I was not sure whether it should be accessible. Let me know if it would be better to mention it in the README.

Thank you @mthrok so much for helping me throughout this PR!

I learned a lot! It would be good to mention that this file only keeps track of the training set vocabulary for calculating OOV word error rates in error_analysis/get_error_word_count.py. Thank you!

@BriansIDP
Contributor Author

Hi @mthrok. I am planning to write a tutorial about the new biasing module, and I wonder if it is possible to upload my biasing model to the CDN so that I can load it in the tutorial (maybe named https://download.pytorch.org/torchaudio/models/conformer_rnnt_biasing_librispeech.pt). If possible, what would be the best way for me to send the model to you (e.g. via Google Drive)? Thank you so much for your help!

@mthrok
Collaborator

mthrok commented Mar 2, 2023

Hi @BriansIDP

Google Drive works for us. Make it public or share it with moto@meta.com.
Can you confirm that the model is trained on a publicly available dataset (i.e. LibriSpeech)?
