Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring streaming transducer #78

Merged
merged 11 commits into from
Jul 23, 2022

Conversation

ezerhouni
Copy link
Collaborator

This PR is about refactoring beam search code for offline and online.
The idea is to create one class per beam search type. When we add new beam search (fast_beam_search_nbest, etc), we can inherit the necessary class and write as little code as possible.

@ezerhouni ezerhouni requested review from csukuangfj and pkufool July 22, 2022 12:49
@ezerhouni
Copy link
Collaborator Author

@csukuangfj I started only for streaming rnn-t at the moment. Could you have a quick look and tell me what you think about it. If this refactoring is satisfactory for you, I will do the same for the other models.

@csukuangfj
Copy link
Collaborator

@csukuangfj I started only for streaming rnn-t at the moment. Could you have a quick look and tell me what you think about it. If this refactoring is satisfactory for you, I will do the same for the other models.

Thanks! It looks good to me.

@ezerhouni
Copy link
Collaborator Author

@csukuangfj The PR should be ready to be reviewed. I tested all the streaming models but not the offline yet (will do next week).
I didn't touch offline_asr.py as it seems to be the same as offline_server.py, is there any difference ?

@csukuangfj
Copy link
Collaborator

offline_asr.py does not involve network communication. It's standalone, taking sound files as inputs and outputs text.

@csukuangfj
Copy link
Collaborator

Thanks. I will review it when I come to home(in 30 minutes).

@ezerhouni
Copy link
Collaborator Author

Thanks. I will review it when I come to home(in 30 minutes).

Great ! Thank you ! There is no rush

Copy link
Collaborator

@csukuangfj csukuangfj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Left some minor comments.

sherpa/bin/pruned_stateless_emformer_rnnt2/beam_search.py Outdated Show resolved Hide resolved
sherpa/bin/pruned_transducer_statelessX/beam_search.py Outdated Show resolved Hide resolved
sherpa/bin/pruned_transducer_statelessX/offline_asr.py Outdated Show resolved Hide resolved
@csukuangfj
Copy link
Collaborator

Sorry, there are conflicts now.

@ezerhouni
Copy link
Collaborator Author

@csukuangfj I will take care of the conflicts, hopefully it won't be too bad.

@csukuangfj
Copy link
Collaborator

I just checked it locally and found that most of the conflicts are about style issues:
Screen Shot 2022-07-23 at 3 21 02 PM

Hope that it will not cause too much trouble for you.

@ezerhouni
Copy link
Collaborator Author

@csukuangfj Just rebased, it was quite straightforward.

@csukuangfj
Copy link
Collaborator

I suggest ignoring errors reported by flake8 in https://github.com/k2-fsa/sherpa/runs/7479612670?check_suite_focus=true

We can edit https://github.com/k2-fsa/sherpa/blob/master/.flake8 to do that. (Either ignore the whole file or ignore
just some specific types of errors.)

You can use
https://github.com/k2-fsa/icefall/blob/master/.flake8
as an example.

if decoding_method == "greedy_search":
nn_and_decoding_func = run_model_and_do_greedy_search
self.beam_search = GreedySearchOffline(
self.model,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI complains
https://github.com/k2-fsa/sherpa/runs/7479632765?check_suite_focus=true

2022-07-23 08:00:57,024 INFO [offline_server.py:258] Using device: cpu
Traceback (most recent call last):
  File "sherpa/bin/pruned_transducer_statelessX/offline_server.py", line 675, in <module>
    main()
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line [28](https://github.com/k2-fsa/sherpa/runs/7479632765?check_suite_focus=true#step:11:29), in decorate_context
    return func(*args, **kwargs)
  File "sherpa/bin/pruned_transducer_statelessX/offline_server.py", line 643, in main
    num_active_paths=num_active_paths,
  File "sherpa/bin/pruned_transducer_statelessX/offline_server.py", line [29](https://github.com/k2-fsa/sherpa/runs/7479632765?check_suite_focus=true#step:11:30)5, in __init__
    self.model,
AttributeError: 'OfflineServer' object has no attribute 'model'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't test this part of the code yet. I need to do some extra testing before moving to merging

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csukuangfj Seems like everything passed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

Merging

@csukuangfj csukuangfj added ready and removed ready labels Jul 23, 2022
@csukuangfj csukuangfj added ready and removed ready labels Jul 23, 2022
@csukuangfj csukuangfj merged commit fca8c18 into k2-fsa:master Jul 23, 2022
@ezerhouni ezerhouni changed the title [WIP] Refactoring streaming transducer Refactoring streaming transducer Jul 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants