Use `past` argument for GPT2 to speed up decoding
#63
As suggested in #61, I made changes to use the `past` argument for GPT2. I have not, however, made the equivalent changes for XLNet, as `mems` behaves differently and is under-documented for the moment. I did some benchmarking on a 2-core CPU server with the following code.
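To illustrate what the `past` cache buys during decoding, here is a minimal single-head attention sketch in NumPy (names, shapes, and weights are illustrative, not the PR's benchmarking code): without caching, every step re-projects the keys and values for the entire prefix; with caching, each step projects only the newest token and appends to the stored `past` keys/values, producing identical outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
x = rng.normal(size=(5, d))  # pretend embeddings for 5 decoded tokens

def attend(q, k, v):
    # scaled dot-product attention; causality holds because k/v only
    # ever contain tokens up to the current step
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Without caching: re-project the full prefix at every decoding step.
full = []
for t in range(5):
    k = x[: t + 1] @ Wk  # O(t) projection work, repeated each step
    v = x[: t + 1] @ Wv
    full.append(attend(x[t : t + 1] @ Wq, k, v))

# With a "past" cache: project only the new token, append to the cache.
past_k = np.empty((0, d))
past_v = np.empty((0, d))
cached = []
for t in range(5):
    past_k = np.vstack([past_k, x[t : t + 1] @ Wk])  # O(1) new work
    past_v = np.vstack([past_v, x[t : t + 1] @ Wv])
    cached.append(attend(x[t : t + 1] @ Wq, past_k, past_v))

# Both strategies produce the same attention outputs.
assert all(np.allclose(a, b) for a, b in zip(full, cached))
```

The same equivalence is why passing `past` back into GPT2 at each step changes only the per-token cost, not the generated distribution.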
Since there is randomness involved in decoding, I also made a separate branch on top of these changes that uses greedy decoding instead: chiragjn@5a5a848
All numbers here: https://docs.google.com/spreadsheets/d/178VnHeBpHWz5lKHLbBuYTRPqxiWXa7-rpCje2wQH_i8/edit?usp=sharing
I leave the API design decisions up to you. Let me know how I can improve this pull request.