According to the transformers documentation, the GPT2 LM head supports an argument called `past` that speeds up decoding by reusing attention tensors computed in previous steps. I have started making changes in my fork to get some numbers: chiragjn#2
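For context on why reusing those tensors helps: at each decoding step, the keys and values for all earlier positions are unchanged, so they can be cached and only the newest token needs to be projected. A minimal numpy sketch of a single attention head (this is an illustration of the mechanism, not transformers' actual implementation) showing that the cached incremental step matches a full recompute for the last position:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8
# Fixed projection matrices for one attention head (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
x = rng.normal(size=(T, d))  # embeddings for a T-token sequence

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Full recompute: project every token, then take the attention
# output for the last position (it may attend to all earlier ones).
Q, K, V = x @ Wq, x @ Wk, x @ Wv
full_last = softmax(Q[-1] @ K.T / np.sqrt(d)) @ V

# Cached decoding: K/V for the first T-1 tokens were saved on
# earlier steps; the new step projects only the newest token and
# appends its key/value to the cache.
K_cache, V_cache = x[:-1] @ Wk, x[:-1] @ Wv
K_inc = np.vstack([K_cache, x[-1] @ Wk])
V_inc = np.vstack([V_cache, x[-1] @ Wv])
q_new = x[-1] @ Wq
cached_last = softmax(q_new @ K_inc.T / np.sqrt(d)) @ V_inc

assert np.allclose(full_last, cached_last)
print("cached step matches full recompute")
```

With GPT2 this would mean passing the returned cache back into the next forward call via the `past` argument and feeding only the newly generated token, instead of re-running the whole prefix each step.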
I am not entirely sure how to get this working for XLNet, but there is a similar `mems` argument. Would you accept such a PR upstream if it speeds up decoding?