According to the transformers documentation, the GPT2 LM head supports an argument called `past` that speeds up decoding by reusing attention tensors computed in previous steps. I have started making changes in my fork to get some numbers: chiragjn#2
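For context on why reusing those tensors helps: at each decoding step, the keys and values for all earlier positions are unchanged, so they can be cached and only the newest token needs to be projected. A minimal numpy sketch of a single attention head (this is an illustration of the mechanism, not transformers' actual implementation) showing that the cached incremental step matches a full recompute for the last position:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8
# Fixed projection matrices for one attention head (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
x = rng.normal(size=(T, d))  # embeddings for a T-token sequence

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Full recompute: project every token, then take the attention
# output for the last position (it may attend to all earlier ones).
Q, K, V = x @ Wq, x @ Wk, x @ Wv
full_last = softmax(Q[-1] @ K.T / np.sqrt(d)) @ V

# Cached decoding: K/V for the first T-1 tokens were saved on
# earlier steps; the new step projects only the newest token and
# appends its key/value to the cache.
K_cache, V_cache = x[:-1] @ Wk, x[:-1] @ Wv
K_inc = np.vstack([K_cache, x[-1] @ Wk])
V_inc = np.vstack([V_cache, x[-1] @ Wv])
q_new = x[-1] @ Wq
cached_last = softmax(q_new @ K_inc.T / np.sqrt(d)) @ V_inc

assert np.allclose(full_last, cached_last)
print("cached step matches full recompute")
```

With GPT2 this would mean passing the returned cache back into the next forward call via the `past` argument and feeding only the newly generated token, instead of re-running the whole prefix each step.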
I am not entirely sure how to get this working for XLNet, but there is a similar `mems` argument. Would you accept such a PR upstream if it speeds up decoding?