Thank you for your effort in releasing such a great implementation of GFN! I am working on using GFN to fine-tune an LLM as a policy model (which I believe will be a popular use case) and would like to ask for some suggestions.
The main problem in this scenario is sampling from language models efficiently and in parallel, which requires storing the Transformer's key-value cache to avoid re-computation. Do you have any suggestions on how to implement the State class so that we can store the partial token sequence and the key-value cache at the same time?
Thanks for raising the issue. This is an important question we're trying to address these days: how to allow for more flexible state spaces, including graphs for instance.
As of now, states need to be represented as tensors, so the natural approach would be to use long tensors that contain all the information you need to transition from one state to another. In this case, you could use some dimensions of the state to store the key-value cache, and other dimensions to store the decoded token indices.
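A minimal sketch of this packing idea, assuming a fixed maximum sequence length and a fixed flattened KV-cache size per state (all names here, such as `pack_state` and `unpack_state`, are illustrative and not part of the torchgfn API):

```python
import torch

MAX_LEN = 8   # assumed maximum sequence length (toy value)
KV_DIM = 16   # assumed flattened per-state KV-cache size (toy value)

def pack_state(token_ids: torch.Tensor, kv_cache: torch.Tensor) -> torch.Tensor:
    """Pack a batch of partial token sequences and their KV caches
    into a single (batch, MAX_LEN + KV_DIM) float tensor."""
    batch = token_ids.shape[0]
    padded = torch.full((batch, MAX_LEN), -1.0)  # -1 marks padding positions
    padded[:, : token_ids.shape[1]] = token_ids.float()
    return torch.cat([padded, kv_cache.reshape(batch, KV_DIM)], dim=1)

def unpack_state(states: torch.Tensor):
    """Recover the token ids (still padded with -1) and the KV cache
    from a packed state tensor."""
    tokens = states[:, :MAX_LEN].long()
    kv = states[:, MAX_LEN:]
    return tokens, kv

# Usage: two states, each with 3 decoded tokens and a toy KV cache.
ids = torch.tensor([[5, 2, 9], [1, 4, 4]])
cache = torch.randn(2, KV_DIM)
packed = pack_state(ids, cache)
tokens, kv = unpack_state(packed)
```

One caveat with this layout: the KV cache is float-valued, so the combined tensor must be a float tensor, and token ids have to be cast back to `long` on unpacking; in practice the real KV cache of a Transformer is a nested structure of per-layer tensors, so the flattening/reshaping would need to match your model's cache shape.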