Open
Description
Due diligence
- I have done my due diligence in trying to find the answer myself.
Topic
The PyTorch implementation
Question
Hello! Im looking at the implementation and one thing I'm confused about is why the streaming offset is tracked with 2 variables offset: Tensor and offset_cpu: int. https://github.com/kyutai-labs/moshi/blob/main/moshi/moshi/modules/transformer.py#L287
Is the tensor one sent to gpu and the cpu one is an int so it doesn't get caught by something like model.cuda(), and then you track both so you dont have to sync between cpu and gpu just for the offset?
Thanks for your time.