Could you please tell me which paper/article this multi-head VQ implementation is based on?