Performance of num_head=1?

Thank you for your excellent work. I noticed that your codebook uses 4 heads (num_head=4). This means a downstream generation model has to predict 4 tokens per position at each step, which could make training harder. How much would performance change if num_head were set to 1?
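To make the question concrete, here is a minimal sketch of what I assume multi-head quantization does. It is a generic product-quantization-style scheme, not code from this repository. The function name `multi_head_quantize` and the array shapes are hypothetical. The latent vector is split into num_head equal chunks, and each chunk is matched to its nearest entry in its own codebook. Each position therefore produces num_head token indices instead of one.

```python
import numpy as np

def multi_head_quantize(z, codebooks):
    """Quantize latents with one codebook per head (hypothetical sketch).

    z: array of shape (N, D), one latent vector per position.
    codebooks: list of h arrays, each of shape (K, D // h).
    Returns an int array of shape (N, h): h token indices per position.
    """
    h = len(codebooks)
    n, d = z.shape
    assert d % h == 0, "latent dim must be divisible by the number of heads"
    # Split each latent vector into h contiguous chunks of size D // h.
    chunks = z.reshape(n, h, d // h)
    tokens = np.empty((n, h), dtype=np.int64)
    for i, cb in enumerate(codebooks):
        # Squared Euclidean distance from each chunk to every code: (N, K).
        dist = ((chunks[:, i, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        # Nearest code index for this head.
        tokens[:, i] = dist.argmin(axis=1)
    return tokens

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 8))
four_heads = [rng.normal(size=(16, 2)) for _ in range(4)]
one_head = [rng.normal(size=(16, 8))]
print(multi_head_quantize(z, four_heads).shape)
print(multi_head_quantize(z, one_head).shape)
```

With 4 heads the generator has to predict a (5, 4) grid of tokens for 5 positions. With a single head it predicts only (5, 1), but each token then has to encode the whole latent vector with one codebook.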
Looking forward to your response.