
why set "seq_length = min(args.generate_num, 256)" #68

Open
luweishuang opened this issue Jan 21, 2020 · 1 comment
Open

why set "seq_length = min(args.generate_num, 256)" #68

luweishuang opened this issue Jan 21, 2020 · 1 comment

Comments

@luweishuang
Copy link

I notice you provide three pretrained models, including seqlen256_v1.ckpt and seqlen512_v1.ckpt, and the README says: "Only difference is the sequence length used during training. The 512 model uses double the number of tokens as the 256 one for computing the attention but half the batch size (to prevent OOM)." So why does generate.py set seq_length = min(args.generate_num, 256)?
If I use the seqlen512_v1.ckpt model, should I set seq_length = min(args.generate_num, 512) instead?

@ThursdaysMan

Hi @luweishuang,
I'm not a developer, but I ran into exactly the same issue.
Yes, you will have to change the value to 512 if you are using the seqlen512_v1.ckpt model. In the version I'm currently running I turned this into a command-line argument so it can be set at launch time, and I'd strongly recommend doing the same (see the sketch below).
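Something like this, as a rough sketch rather than the repo's exact code; the --seq_length flag name and the default of 256 are my own choices, assuming generate.py already parses its flags with argparse:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--generate_num', type=int, default=1024,
                    help='how many tokens to generate')
parser.add_argument('--seq_length', type=int, default=256,
                    help='sequence length the checkpoint was trained with: '
                         '256 for seqlen256_v1.ckpt, 512 for seqlen512_v1.ckpt')
args = parser.parse_args()

# Cap generation at the context window the checkpoint was trained with,
# instead of hardcoding 256.
seq_length = min(args.generate_num, args.seq_length)
```

Then you can run e.g. python generate.py --seq_length 512 when loading seqlen512_v1.ckpt, without touching the source each time you switch checkpoints.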
All the best
Thursday
