remove cfg smooth factor #2280
Conversation
I will not add this parameter in #2217 then.
Seems to give a similar result.
I noticed that in the original code there was another pass of log_softmax before the blending. It's not here anymore. I guess that, intuitively, the log cancels out the exponential.
This last log_softmax actually cancels out the two previous log_softmax calls. It's not strictly equivalent, but it can be removed (and the result boils down to the original form used in the quantitative experiments throughout the paper).
It is not exactly the same, because softmax rescales everything into a normalized distribution. However, applying log_softmax twice does not change the output, so this PR would have the same effect as using the value 1.0 for the "smooth factor".
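For concreteness, the cancellation works because log_softmax is idempotent: softmax is invariant to a constant shift, and log_softmax only subtracts the constant logsumexp from its input. A minimal sketch (not the PR's code; the `logits` vector is a made-up example):

```python
import math

def log_softmax(xs):
    # log_softmax(x) = x - logsumexp(x); max-shift for numerical stability
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

logits = [2.0, -1.0, 0.5]   # hypothetical example logits
once = log_softmax(logits)
twice = log_softmax(once)

# Applying it twice leaves the output unchanged (idempotence),
# so an extra log_softmax pass before blending has no effect.
assert all(abs(a - b) < 1e-12 for a, b in zip(once, twice))
```

Since the second pass is a no-op on already-log-normalized values, dropping it cannot change the sampled distribution.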
Hi, thanks for your work on this. It appears to work as expected.
On another note, modifying --cfg-scale increases RAM usage; the KV cache is doubled on my device. Though this may be expected behavior for taking the negative prompt into consideration. Thank you.
Yes, because it's generating two sequences in parallel, both need their own cache; otherwise it would require too much re-evaluation for every token.
MPKonst shows here that it is only a reparameterization of the guidance scale. Thus we remove it in order to keep the sampling interface simple.
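One way to see the reparameterization, assuming the smooth factor s linearly mixed the guided log-probs with the positive-prompt log-probs (a hedged sketch of the argument, not the removed code; `pos`, `neg`, `g`, and `s` are made-up example values):

```python
# Claim: s * (neg + g * (pos - neg)) + (1 - s) * pos
#      = neg + g' * (pos - neg)   with   g' = 1 + s * (g - 1),
# i.e. the smooth factor s only rescales the guidance scale g.
pos = [1.0, 0.2, -0.7]   # hypothetical positive-prompt log-probs
neg = [0.4, -0.1, 0.9]   # hypothetical negative-prompt log-probs
g, s = 3.0, 0.5          # hypothetical guidance scale and smooth factor

blended = [s * (n + g * (p - n)) + (1 - s) * p for p, n in zip(pos, neg)]
g_prime = 1 + s * (g - 1)
reparam = [n + g_prime * (p - n) for p, n in zip(pos, neg)]

# The smoothed blend is exactly plain guidance at scale g'.
assert all(abs(a - b) < 1e-12 for a, b in zip(blended, reparam))
```

Under this assumption, any (g, s) pair is reachable with s = 1 and an adjusted scale, so exposing s as a separate parameter adds nothing.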
Related: #2083 #2217 #2135