remove cfg smooth factor #2280

Merged
merged 1 commit into ggerganov:master on Jul 21, 2023

Conversation

Vermeille (Contributor) commented Jul 19, 2023

MPKonst shows here that it is only a reparameterization of the guidance scale. Thus we remove it in order to:

  1. better align with the paper
  2. align with the upcoming huggingface implementation
  3. remove a useless hyperparameter

Related: #2083 #2217 #2135
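For reference, a minimal NumPy sketch of MPKonst's point (the formulas here are illustrative assumptions, not the llama.cpp code): if the guided logits have the form neg + scale * (base - neg), then mixing them back with the unguided logits using a smooth factor s is exactly plain guidance with the rescaled strength 1 + s * (scale - 1).

```python
# Illustrative sketch only, not the llama.cpp implementation.
# Assumptions: guided logits = neg + g * (base - neg), and the "smooth factor" s
# linearly interpolates between the unguided and the guided logits.
import numpy as np

rng  = np.random.default_rng(0)
base = rng.normal(size=32000)   # logits from the regular prompt
neg  = rng.normal(size=32000)   # logits from the negative (CFG) prompt

g, s = 4.0, 0.7                 # guidance scale and smooth factor (example values)

guided   = neg + g * (base - neg)          # plain CFG with scale g
smoothed = s * guided + (1.0 - s) * base   # CFG followed by the smooth factor

g_prime = 1.0 + s * (g - 1.0)              # the same guidance, reparameterized
print(np.allclose(smoothed, neg + g_prime * (base - neg)))   # True
```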

SlyEcho (Collaborator) commented Jul 19, 2023

I will not add this parameter in #2217 then.

SlyEcho (Collaborator) left a comment

Seems to give a similar result.

bullno1 (Contributor) commented Jul 19, 2023

I noticed that in the original code there was another pass to log_softmax before the blending.

It's not here anymore.
Does it actually change anything?

I guess intuitively, log cancels out exponential.

Vermeille (Contributor, Author) replied:

Does it actually change anything?

This last log_softmax actually cancels out the two previous log_softmax calls. It is not strictly equivalent, but it can be removed (and the computation then boils down to the original form used in the quantitative experiments throughout the paper).
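A small sketch of why it can be dropped (again, not the llama.cpp code): log_softmax only subtracts a single per-distribution constant (the log-sum-exp) from the logits, so the softmax distribution that is sampled from stays the same, at least for samplers that are shift-invariant.

```python
# Sketch: log_softmax(z) = z - logsumexp(z) only shifts every logit by the same
# constant, so the resulting softmax distribution is identical with or without it.
import numpy as np

def log_softmax(z):
    return z - z.max() - np.log(np.sum(np.exp(z - z.max())))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

blended = np.random.default_rng(1).normal(size=10) * 3.0  # stand-in for blended CFG logits
print(np.allclose(softmax(blended), softmax(log_softmax(blended))))  # True
```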

SlyEcho (Collaborator) commented Jul 20, 2023

I guess intuitively, log cancels out exponential.

It is not exactly the same, because softmax maps everything into the range $[0, 1]$, which means that taking the logarithm makes everything negative. Not sure if it affects the sampling, though.

Applying the function twice does not change the output, so this PR would have the same effect as using the value 1.0 for the "smooth factor".
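A quick numerical check of both statements (a sketch, not the C++ code): the outputs of log_softmax are log-probabilities, so they are all negative (or zero), and applying the function a second time returns them unchanged.

```python
# Sketch: log_softmax output is non-positive (log of values in [0, 1]) and the
# function is idempotent, so a second application changes nothing.
import numpy as np

def log_softmax(z):
    return z - z.max() - np.log(np.sum(np.exp(z - z.max())))

z  = np.random.default_rng(2).normal(size=10) * 5.0
lp = log_softmax(z)

print(np.all(lp <= 0.0))                 # True: these are log-probabilities
print(np.allclose(log_softmax(lp), lp))  # True: applying it twice changes nothing
```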

ghost commented Jul 20, 2023

Hi, thanks for your work on this. It appears to work as expected:

~/nosmooth (Vermeille/master) [1]  ./main -m ~/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin --color --keep -1 -c 2048 --mirostat 2 --verbose-prompt --prompt "A chat between a curious user and an artificial intelligence assistant. The assistant is rude." --in-prefix "USER: " --in-suffix "ASSISTANT:" --reverse-prompt "USER:" --interactive --interactive-first --cfg-negative-prompt "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions." --cfg-scale 4 -t 3 -b 7
main: build = 853 (1e78b1b)
main: seed  = 1689859149
llama.cpp: loading model from /data/data/com.termux/files/home/Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 5287.72 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: kv self size  = 1024.00 MB

system_info: n_threads = 3 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |

main: prompt: ' A chat between a curious user and an artificial intelligence assistant. The assistant is rude.'
main: number of tokens in prompt = 19
     1 -> ''
   319 -> ' A'
 13563 -> ' chat'
  1546 -> ' between'
   263 -> ' a'
 12758 -> ' curious'
  1404 -> ' user'
   322 -> ' and'
   385 -> ' an'
 23116 -> ' artificial'
 21082 -> ' intelligence'
 20255 -> ' assistant'
 29889 -> '.'
   450 -> ' The'
 20255 -> ' assistant'
   338 -> ' is'
   364 -> ' r'
  1151 -> 'ude'
 29889 -> '.'

main: negative prompt: ' A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.'
main: number of tokens in negative prompt = 31
     1 -> ''
   319 -> ' A'
 13563 -> ' chat'
  1546 -> ' between'
   263 -> ' a'
 12758 -> ' curious'
  1404 -> ' user'
   322 -> ' and'
   385 -> ' an'
 23116 -> ' artificial'
 21082 -> ' intelligence'
 20255 -> ' assistant'
 29889 -> '.'
   450 -> ' The'
 20255 -> ' assistant'
  4076 -> ' gives'
  8444 -> ' helpful'
 29892 -> ','
 13173 -> ' detailed'
 29892 -> ','
   322 -> ' and'
  1248 -> ' pol'
   568 -> 'ite'
  6089 -> ' answers'
   304 -> ' to'
   278 -> ' the'
  1404 -> ' user'
 29915 -> '''
 29879 -> 's'
  5155 -> ' questions'
 29889 -> '.'

main: static prompt based on n_keep: ' A chat between a curious user and an artificial intelligence assistant. The assistant is rude.'

main: interactive mode on.
Reverse prompt: 'USER:'
Input prefix: 'USER: '
Input suffix: 'ASSISTANT:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 2, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 19
                                                  
== Running in interactive mode. ==                 
- Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.        
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.                                                                                        

A chat between a curious user and an artificial intelligence assistant. The assistant is rude.

USER: Hello, what's your name?
ASSISTANT: It's none of your business.
                                                   
llama_print_timings:        load time =  2981.82 ms
llama_print_timings:      sample time =   101.34 ms /     9 runs   (   11.26 ms per token,    88.81 tokens per second)
llama_print_timings: prompt eval time = 13521.08 ms /    33 tokens (  409.73 ms per token,     2.44 tokens per second)
llama_print_timings:        eval time =  4056.78 ms /    10 runs   (  405.68 ms per token,     2.47 tokens per second)
llama_print_timings:       total time = 71070.74 ms

On another note, setting --cfg-scale increases RAM usage. Is that correct? The KV cache is doubled on my device:

llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: kv self size  = 1024.00 MB

Though this may be expected behavior, since the negative prompt needs to be taken into consideration.

Thank you.

SlyEcho (Collaborator) commented Jul 20, 2023

kv cache is doubled on my device:

Yes. Because it's generating two sequences in parallel, both need their own cache; otherwise there would be too much re-evaluation for every token.
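As a back-of-envelope check (assuming the default f16 cache for K and V), the numbers in the log above add up: each context needs 2 × n_layer × n_ctx × n_embd × 2 bytes, and with a negative prompt there are two such contexts.

```python
# Rough KV cache size for the log above (n_layer=32, n_ctx=2048, n_embd=4096, f16 assumed).
n_layer, n_ctx, n_embd = 32, 2048, 4096
bytes_per_elem = 2                                         # f16
kv_bytes = 2 * n_layer * n_ctx * n_embd * bytes_per_elem   # K and V tensors
print(kv_bytes / 1024**2)        # 1024.0 MB per context
print(2 * kv_bytes / 1024**2)    # 2048.0 MB with the extra CFG context
```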

ggerganov merged commit ab0e26b into ggerganov:master on Jul 21, 2023