
gpt4all-backend: Fix MPT buffer use, deduplicate sampling and tokenizing #589

Merged
merged 4 commits into nomic-ai:main on May 16, 2023

Conversation

apage43
Member

apage43 commented May 16, 2023

  • Removes the non-thread-safe use of a static inference buffer in the mpt code and uses the model struct's buffer instead (see the sketch below)
  • Deduplicates the tokenizing and sampling code in the gptj and mpt models
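For concreteness, a minimal C++ sketch of the buffer change described in the first bullet; the names (`mpt_model`, `mpt_eval_*`) and sizes are illustrative assumptions, not the exact identifiers in gpt4all-backend:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Before (illustrative): a function-local static buffer shared by every
// caller -- two threads evaluating different models race on the same memory.
bool mpt_eval_before(size_t buf_size) {
    static std::vector<uint8_t> buf;   // shared across all models/threads
    buf.resize(buf_size);
    // ... build and compute the ggml graph inside buf ...
    return true;
}

// After (illustrative): the scratch buffer lives in the per-model struct,
// so each loaded model owns its own inference memory.
struct mpt_model {
    // ... hyperparameters and weight tensors ...
    std::vector<uint8_t> buf;          // per-model inference buffer
};

bool mpt_eval_after(mpt_model & model, size_t buf_size) {
    model.buf.resize(buf_size);        // the resize discussed below
    // ... build and compute the ggml graph inside model.buf ...
    return true;
}
```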

@apage43 changed the title from "Fix MPT buffer use, deduplicate backend code" to "gpt4all-backend: Fix MPT buffer use, deduplicate sampling and tokenizing" on May 16, 2023
@kuvaus
Contributor

kuvaus commented May 16, 2023

This looks awesome! I really like the simplification.

One question:

Since you're changing how buf_size works, does this comment take into account the progress in ggerganov/ggml#145 over the past two days? Right now, if you use a long prompt (say 1.5k characters) with MPT and set batch_size to 20 or 56 in GPT4All-chat, it will crash the GUI with a memory error. The fixes work with relatively low batch sizes, but if this PR works at an arbitrary batch_size, that would be great.

You likely already thought of this since you resize with model.buf.resize(buf_size);, so maybe it just works with arbitrarily large n_predict and batch_size?
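To make the concern concrete, here is a hedged sketch of the kind of sizing logic in question; the formula and constants are placeholders (assumptions), not the actual gpt4all-backend estimate. The point is that if buf_size is estimated without accounting for how many tokens are evaluated per ggml call, a large batch_size can outgrow the buffer:

```cpp
#include <cstddef>

// Placeholder estimate (assumption): intermediate tensors scale roughly with
// the number of tokens evaluated at once (n_batch) times the hidden size.
size_t estimate_buf_size(size_t n_ctx, size_t n_embd, size_t n_batch) {
    const size_t base      = 256u * 1024u * 1024u;         // fixed overhead (guess)
    const size_t per_token = 32u * n_embd * sizeof(float);  // per-token scratch (guess)
    // Sizing from n_ctx alone would miss the n_batch term and overflow
    // when a large batch is used with a long prompt.
    return base + n_ctx * 1024u + n_batch * per_token;
}

// Caller side, matching the resize mentioned above:
//   model.buf.resize(estimate_buf_size(n_ctx, n_embd, n_batch));
```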

@apage43
Member Author

apage43 commented May 16, 2023

Haven't gotten crashes and have put enough text through it in the UI to get the "recalculating context" pause.

Though one of the reasons to use MPT is that it should still kind of work when you override n_ctx to a larger size than it was trained on, and I think the estimate is still possibly too low for that case. There's no way to do that in the UI or llmodel yet, and the memory estimates for all the models would probably need to change to support it; or we could just not allow it on llama and gptj, since they're not likely to work that well at longer context sizes.
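As a hypothetical illustration of the "allow it for MPT only" idea mentioned here (nothing like this exists in llmodel yet; every name below is made up): MPT's ALiBi position biases let it degrade gracefully past its trained context length, while gptj and llama use rotary position embeddings and are not expected to extrapolate as well.

```cpp
#include <algorithm>

enum class ModelFamily { MPT, GPTJ, LLAMA };

// Hypothetical gating: only MPT may override n_ctx beyond its trained size.
int effective_n_ctx(ModelFamily family, int trained_n_ctx, int requested_n_ctx) {
    if (family == ModelFamily::MPT) {
        return requested_n_ctx;   // allowed, but memory estimates must account for it
    }
    return std::min(requested_n_ctx, trained_n_ctx);
}
```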

@kuvaus
Contributor

kuvaus commented May 16, 2023

> Haven't gotten crashes and have put enough text through it in the UI to get the "recalculating context" pause.

Great! That's all I wanted to hear. :)

I didn't know about the extrapolation to longer context lengths. Sounds interesting; it looks like it's worth doing at some point in the future. And yeah, it's probably easiest to allow it for MPT only at first.

manyoso merged commit d14936b into nomic-ai:main on May 16, 2023