Feature Requests & Ideas #8
I have an idea I just tested, and I got indexing time cut in half.
Before:
After:
Changed:
P.S. I reduced my ingestion size, as you can see. You would probably also have to run ingestion each time you start the chat, but it was an interesting find.
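For anyone who wants to try this: a minimal sketch of what an in-memory index could look like, assuming the LangChain Qdrant wrapper is in use. The model path and texts are placeholders, not what this repo ships:

```python
# Hedged sketch: build the Qdrant collection in RAM instead of on disk.
# Assumes langchain + qdrant-client are installed; names are illustrative.
from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import Qdrant

embeddings = LlamaCppEmbeddings(model_path="models/ggml-model-q4_0.bin")

# location=":memory:" keeps the whole collection in RAM, so indexing skips
# the disk round-trip -- the likely source of the speedup reported above.
db = Qdrant.from_texts(
    ["some ingested text", "another chunk"],
    embeddings,
    location=":memory:",
    collection_name="ingest",
)
```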
That's a well-written notice, thanks. Maybe we make memory the default, with a notice. I'll check it for myself; those numbers look good. I guess there's still huge potential, since we are using default values besides MMR. We could tweak both the ingestion process and the retrieval speed.
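For reference, these are the retrieval-side knobs being talked about; a sketch only, assuming a LangChain vector store, with placeholder values rather than tested defaults:

```python
# Hedged sketch: tuning MMR retrieval on an existing vector store.
# `db` is assumed to be an already-built LangChain vector store.
retriever = db.as_retriever(
    search_type="mmr",   # maximal marginal relevance, the one non-default here
    search_kwargs={
        "k": 4,          # documents handed to the LLM
        "fetch_k": 20,   # candidate pool that MMR re-ranks from
    },
)
docs = retriever.get_relevant_documents("What does the ingested text say?")
```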
Even if the memory maxes out, it caches to my SSD anyway. Here's another possible hack: I literally use the alpaca-7b model for ingestion, for reading the db, and for the LLM, the exact same file, and I don't seem to have an issue. So if we just run ingestion every time we load the LLM using memory, I am pretty sure we only need the single model loaded into memory once, reducing the loading time between ingestion and questioning the AI. I am going to play around with it to see if I can get something working.
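A rough sketch of the current two-instance setup this idea would collapse, assuming LangChain's llama.cpp wrappers; true single-load sharing would need changes below this layer:

```python
# Hedged sketch: today both wrappers load the same weights file separately;
# the idea above is to keep one loaded instance and reuse it for both jobs.
from langchain.embeddings import LlamaCppEmbeddings
from langchain.llms import LlamaCpp

MODEL_PATH = "models/ggml-alpaca-7b-q4.bin"  # illustrative path

embeddings = LlamaCppEmbeddings(model_path=MODEL_PATH)  # load #1 (ingestion)
llm = LlamaCpp(model_path=MODEL_PATH)                   # load #2 (Q&A)
# Reusing one in-memory instance for both would cut the second load entirely.
```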
Read the bottom of this document. It almost looks like you can assign the ingestion to a memory location and save that value to reload on the LLM side of things. Maybe you could use it to save to storage as persistent, so you can check whether it's on storage, and if not, use the RAM version until it is. I don't know, kind of rambling now.
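In other words, something like a check-disk-first pattern. A sketch only, with Chroma standing in as the persistable store; the paths and texts are placeholders:

```python
# Hedged sketch: reuse the on-disk index if it exists, otherwise ingest into
# memory and persist the result for next start-up. Chroma is a stand-in store.
import os
from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import Chroma

PERSIST_DIR = "db"  # illustrative path
embeddings = LlamaCppEmbeddings(model_path="models/ggml-model-q4_0.bin")

if os.path.isdir(PERSIST_DIR):
    # Already on storage: reload without re-running ingestion.
    db = Chroma(persist_directory=PERSIST_DIR, embedding_function=embeddings)
else:
    # Not on storage yet: ingest now, then write it out for next time.
    db = Chroma.from_texts(["chunk one", "chunk two"], embeddings,
                           persist_directory=PERSIST_DIR)
    db.persist()
```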
This sounds promising. I was asking myself what can be done by playing around with LlamaCppEmbeddings. Keep me posted. A change of models would be the first step; then we should tweak the arguments.
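For the argument-tweaking part, these are the kinds of knobs LlamaCppEmbeddings exposes; the values below are placeholders for benchmarking, not recommendations:

```python
# Hedged sketch: non-default LlamaCppEmbeddings arguments worth experimenting
# with when profiling ingestion speed. Values are illustrative only.
from langchain.embeddings import LlamaCppEmbeddings

embeddings = LlamaCppEmbeddings(
    model_path="models/ggml-model-q4_0.bin",
    n_ctx=1024,       # context window; smaller can embed faster
    n_threads=8,      # match the machine's physical cores
    n_batch=512,      # tokens processed per batch during embedding
    use_mlock=True,   # pin the model in RAM to avoid swapping
)
```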
Ok, please remember you asked for it! ;-)
More models:
Document parser:
Database types:
Integration into UI:
ChatGPT-retrieval-clone: This should be our ultimate goal. With enough tweaking, those models should run with a decent runtime. It is possible; see also the new LlamaSharp repo, a set of llama.cpp bindings for C# with great performance.
Model variation: Thanks to @alxspiker in here, we are able to convert GGML models to the supported GGJT format. I tested and uploaded the converted model here.
Data handling:
UI:
Is it possible to provide a not-so-air-gapped mode in exchange for better performance and speed? Also, thanks for your work. I'm an Energy Manager who has never coded, and I'm following your work hoping to launch a specialized Q&A bot, so that maybe, just maybe, it catches the attention of recruiters.
I'm glad you found joy with this repo :) Certainly, if speed is preferred, you'd want to call OpenAI's API itself (or a competing model like MosaicML's), stream directly from HuggingFace, etc. This job can be done inside a Jupyter notebook and is basically THE prototype idea of LangChain. A starting point might be this. Edit: fixed link
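To illustrate the trade-off: a minimal sketch of pointing the same retrieval chain at a hosted model instead of the local one, assuming LangChain and an API key; the chain setup here is illustrative:

```python
# Hedged sketch: trading the air gap for speed by swapping the local llama.cpp
# LLM for a hosted one. Requires OPENAI_API_KEY in the environment.
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# `db` is assumed to be an already-built vector store from ingestion.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),   # hosted model instead of a local one
    chain_type="stuff",
    retriever=db.as_retriever(),
)
print(qa.run("Summarize the ingested documents."))
```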
Idea: create an "Administrative" UI to change parameters and models, stop the app, clear the db, etc., plus a separate user interface just for the Q&A/chat area?
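A rough sketch of that split, assuming a Streamlit front end; all widget labels and parameter ranges are placeholders:

```python
# Hedged sketch: admin controls in the sidebar, Q&A in the main pane.
import streamlit as st

# --- Administrative panel ---
st.sidebar.title("Admin")
model = st.sidebar.selectbox("Model", ["ggml-alpaca-7b-q4.bin"])
temperature = st.sidebar.slider("Temperature", 0.0, 1.0, 0.2)
if st.sidebar.button("Clear db"):
    st.sidebar.write("db cleared")  # would call the real cleanup here

# --- User-facing Q&A area ---
st.title("Q&A")
question = st.text_input("Ask a question about your documents")
if question:
    st.write(f"(answer from {model} at temperature {temperature} goes here)")
```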
Sorry for the slow development. I'm handling exams and a salty girlfriend rn. Back on the desktop soon.
Quick comment @su77ungr: this "issue" will soon become rather big and hard to synthesize (which is fine as a place for simple discussion); don't forget to open actual issues for each of the ideas you actually want to implement :) Maybe Discussions would be a better place to host this than Issues?
Created #76
Leave your feature requests here...