Feature Requests & Ideas #8
I have an idea I just tested, and I got indexing time cut in half.
Before:
After:
Changed:
P.S. I reduced my ingestion size, as you can see. You would probably also have to run ingestion each time you start the chat, but it was an interesting find.
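For anyone who wants to try this: a minimal sketch of what an in-memory index could look like, assuming the LangChain Qdrant wrapper is in use. The model path and texts are placeholders, not what this repo ships:

```python
# Hedged sketch: build the Qdrant collection in RAM instead of on disk.
# Assumes langchain + qdrant-client are installed; names are illustrative.
from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import Qdrant

embeddings = LlamaCppEmbeddings(model_path="models/ggml-model-q4_0.bin")

# location=":memory:" keeps the whole collection in RAM, so indexing skips
# the disk round-trip -- the likely source of the speedup reported above.
db = Qdrant.from_texts(
    ["some ingested text", "another chunk"],
    embeddings,
    location=":memory:",
    collection_name="ingest",
)
```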
That's a well-written notice, thanks. Maybe we make memory the default, with a notice. I'll check it for myself; those numbers look good. I guess there's still huge potential, since we are using default values besides MMR. We could tweak both the ingestion process and the retrieval speed.
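For reference, these are the retrieval-side knobs being talked about; a sketch only, assuming a LangChain vector store, with placeholder values rather than tested defaults:

```python
# Hedged sketch: tuning MMR retrieval on an existing vector store.
# `db` is assumed to be an already-built LangChain vector store.
retriever = db.as_retriever(
    search_type="mmr",   # maximal marginal relevance, the one non-default here
    search_kwargs={
        "k": 4,          # documents handed to the LLM
        "fetch_k": 20,   # candidate pool that MMR re-ranks from
    },
)
docs = retriever.get_relevant_documents("What does the ingested text say?")
```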
Even if the memory maxes out, it caches to my SSD anyway. Here's another possible hack: I literally use the alpaca-7b model for ingestion, for reading the db, and for the LLM, the exact same file, and I don't seem to have an issue. So if we just run ingestion every time we load the LLM using memory, I am pretty sure we only need the single model loaded into memory once, reducing the loading time between ingestion and questioning the AI. I am going to play around with it to see if I can get something working.
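A rough sketch of the current two-instance setup this idea would collapse, assuming LangChain's llama.cpp wrappers; true single-load sharing would need changes below this layer:

```python
# Hedged sketch: today both wrappers load the same weights file separately;
# the idea above is to keep one loaded instance and reuse it for both jobs.
from langchain.embeddings import LlamaCppEmbeddings
from langchain.llms import LlamaCpp

MODEL_PATH = "models/ggml-alpaca-7b-q4.bin"  # illustrative path

embeddings = LlamaCppEmbeddings(model_path=MODEL_PATH)  # load #1 (ingestion)
llm = LlamaCpp(model_path=MODEL_PATH)                   # load #2 (Q&A)
# Reusing one in-memory instance for both would cut the second load entirely.
```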
Read the bottom of this document. It almost looks like you can assign the ingestion to a memory location and save that value to reload on the LLM side of things. Maybe you could use it to save to storage as persistent, so you can check whether it's on storage, and if not, use the RAM version until it is. I don't know, kind of rambling now.
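In other words, something like a check-disk-first pattern. A sketch only, with Chroma standing in as the persistable store; the paths and texts are placeholders:

```python
# Hedged sketch: reuse the on-disk index if it exists, otherwise ingest into
# memory and persist the result for next start-up. Chroma is a stand-in store.
import os
from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import Chroma

PERSIST_DIR = "db"  # illustrative path
embeddings = LlamaCppEmbeddings(model_path="models/ggml-model-q4_0.bin")

if os.path.isdir(PERSIST_DIR):
    # Already on storage: reload without re-running ingestion.
    db = Chroma(persist_directory=PERSIST_DIR, embedding_function=embeddings)
else:
    # Not on storage yet: ingest now, then write it out for next time.
    db = Chroma.from_texts(["chunk one", "chunk two"], embeddings,
                           persist_directory=PERSIST_DIR)
    db.persist()
```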
This sounds promising. I was asking myself what can be done by playing around with LlamaCppEmbeddings. Keep me posted. A change of models would be the first step; then we should tweak the arguments.
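For the argument-tweaking part, these are the kinds of knobs LlamaCppEmbeddings exposes; the values below are placeholders for benchmarking, not recommendations:

```python
# Hedged sketch: non-default LlamaCppEmbeddings arguments worth experimenting
# with when profiling ingestion speed. Values are illustrative only.
from langchain.embeddings import LlamaCppEmbeddings

embeddings = LlamaCppEmbeddings(
    model_path="models/ggml-model-q4_0.bin",
    n_ctx=1024,       # context window; smaller can embed faster
    n_threads=8,      # match the machine's physical cores
    n_batch=512,      # tokens processed per batch during embedding
    use_mlock=True,   # pin the model in RAM to avoid swapping
)
```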
Ok, please remember you asked for it! ;-)
More models:
Document parser:
Database types:
Integration into UI:
ChatGPT-retrieval-clone: This should be our ultimate goal. With enough tweaking, those models should run with a decent runtime. It is possible; see also the new LlamaSharp repo, a set of llama.cpp bindings for C# with great performance.
Model variation: Thanks to @alxspiker in here, we are able to convert GGML models to the supported GGJT format. I tested and uploaded the converted model here.
Data handling:
UI:
Is it possible to provide a not-so-air-gapped mode in exchange for better performance and speed? Also, thanks for your work. I'm an Energy Manager who has never coded, and I'm following your work hoping to launch a specialized Q&A bot, so that maybe, just maybe, it catches the attention of recruiters.
I'm glad you found joy with this repo :) Certainly, if speed is preferred, you'd want to call OpenAI's API itself (or a competing model like MosaicML's), stream directly from HuggingFace, etc. This job can be done inside a Jupyter notebook and is basically THE prototype idea of LangChain. A starting point might be this. Edit: fixed link
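To illustrate the trade-off: a minimal sketch of pointing the same retrieval chain at a hosted model instead of the local one, assuming LangChain and an API key; the chain setup here is illustrative:

```python
# Hedged sketch: trading the air gap for speed by swapping the local llama.cpp
# LLM for a hosted one. Requires OPENAI_API_KEY in the environment.
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# `db` is assumed to be an already-built vector store from ingestion.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),   # hosted model instead of a local one
    chain_type="stuff",
    retriever=db.as_retriever(),
)
print(qa.run("Summarize the ingested documents."))
```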
Idea: create an "Administrative" UI to change parameters and models, stop the app, clear the db, etc., plus a separate user interface just for the Q&A/chat area?
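A rough sketch of that split, assuming a Streamlit front end; all widget labels and parameter ranges are placeholders:

```python
# Hedged sketch: admin controls in the sidebar, Q&A in the main pane.
import streamlit as st

# --- Administrative panel ---
st.sidebar.title("Admin")
model = st.sidebar.selectbox("Model", ["ggml-alpaca-7b-q4.bin"])
temperature = st.sidebar.slider("Temperature", 0.0, 1.0, 0.2)
if st.sidebar.button("Clear db"):
    st.sidebar.write("db cleared")  # would call the real cleanup here

# --- User-facing Q&A area ---
st.title("Q&A")
question = st.text_input("Ask a question about your documents")
if question:
    st.write(f"(answer from {model} at temperature {temperature} goes here)")
```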
Sorry for the slow development. I'm handling exams and a salty girlfriend rn. Back on the desktop soon.
Quick comment @su77ungr: this "issue" will soon become rather big and hard to synthesize (which is fine as a place for simple discussion); don't forget to open actual issues for each of the ideas you actually want to implement :) Maybe Discussions would be a better place to host this than Issues?
Created #76
Leave your feature requests here...