
Slow chunking the text file #9

Open
katopz opened this issue May 19, 2024 · 6 comments

katopz commented May 19, 2024

After trying the steps from the README:

curl -X POST http://127.0.0.1:8080/v1/create/rag -F "file=@paris.txt"

It took 590824.84 ms (nearly 10 minutes) just to chunk a 306-line (91 KB) file on an M3 Max.

Is this just me, or am I missing some flag?
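
If it helps reproduce, the request can also be timed from the client side with curl's standard --write-out option (nothing project-specific assumed here):

# time_total reports the full request duration in seconds
curl -X POST http://127.0.0.1:8080/v1/create/rag -F "file=@paris.txt" \
     -o /dev/null -s -w "total: %{time_total}s\n"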

juntao (Collaborator) commented May 20, 2024

Can you perhaps try this?

https://docs.gaianet.ai/creator-guide/knowledge/text

katopz commented May 20, 2024

> Can you perhaps try this?
>
> https://docs.gaianet.ai/creator-guide/knowledge/text

This one took 6.88 s, which seems much faster. 🤔

juntao (Collaborator) commented May 20, 2024

Just make sure that the embedding model you used to generate the vector collection / snapshot is the same as the one rag-api-server starts with.
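
One quick way to double-check which embedding model the running server loaded (plain ps/grep, assuming the server is still running):

# Prints the .gguf file named in the server's "embedding" --nn-preload flag
ps aux | grep -o 'embedding:GGML:AUTO:[^ ]*'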

katopz commented May 20, 2024

I'm not quite sure which line I have to check. I followed the steps from the README, which are:

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-2-7b-chat-hf-Q5_K_M.gguf \
    --nn-preload embedding:GGML:AUTO:all-MiniLM-L6-v2-ggml-model-f16.gguf \
    rag-api-server.wasm \
    --model-name Llama-2-7b-chat-hf-Q5_K_M,all-MiniLM-L6-v2-ggml-model-f16 \
    --ctx-size 4096,384 \
    --prompt-template llama-2-chat \
    --rag-prompt "Use the following pieces of context to answer the user's question.\nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n" \
    --log-prompts \
    --log-stat

and

curl -X POST http://127.0.0.1:8080/v1/create/rag -F "file=@paris.txt"

They should be the same, right?

juntao (Collaborator) commented May 20, 2024

You started the rag-api-server with all-MiniLM-L6-v2-ggml-model-f16.gguf.

So the command you used to create the embeddings should also use all-MiniLM-L6-v2-ggml-model-f16.gguf.

If you just ran the steps in the docs, you should be fine.
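
To illustrate the matching requirement (a sketch; the checksum comparison is just a generic way to confirm both steps really used the same file, not a rag-api-server feature):

# The server start command and the embedding-creation step must point at
# the same model file. Comparing checksums of the file each step used is
# one way to be sure:
shasum all-MiniLM-L6-v2-ggml-model-f16.gguf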

katopz commented May 20, 2024

Yes, I just ran 100% of the steps in the docs (many times by now), but it's still slow.

I think I'm missing something pretty obvious 🤔.
