Open
Labels
🐞 bug: Something isn't working, pull request that fix bug.
Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (Language Policy).
- Non-English title submissions will be closed directly (Language Policy).
- Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
no
RAGFlow image version
v0.20.5
Other environment information
8C 32G
Actual behavior
Hardware Configuration: 8-core CPU, 32GB RAM, single-instance deployment
Stress Test Results:
At 10 QPS, the response of the retrieval interface (api/v1/retrieval) slows down significantly.
Without reranking model: CPU usage reaches 400% (4 cores)
With Tongyi Qianwen reranking model: CPU usage reaches 100% (1 core)
The hardware resources are not saturated. When the number of requests increases further, CPU usage stops rising, but the interface response becomes even slower.
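For anyone trying to reproduce the numbers above, here is a minimal sketch of an open-loop load generator that calls an endpoint at a fixed rate and records per-request latency. It is not the tool used for the original test. The helper names (`run_load`, `summarize`, `make_retrieval_sender`) are hypothetical, and the `Authorization: Bearer` header and the JSON payload passed to the retrieval endpoint are assumptions that should be checked against the RAGFlow HTTP API reference for your version.

```python
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def run_load(send, qps, duration_s, max_workers=64):
    """Call send() at roughly `qps` calls per second for `duration_s` seconds.

    Returns the latency of each call in seconds. Latency is measured from
    the moment a worker thread starts the call, so time spent waiting for a
    free worker (if max_workers is too small) is not included.
    """
    interval = 1.0 / qps
    total = int(qps * duration_s)

    def timed_call():
        start = time.perf_counter()
        send()
        return time.perf_counter() - start

    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        t0 = time.perf_counter()
        for i in range(total):
            # Open-loop pacing: the i-th call is scheduled at t0 + i * interval,
            # whether or not earlier calls have finished.
            delay = t0 + i * interval - time.perf_counter()
            if delay > 0:
                time.sleep(delay)
            futures.append(pool.submit(timed_call))
        return [f.result() for f in futures]


def summarize(latencies):
    """Return call count, mean latency, and an approximate 95th percentile."""
    ordered = sorted(latencies)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "count": len(ordered),
        "mean_s": statistics.mean(ordered),
        "p95_s": ordered[p95_index],
    }


def make_retrieval_sender(base_url, api_key, payload):
    """Build a send() that POSTs `payload` to <base_url>/api/v1/retrieval.

    Assumption: the server accepts a JSON body and a Bearer token header.
    The payload fields are left to the caller.
    """
    url = base_url.rstrip("/") + "/api/v1/retrieval"
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }

    def send():
        req = urllib.request.Request(url, data=body, headers=headers, method="POST")
        with urllib.request.urlopen(req, timeout=60) as resp:
            resp.read()

    return send
```

Running `summarize(run_load(make_retrieval_sender(base_url, api_key, payload), qps=10, duration_s=60))` at several QPS values, with and without the reranking model, would show whether latency grows while CPU stays flat, which is the pattern described above.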
Questions:
1. Is the API limited by the number of concurrent threads? How can the number of concurrent threads be increased?
2. Why is CPU usage higher when the reranking model is not used?
3. How can CPU usage be reduced?
Expected behavior
No response
Steps to reproduce
-
Additional information
No response