-
Notifications
You must be signed in to change notification settings - Fork 330
Feat: Router observability (Current QPS, router-side queueing delay, etc) Part 1 #119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Router observability (Current QPS, router-side queueing delay, etc) Part 1 #119
Conversation
29260ac to
2ba572e
Compare
|
@ApostaC @gaocegege @YuhanLiu11 please help me review the first part of the PR. I split into 2 small PRs to make it easier to manage because this feature will require quite a amount of files to be changes if do all-in-one. This would also save time to check the work as well 😃 |
|
@sitloboi2012 Thank Really appreciate your contribution! @YuhanLiu11 @Shaoting-Feng PTAL 🙏 |
|
@sitloboi2012 Thanks for your contribution! Can you fix the pre-commit errors? |
69e88fb to
1261ee4
Compare
|
hi @YuhanLiu11 @Shaoting-Feng, I ran pre-commit again, please help me check this PR |
|
The PR seems to contain a lot of unrelated stuff (the changes in previous PRs). Is it caused by the rebase? @sitloboi2012 Can you try fixing this? Maybe cherry-pick your modifications? Or create a new PR based on the diff with the latest main? |
|
Btw, this is really great work and we appreciate your contribution! Hopefully the PR problem can be fixed easily. |
|
yep should I will cherry picking the stuff out to clean the PR |
1261ee4 to
209dac7
Compare
|
@ApostaC should be good now, no more redundant file changes 😃 |
|
Great! Will take a look soon! Thanks! |
c355195 to
c2033eb
Compare
src/vllm_router/router.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using TTFT as decoding length here?
src/vllm_router/router.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this name to "Queueing Delay"?
|
hi @ApostaC @YuhanLiu11 , I updated the PR with some update related to the metrics and other minor fixes to align with the latest PR that we currently have on |
Thanks for your great contribution! We will take a look soon. |
Shaoting-Feng
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contirbution.
The Prometheus Metrics endpoint in the router looks good to me. However, could you please leave the other parts of the router unchanged? I've noticed many changes, such as comment modifications and variable redefinitions, that seem unrelated to this PR.
src/vllm_router/router.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please keep these comments?
src/vllm_router/router.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please keep these comments?
src/vllm_router/router.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information has already been logged in the two lines below.
src/vllm_router/router.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to delete the conditional statement ?
src/vllm_router/router.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please keep these comments?
src/vllm_router/router.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to delete this conditional statement?
ea69391 to
4cb025f
Compare
|
hi @ApostaC @YuhanLiu11 @Shaoting-Feng , I updated the code again based on your comment, please help me review it again please, cheers team 😄 |
4cb025f to
77e09e3
Compare
|
hey @sitloboi2012 , thanks again for your great contribution! Can you also provide some examples showing the code can run and produce expected output? |
|
yep sure, let me update the code to include your comment, test case and instruction on how to reproduce the work |
|
Please resolve the conflicts. |
77e09e3 to
feec809
Compare
|
hi @YuhanLiu11, to test this code, you can use the following file that has been available. Currently the work that connecting between router + prometheus + grafana is WIP and this PR only focusing on setting the stuff that will be use in the next step as I mentioned in the description of the PR.
If run successfully, you should have some output like this in your terminal: |
Thanks! Will take a look soon. |
|
hey @sitloboi2012 I'm able to follow the instructions you gave and get the the expected output. Can you fix the pre-commit issues? (sorry for going back-and-forth for the tests..) and then I can merge your PR. Thanks again! |
…ree vLLM metrics, operational metrics, router observe metrics, update requirement.txt for router and tests, update install-minikube-cluster to be more logging info, restart docker service and minikube context after the run Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
…YuhanLiu11 Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
…service discovery Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
…ree vLLM metrics, operational metrics, router observe metrics, update requirement.txt for router and tests, update install-minikube-cluster to be more logging info, restart docker service and minikube context after the run Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
…YuhanLiu11 Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
…s.py and request_stats.py, update observe/install.sh to helm update to solve the problem already run helm service same name, update dashboard layout with latest metrics, update request_generator to align with the new endpoint /v1 Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
4f190d3 to
d7b7f4b
Compare
Signed-off-by: sitloboi2012 <huyvo6812@gmail.com>
|
hi @YuhanLiu11 just updated, merged and ran pre-commit for formatting and linting, please help me merge and close this PR 😄 thank you team |
|
The test Functionality test for helm chart / Multiple-Models (pull_request) is flaky. I rerun it several times to pass it. Now I think it is ready to merge. /cc @YuhanLiu11 |
YuhanLiu11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now. Thanks!
Feature Router observability (Current QPS, router-side queueing delay, etc) Part 1: #78
Due to the complexity and large task, I will split this feature into 2 sub-small PR for easier manage:
Description
This PR is to create 4 main things for the first part of the Router Observability update:
Update vLLM Dashboard Grafana to align with the reference from @YuhanLiu11 combine with add-on. The metrics include
Update
router.pyto tracking information related to the vLLM Dashboard, add on logging for easier trackingUpdate
README.mdfile to include exposing port for Prometheus dashboard as wellUpdate
utils/install-minikube-cluster.shto include update minikube context, restart docker when setting up the observability, add on logging for easier interpretationUpdate
engine_stats.pyandrequest_stats.pyto align with the metrics informationMinor update:
requirements.txtfortestsandvllm_routerobserve/install.shfromhelm installtohelm upgradeto solve the problem exist namespace when re-installbase_urlintests/perftestto align with latest v1 endpointNext step:
router.pyto calculate some customize metricsREADME.mdfile to explain metrics and dashboard informationBEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE
PR Checklist (Click to Expand)
Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
[Bugfix]for bug fixes.[CI/Build]for build or continuous integration improvements.[Doc]for documentation fixes and improvements.[Feat]for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).[Router]for changes to thevllm_router(e.g., routing algorithm, router observability, etc.).[Misc]for PRs that do not fit the above categories. Please use this sparingly.Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR need to meet the following code quality standards:
pre-committo format your code. SeeREADME.mdfor installation.DCO and Signed-off-by
When contributing changes to this project, you must agree to the DCO. Commits must include a
Signed-off-by:header which certifies agreement with the terms of the DCO.Using
-swithgit commitwill automatically add this header.What to Expect for the Reviews
We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11
, Shaoting-Feng or ApostaC.