Update dynamic-lora-sidecar to expose metrics to track loaded adapters #980

shotarok · 2025-06-13T05:05:06Z

Resolves #600

Update dynamic-lora-sidecar to expose metrics that track loaded adapters. lora_syncer_adapter_status is a binary gauge (1=loaded, 0=not_loaded) with an adapter_name label. I confirmed that sidecar.py starts an HTTP server on port 8080 that returns the metrics:

curl http://localhost:8080/metrics
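For readers unfamiliar with labeled binary gauges, here is a minimal, stdlib-only sketch of how lora_syncer_adapter_status samples could look in the Prometheus text exposition format. It is not the sidecar's implementation (the sidecar presumably uses a Prometheus client library); the function names and the `configured`/`loaded` arguments are hypothetical, and only the metric name, help text, and adapter_name label come from this PR.

```python
def _escape_label_value(value):
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_adapter_status(configured, loaded):
    """Render one gauge sample per adapter: 1.0 if loaded, 0.0 otherwise.

    `configured` and `loaded` are hypothetical iterables of adapter names.
    """
    loaded = set(loaded)
    lines = [
        "# HELP lora_syncer_adapter_status "
        "Status of LoRA adapters (1=loaded, 0=not_loaded)",
        "# TYPE lora_syncer_adapter_status gauge",
    ]
    # Sort so the output is stable across calls.
    for name in sorted(set(configured) | loaded):
        value = 1.0 if name in loaded else 0.0
        label = _escape_label_value(name)
        lines.append(
            f'lora_syncer_adapter_status{{adapter_name="{label}"}} {value}'
        )
    return "\n".join(lines) + "\n"
```

Calling `render_adapter_status(["a", "b"], ["b"])` yields one sample line per adapter, with the value distinguishing loaded from not-loaded adapters.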

Run sidecar.py

(venv) laborant@dev-machine:sidecar$ pwd
/home/laborant/gateway-api-inference-extension/tools/dynamic-lora-sidecar/sidecar
(venv) laborant@dev-machine:sidecar$ uv run --with-requirements ../requirements.txt python sidecar.py --config ../configmap.yaml
2025-06-13 04:59:09 - INFO - sidecar.py:118 -  Settings initialized: health check timeout=300s, interval=2s, reconcile trigger=5s
2025-06-13 04:59:09 - INFO - sidecar.py:332 -  Starting metrics server on port 8080
2025-06-13 04:59:09 - INFO - sidecar.py:335 -  Running initial reconcile for config map ../configmap.yaml
2025-06-13 04:59:09 - INFO - sidecar.py:281 -  reconciling model server localhost:8000 with config stored at ../configmap.yaml

Run curl

laborant@dev-machine:~$ curl http://localhost:8080/metrics
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 1473.0
python_gc_objects_collected_total{generation="1"} 168.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 76.0
python_gc_collections_total{generation="1"} 6.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="12",patchlevel="3",version="3.12.3"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.99852032e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.1213952e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.74979074849e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.5700000000000001
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 9.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024.0
# HELP lora_syncer_adapter_status Status of LoRA adapters (1=loaded, 0=not_loaded)
# TYPE lora_syncer_adapter_status gauge
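The output above ends with the HELP and TYPE lines for lora_syncer_adapter_status but no samples, presumably because no adapter had been loaded in that run. As a rough sketch of how a test or script could check the metric, the following parses the scraped text into a name-to-status mapping. The function name is hypothetical, and the pattern assumes adapter_name is the only label on the metric, as described in this PR.

```python
import re

# Matches e.g.: lora_syncer_adapter_status{adapter_name="sql-lora"} 1.0
# Assumes adapter_name is the only label on the sample.
_SAMPLE = re.compile(
    r'^lora_syncer_adapter_status\{adapter_name="((?:[^"\\]|\\.)*)"\}\s+(\S+)'
)


def parse_adapter_status(metrics_text):
    """Return {adapter_name: True if loaded else False} from /metrics text."""
    status = {}
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if match:
            status[match.group(1)] = float(match.group(2)) == 1.0
    return status
```

For the output shown above, which contains no samples for this metric, the function returns an empty dict.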

pytest

(venv) laborant@dev-machine:sidecar$ uv run --with-requirements ../requirements.txt --with pytest pytest .
==================================================================== test session starts =====================================================================
platform linux -- Python 3.12.3, pytest-8.4.0, pluggy-1.6.0
rootdir: /home/laborant/gateway-api-inference-extension/tools/dynamic-lora-sidecar/sidecar
plugins: anyio-4.9.0
collected 6 items                                                                                                                                            

test_sidecar.py ......                                                                                                                                 [100%]

===================================================================== 6 passed in 0.26s ======================================================================

netlify · 2025-06-13T05:05:11Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`7269f88`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68509a0446018a0008443023
😎 Deploy Preview	https://deploy-preview-980--gateway-api-inference-extension.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

k8s-ci-robot · 2025-06-13T05:05:15Z

Hi @shotarok. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

danehans · 2025-06-13T16:08:12Z

@shotarok thanks for the PR. If the lora syncer metrics were exposed on port 9090, could they be remotely scraped by following the metrics guide? Either way, this guide should be updated with a new LoRA Syncer section that provides details similar to the existing EPP metrics.

As a follow-up, the e2e tests should be updated to include test cases that scrape the EPP and LoRA syncer metrics and assert on the expected metrics. Feel free to create an issue and assign yourself if you can help with this.

shotarok · 2025-06-14T03:07:32Z

@danehans Thank you for your feedback! If both the main and sidecar containers use the same port, I don't think scraping the /metrics endpoint will work consistently. I'll update the docs in this PR to add a LoRA Syncer section.

As for the e2e tests, we can't write an e2e test for the LoRA Syncer metrics with the vLLM simulator until it supports the following endpoints for loading and unloading a LoRA adapter. I think we can already add an e2e test for EPP, so I'll create separate issues as a follow-up.

/v1/unload_lora_adapter
/v1/load_lora_adapter
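To make the two endpoints concrete, here is a hedged sketch that builds (but does not send) the corresponding POST requests. The JSON field names lora_name and lora_path are an assumption based on vLLM's dynamic LoRA serving API and are not stated in this thread; the helper name, adapter name, and adapter path are hypothetical. The base URL localhost:8000 matches the model server address in the sidecar log above.

```python
import json
import urllib.request


def build_lora_request(base_url, action, lora_name, lora_path=None):
    """Build a POST request for /v1/load_lora_adapter or /v1/unload_lora_adapter.

    Payload field names are assumed, not confirmed by this PR.
    """
    if action not in ("load", "unload"):
        raise ValueError(f"unsupported action: {action!r}")
    payload = {"lora_name": lora_name}
    if action == "load":
        if lora_path is None:
            raise ValueError("lora_path is required to load an adapter")
        payload["lora_path"] = lora_path
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/{action}_lora_adapter",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (not sent here): pass the result to urllib.request.urlopen(...)
# against a server that implements these endpoints.
load_req = build_lora_request(
    "http://localhost:8000", "load", "sql-lora", "/adapters/sql-lora"
)
```

Keeping request construction separate from sending makes the helper easy to unit test without a running model server, which is the gap the simulator issue describes.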

shotarok · 2025-06-14T04:23:40Z

@danehans I updated the documentation about the metrics and created two issues for the e2e tests. I also found hermetic_test for EPP's metrics, so if an e2e test for EPP's metrics feels unnecessary, please feel free to close that issue. Thanks!

danehans · 2025-06-16T20:31:34Z

xref: llm-d/llm-d-inference-sim#58 for lora api endpoint support.

site-src/guides/metrics.md

danehans · 2025-06-16T20:40:22Z

A few nits that are non-blocking. @JeffLuoo PTAL since you have experience with GIE metrics.

danehans · 2025-06-16T20:40:33Z

/lgtm

kfswain · 2025-06-16T20:50:47Z

/ok-to-test

JeffLuoo · 2025-06-17T13:36:14Z

/lgtm

danehans · 2025-06-17T15:08:12Z

/approve

k8s-ci-robot · 2025-06-17T15:08:19Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danehans, shotarok

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [danehans]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kubernetes-sigs#980)
* Add a metrics to track loaded adapters
* Update the sample manifests
* Add explanation of metrics from dyanmic LoRA adapter sidecar
* Add explanation of metrics from dyanmic LoRA adapter sidecar (take 2)
* Update metrics.md based on feedback


shotarok added 2 commits June 12, 2025 21:57

Add a metrics to track loaded adapters

c968003

Update the sample manifests

ca3b69e

k8s-ci-robot added the cncf-cla: yes label (the PR's author has signed the CNCF CLA) Jun 13, 2025

k8s-ci-robot requested review from danehans and robscott June 13, 2025 05:05

k8s-ci-robot added the needs-ok-to-test (a PR that requires an org member to verify it is safe to test) and size/M (a PR that changes 30-99 lines, ignoring generated files) labels Jun 13, 2025

Add explanation of metrics from dyanmic LoRA adapter sidecar

61cfd9b

k8s-ci-robot added the size/L label (a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label (a PR that changes 30-99 lines, ignoring generated files) Jun 14, 2025

Add explanation of metrics from dyanmic LoRA adapter sidecar (take 2)

2860b20

This was referenced Jun 14, 2025

Add e2e test for EPP metrics scraping #985

Closed

Add e2e test for Dynamic LoRA Adapter Sidecar metrics scraping #986

Open

danehans reviewed Jun 16, 2025

site-src/guides/metrics.md (outdated, resolved)

danehans reviewed Jun 16, 2025

site-src/guides/metrics.md (outdated, resolved)

k8s-ci-robot assigned danehans Jun 16, 2025

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jun 16, 2025

k8s-ci-robot added the ok-to-test label (a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (a PR that requires an org member to verify it is safe to test) Jun 16, 2025

Update metrics.md based on feedback

7269f88

k8s-ci-robot removed the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jun 16, 2025

shotarok requested a review from danehans June 16, 2025 23:00

shotarok mentioned this pull request Jun 17, 2025

Add LoRA Load and Unload APIs llm-d/llm-d-inference-sim#58

Closed

k8s-ci-robot assigned JeffLuoo Jun 17, 2025

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jun 17, 2025

k8s-ci-robot added the approved label (a PR has been approved by an approver from all required OWNERS files) Jun 17, 2025

k8s-ci-robot merged commit 17824ba into kubernetes-sigs:main Jun 17, 2025
9 checks passed

shotarok deleted the shotarok/lora-syncer-metrics branch June 17, 2025 22:41


5 participants
