
Conversation


@LJH-LBJ LJH-LBJ commented Oct 22, 2025

Purpose

Add execution-time instrumentation for the EPD (Encode / Prefill / Decode) path.

The timing is broken into seven parts:

  1. vllm_proxy_transfer_to_pd_seconds: time from the proxy forwarding a request until the PD instance receives it
  2. vllm_proxy_transfer_to_encode_seconds: time from the proxy forwarding a request until the encoder instance receives it
  3. vllm_Encoder_request_queue_time_seconds: time a request spends queued on the encoder
  4. vllm_PD_request_queue_time_seconds: time a request spends queued on PD
  5. vllm_execute_mm_encoder_seconds: time spent running the encoder (from when the encoder computation is needed until its output is available)
  6. vllm_Encoder_cache_trans_seconds: time for the encoder cache to go from being stored to being fetched
  7. vllm_prefill_forward_seconds: time spent on the prefill forward pass

Recording methods (a sketch of computing the log-pair deltas follows this list):

  1. vllm_proxy_transfer_to_pd_seconds: delta between the logger outputs "Proxy sending prefill request" and "generation received proxy request"
  2. vllm_proxy_transfer_to_encode_seconds: delta between the logger outputs "Proxy sending encode request" and "encode received proxy request"
  3. vllm_Encoder_request_queue_time_seconds and vllm_PD_request_queue_time_seconds reuse the existing vLLM metric vllm:request_queue_time_seconds
  4. vllm_execute_mm_encoder_seconds: a new metric vllm:request_encoder_consume_time is added
  5. vllm_Encoder_cache_trans_seconds: delta between the logger outputs "Save cache successful for" and "Success load encoder cache for request_id"
  6. vllm_prefill_forward_seconds reuses the existing vLLM metric vllm:request_prefill_time_seconds
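
For the log-pair metrics (items 1, 2, and 5 above), the elapsed time is the difference between the timestamps of the matching send/receive (or save/load) log lines. Below is a minimal offline sketch of that calculation; the exact message wording, the timestamp format, and the single-pair matching (the real implementation would key on request_id) are assumptions, not code from this PR.

import re
from datetime import datetime
from typing import Optional

SEND_MSG = "Proxy sending prefill request"      # emitted by the proxy
RECV_MSG = "generation received proxy request"  # emitted by the PD worker
TS_RE = re.compile(r"^INFO (\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def parse_ts(line: str) -> Optional[datetime]:
    # The log prefix carries no year, so deltas are only meaningful
    # within a single run that does not cross a year boundary.
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%m-%d %H:%M:%S") if m else None

def transfer_seconds(log_lines: list) -> Optional[float]:
    send_ts = recv_ts = None
    for line in log_lines:
        if SEND_MSG in line and send_ts is None:
            send_ts = parse_ts(line)
        elif RECV_MSG in line and recv_ts is None:
            recv_ts = parse_ts(line)
    if send_ts and recv_ts:
        return (recv_ts - send_ts).total_seconds()
    return None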

Changes:
1. Add the performance metrics described above.
2. Add a /metrics endpoint to both the proxy and the worker.
   a. The endpoint automatically searches for an available port and prints it.

Switch: set TIMECOUNT_ENABLED=1 to enable timing, 0 to disable.

The port is printed when the instance is launched (http://127.0.0.1:XXXX).
Fetch the metrics with:
curl http://127.0.0.1:XXXX/metrics
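
For a quick look without a Prometheus server, the scraped text can be parsed directly. The snippet below is a minimal sketch using prometheus_client's text parser and reports each histogram's mean as sum/count; the port and the aggregation across label sets are illustrative assumptions, not code from this PR.

import urllib.request
from prometheus_client.parser import text_string_to_metric_families

METRICS_URL = "http://127.0.0.1:9000/metrics"  # example port; use the one printed at startup

def histogram_means(url: str = METRICS_URL) -> dict:
    text = urllib.request.urlopen(url).read().decode()
    means = {}
    for family in text_string_to_metric_families(text):
        if family.type != "histogram":
            continue
        total = sum(s.value for s in family.samples if s.name.endswith("_sum"))
        count = sum(s.value for s in family.samples if s.name.endswith("_count"))
        if count:
            means[family.name] = total / count
    return means

if __name__ == "__main__":
    for name, mean in histogram_means().items():
        print(f"{name}: mean={mean:.4f}s")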

Test Plan

Test Result

INFO 10-25 17:15:10 [loggers.py:118] Engine 000: Avg prompt throughput: 1667.6 tokens/s, Avg generation throughput: 1.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 100.0%, Prefix cache hit rate: 0.0%
INFO 10-25 17:15:10 [disagg_worker.py:78] DisaggWorker metrics:{'vllm:request_queue_time_seconds|engine=0': {'count': 10.0, 'mean': 0.5825753671117126}, 'vllm:request_prefill_time_seconds|engine=0': {'count': 10.0, 'mean': 0.2375835937447846}, 'vllm:e2e_request_latency_seconds|engine=0': {'count': 10.0, 'mean': 2.087323236465454}, 'vllm:request_encoder_consume_time_seconds|engine=0': {'count': 10.0, 'mean': 0.23731594048440458}}
INFO 10-25 17:15:20 [loggers.py:118] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 100.0%, Prefix cache hit rate: 0.0%
INFO 10-25 17:15:20 [disagg_worker.py:78] DisaggWorker metrics:{'vllm:request_queue_time_seconds|engine=0': {'count': 10.0, 'mean': 0.5825753671117126}, 'vllm:request_prefill_time_seconds|engine=0': {'count': 10.0, 'mean': 0.2375835937447846}, 'vllm:e2e_request_latency_seconds|engine=0': {'count': 10.0, 'mean': 2.087323236465454}, 'vllm:request_encoder_consume_time_seconds|engine=0': {'count': 10.0, 'mean': 0.23731594048440458}}
INFO 10-25 17:15:30 [disagg_worker.py:78] DisaggWorker metrics:{'vllm:request_queue_time_seconds|engine=0': {'count': 10.0, 'mean': 0.5825753671117126}, 'vllm:request_prefill_time_seconds|engine=0': {'count': 10.0, 'mean': 0.2375835937447846}, 'vllm:e2e_request_latency_seconds|engine=0': {'count': 10.0, 'mean': 2.087323236465454}, 'vllm:request_encoder_consume_time_seconds|engine=0': {'count': 10.0, 'mean': 0.23731594048440458}}
INFO 10-25 17:15:40 [disagg_worker.py:78] DisaggWorker metrics:{'vllm:request_queue_time_seconds|engine=0': {'count': 10.0, 'mean': 0.5825753671117126}, 'vllm:request_prefill_time_seconds|engine=0': {'count': 10.0, 'mean': 0.2375835937447846}, 'vllm:e2e_request_latency_seconds|engine=0': {'count': 10.0, 'mean': 2.087323236465454}, 'vllm:request_encoder_consume_time_seconds|engine=0': {'count': 10.0, 'mean': 0.23731594048440458}}
INFO 10-25 17:15:50 [disagg_worker.py:78] DisaggWorker metrics:{'vllm:request_queue_time_seconds|engine=0': {'count': 10.0, 'mean': 0.5825753671117126}, 'vllm:request_prefill_time_seconds|engine=0': {'count': 10.0, 'mean': 0.2375835937447846}, 'vllm:e2e_request_latency_seconds|engine=0': {'count': 10.0, 'mean': 2.087323236465454}, 'vllm:request_encoder_consume_time_seconds|engine=0': {'count': 10.0, 'mean': 0.23731594048440458}}
INFO 10-25 17:16:00 [disagg_worker.py:78] DisaggWorker metrics:{'vllm:request_queue_time_seconds|engine=0': {'count': 10.0, 'mean': 0.5825753671117126}, 'vllm:request_prefill_time_seconds|engine=0': {'count': 10.0, 'mean': 0.2375835937447846}, 'vllm:e2e_request_latency_seconds|engine=0': {'count': 10.0, 'mean': 2.087323236465454}, 'vllm:request_encoder_consume_time_seconds|engine=0': {'count': 10.0, 'mean': 0.23731594048440458}}
INFO 10-25 17:16:10 [disagg_worker.py:78] DisaggWorker metrics:{'vllm:request_queue_time_seconds|engine=0': {'count': 10.0, 'mean': 0.5825753671117126}, 'vllm:request_prefill_time_seconds|engine=0': {'count': 10.0, 'mean': 0.2375835937447846}, 'vllm:e2e_request_latency_seconds|engine=0': {'count': 10.0, 'mean': 2.087323236465454}, 'vllm:request_encoder_consume_time_seconds|engine=0': {'count': 10.0, 'mean': 0.23731594048440458}}


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Junhong added 9 commits October 16, 2025 20:51
@LJH-LBJ LJH-LBJ added this to the EPD-main (1030) milestone Oct 22, 2025
@LJH-LBJ LJH-LBJ self-assigned this Oct 22, 2025
@LJH-LBJ LJH-LBJ added the enhancement New feature or request label Oct 22, 2025
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR adds timing and metrics for the EPD (Encode, Prefill, Decode) execution path. The changes span multiple files, from example code to the core scheduler and the metrics system. The implementation looks largely correct and follows the PR description. However, I found two instances of code duplication that should be addressed to improve code quality and maintainability: one in an example file and one in the core scheduler logic.

Comment on lines +515 to +520
if self.log_stats and TIMECOUNT_ENABLED and\
        request.request_id not in self._epd_encoder_reqs:
    # Record EPD encoder request
    self._epd_encoder_reqs.add(request.request_id)
    request.record_event(
        EngineCoreEventType.ENCODER_CONSUME_START)


high

This block that records the ENCODER_CONSUME_START event is identical to the code at lines 324-329 of this file. To improve maintainability and avoid duplication, consider extracting this logic into a private helper method and calling it from both places.
For example, you could add a helper like this:

def _record_encoder_start_event(self, request: Request):
    if self.log_stats and TIMECOUNT_ENABLED and \
            request.request_id not in self._epd_encoder_reqs:
        # Record EPD encoder request
        self._epd_encoder_reqs.add(request.request_id)
        request.record_event(EngineCoreEventType.ENCODER_CONSUME_START)

Then call self._record_encoder_start_event(request) in both places.

Junhong added 12 commits October 22, 2025 16:19
self.histogram_max_tokens_request.observe(
    finished_request.max_tokens_param)
self.histogram_encoder_consume_seconds.observe(
    finished_request.encoder_consume_time)
Collaborator

Please add the same handling to the corresponding method of class LoggingStatLogger; then the value can be printed directly from LoggingStatLogger's log method, without parsing the Prometheus fields.
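
What this could look like (illustrative sketch only, not code from this PR): mirror the Prometheus histogram in LoggingStatLogger so the mean encoder time appears in the periodic log line. The class and attribute names loosely follow vLLM v1's metrics loggers; encoder_consume_time comes from this PR's change to the finished-request stats, everything else is an assumption.

from statistics import mean

class LoggingStatLogger:  # simplified stand-in for the logger in vllm/v1/metrics/loggers.py
    def __init__(self):
        self.encoder_consume_times = []

    def record(self, iteration_stats):
        # Accumulate the per-request encoder time added by this PR.
        for finished_request in iteration_stats.finished_requests:
            self.encoder_consume_times.append(
                finished_request.encoder_consume_time)

    def log(self):
        if self.encoder_consume_times:
            print(f"Avg encoder consume time: {mean(self.encoder_consume_times):.4f}s")
            self.encoder_consume_times.clear()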

Collaborator Author


Done

if mm_data.request_id not in encoder_cache:
    encoder_cache[mm_data.request_id] = {}
encoder_cache[mm_data.request_id][input_id] = ec_cache
if TIMECOUNT_ENABLED:
Collaborator

Please log at the granularity of a request, not per input_id.
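
What request-granularity logging could look like (illustrative sketch only, not code from this PR): write every input's cache entry first, then emit a single "Save cache successful" line per request. The function and variable names (save_encoder_cache, mm_inputs) are assumptions.

import logging

logger = logging.getLogger(__name__)
TIMECOUNT_ENABLED = True  # stand-in for the PR's environment-variable switch

def save_encoder_cache(encoder_cache: dict, request_id: str, mm_inputs: dict) -> None:
    """Store every input's cache entry, then log once per request."""
    per_request = encoder_cache.setdefault(request_id, {})
    for input_id, ec_cache in mm_inputs.items():
        per_request[input_id] = ec_cache
    # One timing log line per request, not one per input_id.
    if TIMECOUNT_ENABLED:
        logger.info("Save cache successful for request %s (%d inputs)",
                    request_id, len(mm_inputs))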

Junhong added 8 commits October 27, 2025 15:14

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: TTFT : Staged Encoder (E) & PD Timing with Socket Reporting and Proxy Aggregation

3 participants