This repository was archived by the owner on Aug 7, 2025. It is now read-only.

[RFC]: Metrics Refactoring #1492

@lxning

Description


TorchServe currently has two mechanisms to emit metrics.

  1. Emit metrics to log files in a StatsD-like format (the default).

In this case, both frontend and backend metrics are recorded in log files. However, the log format is not standard StatsD: it omits the metric type (i.e. counter, gauge, timer, and so on), so users have to write regexes to parse the logs before they can build dashboards.
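To illustrate the parsing burden, here is a minimal sketch of the regex a user might write. The log line shown is an assumed, illustrative approximation of TorchServe's StatsD-like output; the exact layout may differ, but note that the line carries no metric type suffix (standard StatsD would end each value with `|c`, `|g`, `|ms`, etc.).

```python
import re

# Assumed, illustrative log line in TorchServe's StatsD-like format.
# There is no type suffix, so counter vs. gauge cannot be inferred.
log_line = "CPUUtilization.Percent:12.5|#Level:Host|#hostname:my-host,1650000000"

# A hand-written regex to recover metric name, unit, and value.
pattern = re.compile(r"^(?P<name>[^.]+)\.(?P<unit>[^:]+):(?P<value>[\d.]+)\|")

match = pattern.match(log_line)
if match:
    print(match.group("name"), match.group("unit"), float(match.group("value")))
```

Every dashboard built this way depends on the log layout staying stable, which is exactly the fragility this RFC aims to remove.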

  2. Emit Prometheus-formatted metrics.

In this case, existing TorchServe only emits three metrics:

  • ts_inference_requests_total
  • ts_inference_latency_microseconds
  • ts_queue_latency_microseconds

Users are not able to get model metrics or system metrics via the metrics endpoint.
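For concreteness, a minimal sketch of consuming the three metrics above from the Prometheus endpoint. The sample payload is illustrative (label names and the default endpoint `http://localhost:8082/metrics` are assumptions), but the parsing follows the standard Prometheus text exposition format:

```python
# Sample Prometheus text exposition, as the metrics endpoint
# (default http://localhost:8082/metrics) might return it.
# Labels shown here are illustrative.
sample = """\
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{model_name="mnist",model_version="1.0"} 42.0
"""

def parse_samples(text):
    """Parse Prometheus text exposition into {metric_with_labels: value}."""
    out = {}
    for line in text.splitlines():
        # Skip HELP/TYPE comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        name_labels, value = line.rsplit(" ", 1)
        out[name_labels] = float(value)
    return out

print(parse_samples(sample))
```

Because only `ts_inference_requests_total`, `ts_inference_latency_microseconds`, and `ts_queue_latency_microseconds` are exposed, a scrape like this yields no model- or system-level metrics at all.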

No central place to store metric definitions

Existing TorchServe metric definitions are spread across the codebase, which makes it difficult for users to discover the available metrics.

Re-Design

TS_Metrics_Design.pdf

Sub-tasks on the frontend side

### Tasks
- [ ] https://github.com/pytorch/serve/issues/2747
- [ ] https://github.com/pytorch/serve/issues/2794
- [ ] https://github.com/pytorch/serve/issues/2772
- [ ] https://github.com/pytorch/serve/issues/2795
