

@codingl2k1 codingl2k1 commented Jan 17, 2024

  • Add a record_metrics method to SupervisorActor, WorkerActor and ModelActor.
  • Start a metrics exporter server on each worker.
  • Expose --metrics-exporter-host and --metrics-exporter-port as command-line options.
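
As a rough illustration of the last two bullets, the sketch below starts a small exporter whose bind address comes from two options named after the flags in this PR. It uses only the Python standard library and is not the code added by this PR: `MetricsHandler`, `start_exporter`, `parse_args`, the default values, the hard-coded sample text, and the `/metrics` path on the worker exporter are all assumptions made for the example.

```python
import argparse
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical payload; a real exporter would render live metric values.
SAMPLE_METRICS = (
    "# HELP xinference:input_tokens_total_counter Total number of input tokens.\n"
    "# TYPE xinference:input_tokens_total_counter counter\n"
    'xinference:input_tokens_total_counter{model="qwen-chat"} 20\n'
)


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the metrics text on /metrics (path assumed) and 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = SAMPLE_METRICS.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_exporter(host, port):
    """Start the exporter in a daemon thread and return the server object."""
    server = ThreadingHTTPServer((host, port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def parse_args(argv=None):
    """Parse the two exporter options; the defaults here are assumptions."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--metrics-exporter-host", default="127.0.0.1")
    # Port 0 asks the operating system for any free port.
    parser.add_argument("--metrics-exporter-port", type=int, default=0)
    return parser.parse_args(argv)
```

Calling `start_exporter(args.metrics_exporter_host, args.metrics_exporter_port)` after `args = parse_args()` returns a server whose `server_address` holds the port that was actually bound, which is useful when port 0 is requested.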

For Prometheus scraping:

  • Each worker launches a metrics export server; its host and port can be set with --metrics-exporter-host and --metrics-exporter-port.
  • The supervisor's {endpoint}/metrics also acts as a metrics export server; it collects the metrics of the RESTful API.
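
To check the targets by hand before pointing Prometheus at them, something like the following could be used. It is a minimal sketch using only the standard library; `scrape` and `scrape_all` are hypothetical helper names, and bypassing any configured HTTP proxy is an assumption that the targets are directly reachable.

```python
import urllib.request

# Ignore proxy settings from the environment; the scrape targets are assumed
# to be reachable directly (an assumption of this sketch).
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def scrape(url, timeout=5.0):
    """Return the metrics text served at `url`, or None if the request fails."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None


def scrape_all(urls, timeout=5.0):
    """Scrape every target once and map each URL to its text (or None)."""
    return {url: scrape(url, timeout) for url in urls}
```

The list passed to `scrape_all` would hold the supervisor's `{endpoint}/metrics` URL plus one URL per worker built from its exporter host and port; whether the worker exporter also serves under `/metrics` is not stated here, so check that path before relying on it.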

Known issue:

  • Some backends do not report token information.

Example output from a metrics exporter server:

# HELP xinference:exceptions_total_counter Total number of requested which generated an exception.
# TYPE xinference:exceptions_total_counter counter
# HELP xinference:generate_tokens_per_s Generate throughput in tokens/s.
# TYPE xinference:generate_tokens_per_s gauge
xinference:generate_tokens_per_s{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 0.2784720574189427
# HELP xinference:input_tokens_total_counter Total number of input tokens.
# TYPE xinference:input_tokens_total_counter counter
xinference:input_tokens_total_counter{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 20
# HELP xinference:output_tokens_total_counter Total number of output tokens.
# TYPE xinference:output_tokens_total_counter counter
xinference:output_tokens_total_counter{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 7
# HELP xinference:requests_total_counter Total number of requests received.
# TYPE xinference:requests_total_counter counter
xinference:requests_total_counter{method="GET",path="/ui"} 1
xinference:requests_total_counter{method="POST",path="/v1/models"} 1
xinference:requests_total_counter{method="GET",path="/v1/models/"} 2
xinference:requests_total_counter{method="GET",path="/v1/models"} 2
xinference:requests_total_counter{method="HEAD",path="/qwen-chat"} 1
xinference:requests_total_counter{method="POST",path="/v1/ui/{model_uid}"} 1
xinference:requests_total_counter{method="GET",path="/qwen-chat"} 4
xinference:requests_total_counter{method="POST",path="/qwen-chat"} 4
xinference:requests_total_counter{method="None",path="/qwen-chat"} 1
xinference:requests_total_counter{method="GET",path="/v1/cluster/auth"} 1
xinference:requests_total_counter{method="GET",path="/v1/models/{model_uid}"} 1
xinference:requests_total_counter{method="POST",path="/v1/chat/completions"} 1
# HELP xinference:responses_total_counter Total number of responses sent.
# TYPE xinference:responses_total_counter counter
xinference:responses_total_counter{method="GET",path="/v1/model_registrations/{model_type}"} 1
xinference:responses_total_counter{method="GET",path="/v1/cluster/devices"} 1
xinference:responses_total_counter{method="GET",path="/ui"} 1
xinference:responses_total_counter{method="POST",path="/v1/models"} 1
xinference:responses_total_counter{method="GET",path="/v1/models/"} 2
xinference:responses_total_counter{method="GET",path="/v1/models"} 2
xinference:responses_total_counter{method="HEAD",path="/qwen-chat"} 1
xinference:responses_total_counter{method="POST",path="/v1/ui/{model_uid}"} 1
xinference:responses_total_counter{method="GET",path="/qwen-chat"} 4
xinference:responses_total_counter{method="POST",path="/qwen-chat"} 4
xinference:responses_total_counter{method="GET",path="/v1/cluster/auth"} 1
xinference:responses_total_counter{method="GET",path="/v1/models/{model_uid}"} 1
xinference:responses_total_counter{method="POST",path="/v1/chat/completions"} 1
# HELP xinference:status_codes_counter Total number of response status codes.
# TYPE xinference:status_codes_counter counter
xinference:status_codes_counter{method="GET",path="/v1/model_registrations/{model_type}",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/v1/cluster/devices",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/ui",status_code="404"} 1
xinference:status_codes_counter{method="POST",path="/v1/models",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/v1/models/",status_code="307"} 2
xinference:status_codes_counter{method="GET",path="/v1/models",status_code="200"} 2
xinference:status_codes_counter{method="HEAD",path="/qwen-chat",status_code="404"} 1
xinference:status_codes_counter{method="POST",path="/v1/ui/{model_uid}",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/qwen-chat",status_code="307"} 1
xinference:status_codes_counter{method="GET",path="/qwen-chat",status_code="200"} 3
xinference:status_codes_counter{method="POST",path="/qwen-chat",status_code="200"} 4
xinference:status_codes_counter{method="GET",path="/v1/cluster/auth",status_code="200"} 1
xinference:status_codes_counter{method="GET",path="/v1/models/{model_uid}",status_code="200"} 1
xinference:status_codes_counter{method="POST",path="/v1/chat/completions",status_code="200"} 1
# HELP xinference:time_to_first_token_ms First token latency in ms.
# TYPE xinference:time_to_first_token_ms gauge
xinference:time_to_first_token_ms{format="pytorch",model="qwen-chat",node="127.0.0.1:47981",quantization="none",type="LLM"} 20076.820135116577
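
The output above is in the Prometheus text exposition format, so it can also be inspected without a Prometheus server. The sketch below parses sample lines with regular expressions and totals `xinference:status_codes_counter` by status code. `parse_samples` and `responses_by_status` are hypothetical helpers, and the parser is deliberately simplified: it does not unescape label values and ignores an optional trailing timestamp.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, value, optional timestamp.
# The greedy label group copes with braces inside label values such as
# path="/v1/ui/{model_uid}".
_SAMPLE_RE = re.compile(
    r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def responses_by_status(text):
    """Sum xinference:status_codes_counter samples per status_code label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "xinference:status_codes_counter":
            totals[labels["status_code"]] += value
    return dict(totals)
```

Applied to the example output above, `responses_by_status` should report 16 responses with status 200, 3 with 307 and 2 with 404.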

@XprobeBot XprobeBot added this to the v0.8.1 milestone Jan 17, 2024
@codingl2k1 codingl2k1 marked this pull request as ready for review January 18, 2024 07:26
@aresnow1 aresnow1 merged commit c1e1c5a into xorbitsai:main Jan 19, 2024