This repository has been archived by the owner on Mar 5, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 238
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Add prometheus metrics and replace statsd - Replace go-metrics Prometheus metrics with the Prometheus go client library - Replace go-metrics statsd with a client that does not perform in memory aggregation - Add additional logging as to the duration of requests * Add metrics docs * add documentation for grpc client/server metrics
- Loading branch information
Showing
102 changed files
with
3,977 additions
and
4,765 deletions.
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
# Metrics | ||
|
||
Kiam exports both Prometheus and StatsD metrics to determine the health of the | ||
system, check the timing of each RPC call, and monitor the size of the | ||
credentials cache. By default, Prometheus metrics are exported on | ||
`localhost:9620` and StatsD metrics are sent to `127.0.0.1:8125`. StatsD | ||
metrics are flushed every 100ms but are not aggregated. | ||
|
||
## Metrics configuration | ||
|
||
- The `statsd` flag controls the address to which to send StatsD metrics. This | ||
is by default `127.0.0..1:8125`. If this is blank, StatsD metrics will be | ||
silenced. | ||
- The `statsd-prefix` flag controls the initial prefix that will be appended to | ||
Kiam's StatsD metrics. This is by default `kiam`. | ||
- The `statsd-interval` flag controls how frequently the in-memory metrics | ||
buffer will be flushed to the specified StatsD endpoint. Metrics are | ||
not aggregated in this buffer and the raw counts will be flushed to the | ||
underlying StatsD sink. This is by default `100ms`. | ||
- The `prometheus-listen-addr` controls which address Kiam should create a | ||
Prometheus endpoint on. This is by default `localhost:9620`. The metrics | ||
themselves can be accessed at `<prometheus-listen-addr>/metrics`. | ||
- The `prometheus-sync-interval` flag controls how frequently Prometheus | ||
metrics should be updated. This is by default `5s`. | ||
|
||
## Emitted Metrics | ||
|
||
### Prometheus | ||
|
||
#### Metadata Subsystem | ||
- `handler_latency_milliseconds` - Bucketed histogram of handler timings. Tagged by handler | ||
- `credential_fetch_errors_total` - Number of errors fetching the credentials for a pod | ||
- `credential_encode_errors_total` - Number of errors encoding credentials for a pod | ||
- `find_role_errors_total` - Number of errors finding the role for a pod | ||
- `empty_role_total` - Number of empty roles returned | ||
- `success_total` - Number of successful responses from a handler | ||
- `responses_total` - Responses from mocked out metadata handlers | ||
|
||
#### STS Subsystem | ||
- `cache_hit_total` - Number of cache hits to the metadata cache | ||
- `cache_miss_total` - Number of cache misses to the metadata cache | ||
- `issuing_errors_total` - Number of errors issuing credentials | ||
- `assumerole_timing_milliseconds` - Bucketed histogram of assumeRole timings | ||
- `assumerole_current` - Number of assume role calls currently executing | ||
|
||
#### K8s Subsystem | ||
- `dropped_pods_total` - Number of dropped pods because of full buffer | ||
|
||
#### gRPC Server (Kiam Server) | ||
- `grpc_server_handled_total` - Total number of RPCs completed on the server, regardless of success or failure. | ||
- `grpc_server_msg_received_total` - Total number of RPC stream messages received on the server. | ||
- `grpc_server_msg_sent_total` - Total number of gRPC stream messages sent by the server. | ||
- `grpc_server_started_total` - Total number of RPCs started on the server. | ||
|
||
#### gRPC Client (Kiam Agent) | ||
- `grpc_client_handled_total` - Total number of RPCs completed by the client, regardless of success or failure. | ||
- `grpc_client_msg_received_total` - Total number of RPC stream messages received by the client. | ||
- `grpc_client_msg_sent_total` - Total number of gRPC stream messages sent by the client. | ||
- `grpc_client_started_total` - Total number of RPCs started on the client. | ||
|
||
### StatsD Timing metrics | ||
- `gateway.rpc.GetRole` - Observed client side latency of GetRole RPC | ||
- `gateway.rpc.GetCredentials` - Observed client side latency of GetCredentials RPC | ||
- `server.rpc.GetRoleCredentials` - Observed server side latency of GetRoleCredentials RPC | ||
- `server.rpc.IsAllowedAssumeRole` - Observed server side latency of IsAllowedAssumeRole RPC | ||
- `server.rpc.GetHealth` - Observed server side latency of GetHealth RPC | ||
- `server.rpc.GetPodRole` - Observed server side latency of GetPodRole RPC | ||
- `server.rpc.GetRoleCredentials` - Observed server side latency of GetRoleCredentials RPC | ||
- `handler.role_name` - Observed latency of role_name handler | ||
- `handler.health` - Observed latency of health handler | ||
- `handler.credentials` - Observed latency of credentials handler | ||
- `aws.assume_role` - Observed latency of aws assume role request |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.