
Add new article on prometheus #196
Merged · 8 commits · Mar 3, 2021
Changes from 1 commit
Code review
tdegiacinto committed Feb 24, 2021
commit a1e142b065b0626278436e0e951d3d15f37f1fb5
_posts/2019-05-17-prometheus-memory.md (10 changes: 5 additions & 5 deletions)
@@ -16,7 +16,7 @@ At Coveo, we use [Prometheus 2](https://prometheus.io/) for collecting all of ou
<!-- more -->

Recently, we ran into an issue were our prometheus pod was killed by kubenertes because it was reaching its 30Gi memory limit. This surprised us, considering the amount of metrics we were collecting.
Collaborator:
an issue where (missing h)
Should Kubernetes be capitalized? According to their website, I think it should be. Same thing with Prometheus.

For comparison, benchmarks for a typical Prometheus installation usually look something like this :
For comparison, benchmarks for a typical Prometheus installation usually look something like this:
Collaborator:
looks (missing s)


* 800 microservice + k8s
* 120,000 sample/second
@@ -93,7 +93,7 @@ Needed_ram = number_of_serie_in_head * 8Kb (approximate size of a time series. n
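
As a rough worked example of that formula (the series count below is purely illustrative and not taken from the post; Prometheus reports its own head-series count through the `prometheus_tsdb_head_series` metric):

```
number_of_serie_in_head = 4,000,000   (example value, e.g. from prometheus_tsdb_head_series)
Needed_ram              = 4,000,000 * 8Kb = ~32Gb
```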

### Analyze memory usage

Prometheus exposes [Go](https://golang.org/) [profiling tools](https://golang.org/pkg/runtime/pprof/), so let see what we have.
Prometheus exposes [Go](https://golang.org/) [profiling tools](https://golang.org/pkg/runtime/pprof/), so let's see what we have.

```
$ go tool pprof -symbolize=remote -inuse_space https://monitoring.prod.cloud.coveo.com/debug/pprof/heap
@@ -118,9 +118,9 @@ Showing top 10 nodes out of 64
304.51MB 2.92% 84.87% 304.51MB 2.92% github.com/prometheus/tsdb.(*decbuf).uvarintStr /app/vendor/github.com/prometheus/tsdb/encoding_helpers.go
```

First, we see that the memory usage is only 10Gb, which means the remaining 30GB used is, in fact, the cached memory allocated by mmap.
First, we see that the memory usage is only 10Gb, which means the remaining 30Gb used are, in fact, the cached memory allocated by mmap.

Second, we see that we have a huge amount of memory used by labels, which likely indicates a high cardinality issue. High cardinality means a metrics is using a label which has plenty of different values.
Second, we see that we have a huge amount of memory used by labels, which likely indicates a high cardinality issue. High cardinality means a metric is using a label which has plenty of different values.
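
One way to hunt for such offenders straight from PromQL (this query is not from the post, and it can be expensive on a large server because it touches every series) is to count series per metric name:

```
topk(10, count by (__name__)({__name__=~".+"}))
```

Metric names that come back with very large counts are the usual candidates for a closer, label-by-label look.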

### Analyze label usage

@@ -252,7 +252,7 @@ Highest cardinality metric names:
120963 container_spec_memory_reservation_limit_bytes
```

We can see that the monitoring of one of the Kubernetes service(kubelet) seems to generate a lot of churn, which is normal considering that it exposes all of the container metrics and that container rotate often, and that the id label has high cardinality.
We can see that the monitoring of one of the Kubernetes services (kubelet) seems to generate a lot of churn, which is normal considering that it exposes all of the container metrics, that containers rotate often, and that the id label has high cardinality.
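
To confirm that the id label really is the driver, a query along these lines (illustrative, reusing one of the metric names from the report above) counts its distinct values on a single kubelet metric:

```
count(count by (id) (container_spec_memory_reservation_limit_bytes))
```

If that number is close to the series count reported above, the label itself, and not churn alone, is what inflates the cardinality.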

## Actions
