Add new article on prometheus #196
Conversation
👍 GG
Cool stuff. Couple of typos etc.

> author:
>   name: Thomas De Giacinto
>   bio: Working in the Cloud infrastructure team
>   image: tdegiacinto.jpg
> ---
>
> <!-- more -->
> Here at coveo we are using [Prometheus 2](https://prometheus.io/) for collecting all our monitoring metrics. It's known for being able to handle millions of time series with few resources. So when our pod was hiting its 30Gi memory limit we decided to dive into it to understand how memory is allocated.


Suggested change:

> Here at Coveo we are using [Prometheus 2](https://prometheus.io/) for collecting all our monitoring metrics. It's known for being able to handle millions of time series with few resources. So when our pod was hiting its 30Gi memory limit we decided to dive into it to understand how memory is allocated.

> For example some benchmark give the following metrics :
> Recently we ran in an issue were our prometheus pod was killed by kubenertes because it was reaching its 30Gi memory limit. Which was surprising considering the numbers of metrics we were collecting.

Kubernetes

Suggested change:

> Recently we ran in an issue were our prometheus pod was killed by Kubernetes because it was reaching its 30Gi memory limit. Which was surprising considering the numbers of metrics we were collecting.

> For comparaison, some benchmark available on internet give the following statistics :


Suggested change:

> For comparison, some benchmark available on the Internet give the following statistics :

> ### Storage
>
> When prometheus scrape a target it retrieve thousands of metrics, which will be compacted into chunk and stored in block before being written on disk. Only the head block is writable, all other blocks are immutable. By default, a block contain 2h of data.


Suggested change:

> When Prometheus scrapes a target it retrieve thousands of metrics, which will be compacted into chunk and stored in block before being written on disk. Only the head block is writable, all other blocks are immutable. By default, a block contain 2h of data.

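
As context for the storage description under review: the block layout it talks about is visible directly on disk. A minimal sketch, assuming the default data directory; the ULID block names are made up for illustration:

```sh
# Illustrative only: list Prometheus's on-disk TSDB layout.
# "data/" assumes the default --storage.tsdb.path; the block names are examples.
ls data/
# 01BKGV7JBM69T2G1BGBGM6KB12/   <- an immutable ~2h block (ULID-named directory)
# 01BKGTZQ1SYQJTR4PB43C8PD98/   <- an older block
# wal/                          <- write-ahead log backing the in-memory head block
ls data/01BKGV7JBM69T2G1BGBGM6KB12/
# chunks/  index  meta.json  tombstones
```
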
> ### Analyze memory usage
>
> Prometheus expose [Go](https://golang.org/) [profiling tools](https://golang.org/pkg/runtime/pprof/), so let see what we have.


Suggested change:

> Prometheus exposes [Go](https://golang.org/) [profiling tools](https://golang.org/pkg/runtime/pprof/), so let see what we have.

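
For readers following along, the profiling endpoint referenced above is Go's standard `net/http/pprof` handler, which Prometheus serves under `/debug/pprof`. A rough sketch of pulling a heap profile; the host and port are assumptions:

```sh
# Fetch a heap profile from a running Prometheus and open the interactive pprof shell.
# localhost:9090 is an assumption; substitute your server's address.
go tool pprof http://localhost:9090/debug/pprof/heap
# (pprof) top    # largest in-use allocations by function
# (pprof) svg    # write a call graph to an SVG file
```
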
> First thing we see that the memory usage is only 10 Go which means all 30Go remaining memory used, see by kubernetes, is in fact the cached memory used by mmap.
>
> Secondly we see that we have a huge amount of memory used by labels which indicate a probably high cardinality issue. High cardinality mean a metrics using a label which has plenty of different value


Suggested change:

> Secondly we see that we have a huge amount of memory used by labels which indicate a probably high cardinality issue. High cardinality mean a metrics using a label which has plenty of different values.

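
The RSS-versus-page-cache split mentioned in the first quoted paragraph can be double-checked from inside the container. A hedged sketch, assuming a cgroup v1 node (the accounting file lives elsewhere under cgroup v2):

```sh
# Inside the Prometheus container: break down what the reported memory usage really is.
# The path assumes cgroup v1.
grep -E '^(rss|cache) ' /sys/fs/cgroup/memory/memory.stat
# rss   <bytes>   <- heap actually held by the process (the ~10 GB seen via pprof)
# cache <bytes>   <- page cache for mmap'd blocks, reclaimable under memory pressure
```
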
> The tsdb binary has an `analyze` option which can retrieve many useful statistics on the tsdb database.
>
> So we decided to copy the disk storing our data from prometheus and mount it on a dedicated instance to run the analyze.


Suggested change:

> So we decided to copy the disk storing our data from prometheus and mount it on a dedicated instance to run the analysis.

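
A rough sketch of the invocation described here, assuming the copied disk is mounted at `/mnt/prometheus-data` (an illustrative path); recent Prometheus releases bundle the same functionality into `promtool`:

```sh
# Point the standalone tsdb tool at the copied data directory.
# /mnt/prometheus-data is an assumption about where the disk was mounted.
tsdb analyze /mnt/prometheus-data

# Newer Prometheus releases ship the equivalent command as part of promtool:
promtool tsdb analyze /mnt/prometheus-data
```
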
> `120963 container_spec_memory_reservation_limit_bytes`
>
> We can see that the monitoring of one of the Kubernetes service(kubelet) seems to generate a lot of churn (which is normal considering that it expose all of the container metrics and that container rotate often ) and that the id label has an high cardinality.


Suggested change:

> We can see that the monitoring of one of the Kubernetes service (kubelet) seems to generate a lot of churn (which is normal considering that it expose all of the container metrics and that container rotate often) and that the id label has an high cardinality.

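
Churn and label cardinality like this can also be spot-checked against a live server instead of a copied disk. A hedged example; the server address and the `kubelet` job name are assumptions about the setup:

```sh
# Top 10 metric names by series count for the kubelet job.
# http://localhost:9090 and job="kubelet" are assumptions.
promtool query instant http://localhost:9090 \
  'topk(10, count by (__name__)({job="kubelet"}))'
```
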
> ## What we learned
>
> * Labels in metrics have more impact on the memory usage than the metrics itself.
> * Memory seen by docker is not the memory really used by prometheus.


Suggested change:

> * Memory seen by Docker is not the memory really used by Prometheus.

> * Go profiling is a nice debugging tool.


Suggested change:

> * The Go profiler is a nice debugging tool.

Took previous comments into account. Still a few questions.

Still a few changes, but nothing major :gg:

Nice post! :) Only minor language fixes left, and then ready to merge!

>   image: tdegiacinto.jpg
> ---
>
> At Coveo, we use [Prometheus 2](https://prometheus.io/) for collecting all of our monitoring metrics. Prometheus is known for being able to handle millions of time series with only a few resources. So when our pod was hiting its 30Gi memory limit, we decided to dive into it to understand how memory is allocated, and get to the root of the issue.

hitting (two t's)

> <!-- more -->
>
> Recently, we ran into an issue were our prometheus pod was killed by kubenertes because it was reaching its 30Gi memory limit. This surprised us, considering the amount of metrics we were collecting.

an issue where (missing h)
Should Kubernetes be capitalized? According to their website, I think they do. Same thing with Prometheus.

> For comparison, benchmarks for a typical Prometheus installation usually look something like this:

looks (missing s)