Commit 5700523

paulohtb6, travisdowns, and Feediver1 authored
docs: add clarification about memory usage (#1237)
Co-authored-by: Travis Downs <travis.downs@gmail.com>
Co-authored-by: Joyce Fee <102751339+Feediver1@users.noreply.github.com>
1 parent 9332f3d commit 5700523

2 files changed: 26 additions and 6 deletions

modules/manage/partials/monitor-health.adoc

Lines changed: 21 additions & 5 deletions
@@ -37,7 +37,7 @@ rate(redpanda_uptime_seconds_total[5m])
 
 For the total CPU busy (non-idle) time, monitor xref:reference:public-metrics-reference.adoc#redpanda_cpu_busy_seconds_total[`redpanda_cpu_busy_seconds_total`].
 
-To detect unexpected idling, you can query the rate of change as a percentage of the shard that is in use at a given point in time.
+To detect unexpected idling, you can query the rate of change as a fraction of the shard that is in use at a given point in time.
 
 [,promql]
 ----
@@ -53,18 +53,34 @@ This high host-level CPU utilization happens because Redpanda uses Seastar, whic
 Use xref:reference:public-metrics-reference.adoc#redpanda_cpu_busy_seconds_total[`redpanda_cpu_busy_seconds_total`] to monitor the actual Redpanda CPU utilization. When it indicates close to 100% utilization over a given period of time, make sure to also monitor produce and consume <<latency,latency>> as they may then start to increase as a result of resources becoming overburdened.
 ====
 
-==== Memory allocated
+==== Memory availability and pressure
 
-To monitor the percentage of memory allocated, use a formula with xref:reference:public-metrics-reference.adoc#redpanda_memory_allocated_memory[`redpanda_memory_allocated_memory`] and xref:reference:public-metrics-reference.adoc#redpanda_memory_free_memory[`redpanda_memory_free_memory`]:
+To monitor memory, use xref:reference:public-metrics-reference.adoc#redpanda_memory_available_memory[`redpanda_memory_available_memory`], which includes both free memory and reclaimable memory from the batch cache. This provides a more accurate picture than using allocated memory alone, since allocated memory counts reclaimable batch cache memory as in use.
+
+To monitor the fraction of memory available:
+
+[,promql]
+----
+min(redpanda_memory_available_memory / (redpanda_memory_free_memory + redpanda_memory_allocated_memory))
+----
+
+To monitor memory pressure (fraction of memory being used), which may be more intuitive for alerting:
+
+[,promql]
+----
+1 - min(redpanda_memory_available_memory / (redpanda_memory_free_memory + redpanda_memory_allocated_memory))
+----
+
+You can also monitor the lowest available memory recorded since the process started to understand historical memory pressure:
 
 [,promql]
 ----
-sum(redpanda_memory_allocated_memory) / (sum(redpanda_memory_free_memory) + sum(redpanda_memory_allocated_memory))
+min(redpanda_memory_available_memory_low_water_mark / (redpanda_memory_free_memory + redpanda_memory_allocated_memory))
 ----
 
 ==== Disk used
 
-To monitor the percentage of disk consumed, use a formula with xref:reference:public-metrics-reference.adoc#redpanda_storage_disk_free_bytes[`redpanda_storage_disk_free_bytes`] and xref:reference:public-metrics-reference.adoc#redpanda_storage_disk_total_bytes[`redpanda_storage_disk_total_bytes`]:
+To monitor the fraction of disk consumed, use a formula with xref:reference:public-metrics-reference.adoc#redpanda_storage_disk_free_bytes[`redpanda_storage_disk_free_bytes`] and xref:reference:public-metrics-reference.adoc#redpanda_storage_disk_total_bytes[`redpanda_storage_disk_total_bytes`]:
 
 [,promql]
 ----
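
As a rough sanity check of the memory queries in this file's diff, the following Python sketch reproduces their arithmetic on made-up per-shard values. The sample numbers and the helper names (`total`, `min_fraction`, `allocated_fraction`) are assumptions for illustration only; they are not part of Redpanda or of this commit. It shows why the removed allocated-based ratio can look alarming while the available-based ratio on the worst shard tells a different story.

```python
# Hypothetical per-shard gauge values in bytes; the numbers are made up for illustration.
shards = [
    {"free": 200, "allocated": 800, "available": 500, "low_water_mark": 300},
    {"free": 100, "allocated": 900, "available": 250, "low_water_mark": 150},
]


def total(shard):
    """Total memory of a shard as the queries compute it: free + allocated."""
    return shard["free"] + shard["allocated"]


def min_fraction(shards, key):
    """Mimic min(<key> / (free + allocated)): per-shard ratio, then the minimum."""
    return min(s[key] / total(s) for s in shards)


def allocated_fraction(shards):
    """Mimic the removed query: sum(allocated) / (sum(free) + sum(allocated))."""
    return sum(s["allocated"] for s in shards) / sum(total(s) for s in shards)


available = min_fraction(shards, "available")
low_water = min_fraction(shards, "low_water_mark")
allocated = allocated_fraction(shards)

print(f"worst-shard available fraction: {available:.2f}")
print(f"worst-shard low-water-mark fraction: {low_water:.2f}")
print(f"cluster-wide allocated fraction (removed query): {allocated:.2f}")
```

Taking the minimum across shards, rather than summing, surfaces the single most constrained shard, which matters because each shard manages its own memory.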

modules/reference/pages/public-metrics-reference.adoc

Lines changed: 5 additions & 1 deletion
@@ -685,6 +685,8 @@ Total memory allocated (in bytes) per CPU shard.
 
 * `shard`
 
+*Usage*: This metric includes reclaimable memory from the batch cache. For monitoring memory pressure, consider using `redpanda_memory_available_memory` instead, which provides a more accurate picture of memory that can be immediately reallocated.
+
 ---
 
 === redpanda_memory_available_memory
690692
=== redpanda_memory_available_memory
@@ -697,7 +699,7 @@ Total memory (in bytes) available to a CPU shard—including both free and recla
 
 * `shard`
 
-*Usage*: Indicates memory pressure on each shard.
+*Usage*: This metric is more useful than `redpanda_memory_allocated_memory` for monitoring memory pressure, as it accounts for reclaimable memory in the batch cache. A low value indicates the system is approaching memory exhaustion.
 
 ---
 
@@ -711,6 +713,8 @@ The lowest recorded available memory (in bytes) per CPU shard since the process
 
 * `shard`
 
+*Usage*: This metric helps identify the closest the system has come to memory exhaustion. Useful for capacity planning and understanding historical memory pressure patterns.
+
 ---
 
 === redpanda_memory_free_memory
