Commit b40766e

wip
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
1 parent cefad78 commit b40766e

5 files changed: +70 -57 lines changed

observability/README.md

Lines changed: 14 additions & 8 deletions
@@ -61,6 +61,9 @@ Monitors Java Virtual Machine health and performance:
 - `system.cpu.usage`, `system.cpu.count`
 - `process.uptime`

+**Filtering:**
+All panels filter by `service_name="josdk"` to show metrics only from your operator.
+
 ### 2. Java Operator SDK Metrics Dashboard (`josdk-operator-metrics-dashboard.json`)

 Monitors Kubernetes operator performance and health:
@@ -87,6 +90,9 @@ Monitors Kubernetes operator performance and health:
 - `operator.sdk.events.received`, `.delete` - Event reception
 - Retry metrics and failure breakdowns

+**Filtering:**
+All panels filter by `service_name="josdk"` to show metrics only from your operator.
+
 ## Importing Dashboards into Grafana

 ### Automatic Import (Default)
@@ -180,8 +186,8 @@ Open http://localhost:9090/targets and verify the OTLP collector target is UP.

 ### Verify Metrics in Prometheus
 Open Prometheus UI and search for metrics:
-- JVM metrics: `otel_jvm_*`
-- Operator metrics: `otel_operator_sdk_*`
+- JVM metrics: `jvm_*`
+- Operator metrics: `operator_sdk_*`

 ### Check Grafana Data Source
 1. Navigate to **Configuration** → **Data Sources**
@@ -217,25 +223,25 @@ After making changes, re-import the dashboard using one of the methods above.
 ### JVM Metrics
 ```promql
 # Heap memory usage percentage
-(otel_jvm_memory_used_bytes{area="heap"} / otel_jvm_memory_max_bytes{area="heap"}) * 100
+(jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"}) * 100

 # GC throughput (percentage of time NOT in GC)
-100 - (rate(otel_jvm_gc_pause_seconds_sum[5m]) * 100)
+100 - (rate(jvm_gc_pause_seconds_sum[5m]) * 100)

 # Thread count trend
-otel_jvm_threads_live_threads
+jvm_threads_live_threads
 ```

 ### Operator Metrics
 ```promql
 # Reconciliation success rate
-rate(otel_operator_sdk_reconciliations_success_total[5m]) / rate(otel_operator_sdk_reconciliations_started_total[5m])
+rate(operator_sdk_reconciliations_success_total[5m]) / rate(operator_sdk_reconciliations_started_total[5m])

 # Average reconciliation time
-rate(otel_operator_sdk_controllers_execution_reconcile_seconds_sum[5m]) / rate(otel_operator_sdk_controllers_execution_reconcile_seconds_count[5m])
+rate(operator_sdk_controllers_execution_reconcile_seconds_sum[5m]) / rate(operator_sdk_controllers_execution_reconcile_seconds_count[5m])

 # Queue saturation
-otel_operator_sdk_reconciliations_queue_size / on() group_left() max(otel_operator_sdk_reconciliations_queue_size)
+operator_sdk_reconciliations_queue_size / on() group_left() max(operator_sdk_reconciliations_queue_size)
 ```

 ## References
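
The README hunk above asks the reader to search Prometheus for `jvm_*` and `operator_sdk_*`. The dashboards changed in this same commit use a few names that differ from the README examples (for instance `jvm_threads_live` and `jvm_gc_pause_milliseconds_sum`), so listing the names Prometheus actually holds may be a quicker check than guessing. The following is a minimal sketch, assuming Prometheus is reachable at the `http://localhost:9090` address the README uses and relying on the standard label-values endpoint of the Prometheus HTTP API. The function names are hypothetical and not part of this repository.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same host and port as the README's /targets link.
PROMETHEUS_URL = "http://localhost:9090"


def metric_names_url(base_url: str) -> str:
    """Build the URL of the label-values endpoint that lists metric names."""
    return urllib.parse.urljoin(
        base_url.rstrip("/") + "/", "api/v1/label/__name__/values"
    )


def filter_by_prefix(names, prefixes=("jvm_", "operator_sdk_")):
    """Keep only names starting with one of the prefixes this commit expects."""
    return sorted(n for n in names if n.startswith(tuple(prefixes)))


def fetch_metric_names(base_url: str = PROMETHEUS_URL):
    """Ask a running Prometheus for its metric names and filter them."""
    with urllib.request.urlopen(metric_names_url(base_url), timeout=5) as resp:
        payload = json.load(resp)
    # The API wraps the list of names in a {"status": ..., "data": [...]} object.
    return filter_by_prefix(payload["data"])
```

Calling `fetch_metric_names()` against a running stack should return only the un-prefixed names. If names starting with `otel_` are all that exist, the collector is probably still running with the previous exporter settings.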

observability/install-observability.sh

Lines changed: 2 additions & 0 deletions
@@ -146,6 +146,8 @@ spec:
 namespace: ""
 send_timestamps: true
 metric_expiration: 5m
+resource_to_telemetry_conversion:
+enabled: true
 debug:
 verbosity: detailed
 sampling_initial: 5

observability/josdk-operator-metrics-dashboard.json

Lines changed: 17 additions & 17 deletions
@@ -103,7 +103,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_reconciliations_started_total{job=\"webpage-operator\"}[5m])) by (kind, version)",
+"expr": "sum(rate(operator_sdk_reconciliations_started_total{service_name=\"josdk\"}[5m])) by (kind, version)",
 "legendFormat": "{{kind}} ({{version}})",
 "range": true,
 "refId": "A"
@@ -224,7 +224,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_reconciliations_success_total{job=\"webpage-operator\"}[5m]))",
+"expr": "sum(rate(operator_sdk_reconciliations_success_total{service_name=\"josdk\"}[5m]))",
 "legendFormat": "Success",
 "range": true,
 "refId": "A"
@@ -235,7 +235,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_reconciliations_failed_total{job=\"webpage-operator\"}[5m]))",
+"expr": "sum(rate(operator_sdk_reconciliations_failed_total{service_name=\"josdk\"}[5m]))",
 "legendFormat": "Failure",
 "range": true,
 "refId": "B"
@@ -302,7 +302,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(otel_operator_sdk_reconciliations_executions{job=\"webpage-operator\"})",
+"expr": "sum(operator_sdk_reconciliations_executions{service_name=\"josdk\"})",
 "legendFormat": "Executing",
 "range": true,
 "refId": "A"
@@ -369,7 +369,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(otel_operator_sdk_reconciliations_queue_size{job=\"webpage-operator\"})",
+"expr": "sum(operator_sdk_reconciliations_queue_size{service_name=\"josdk\"})",
 "legendFormat": "Queue Size",
 "range": true,
 "refId": "A"
@@ -430,7 +430,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(otel_operator_sdk_reconciliations_started_total{job=\"webpage-operator\"})",
+"expr": "sum(operator_sdk_reconciliations_started_total{service_name=\"josdk\"})",
 "legendFormat": "Total",
 "range": true,
 "refId": "A"
@@ -495,7 +495,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_reconciliations_failed_total{job=\"webpage-operator\"}[5m]))",
+"expr": "sum(rate(operator_sdk_reconciliations_failed_total{service_name=\"josdk\"}[5m]))",
 "legendFormat": "Error Rate",
 "range": true,
 "refId": "A"
@@ -585,7 +585,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "histogram_quantile(0.50, sum(rate(otel_operator_sdk_controllers_execution_reconcile_seconds_bucket{job=\"webpage-operator\"}[5m])) by (le, controller))",
+"expr": "histogram_quantile(0.50, sum(rate(operator_sdk_controllers_execution_reconcile_seconds_bucket{service_name=\"josdk\"}[5m])) by (le, controller))",
 "legendFormat": "p50 - {{controller}}",
 "range": true,
 "refId": "A"
@@ -596,7 +596,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "histogram_quantile(0.95, sum(rate(otel_operator_sdk_controllers_execution_reconcile_seconds_bucket{job=\"webpage-operator\"}[5m])) by (le, controller))",
+"expr": "histogram_quantile(0.95, sum(rate(operator_sdk_controllers_execution_reconcile_seconds_bucket{service_name=\"josdk\"}[5m])) by (le, controller))",
 "legendFormat": "p95 - {{controller}}",
 "range": true,
 "refId": "B"
@@ -607,7 +607,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "histogram_quantile(0.99, sum(rate(otel_operator_sdk_controllers_execution_reconcile_seconds_bucket{job=\"webpage-operator\"}[5m])) by (le, controller))",
+"expr": "histogram_quantile(0.99, sum(rate(operator_sdk_controllers_execution_reconcile_seconds_bucket{service_name=\"josdk\"}[5m])) by (le, controller))",
 "legendFormat": "p99 - {{controller}}",
 "range": true,
 "refId": "C"
@@ -697,7 +697,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_events_received_total{job=\"webpage-operator\"}[5m])) by (event, action)",
+"expr": "sum(rate(operator_sdk_events_received_total{service_name=\"josdk\"}[5m])) by (event, action)",
 "legendFormat": "{{event}} - {{action}}",
 "range": true,
 "refId": "A"
@@ -787,7 +787,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_reconciliations_failed_total{job=\"webpage-operator\"}[5m])) by (exception)",
+"expr": "sum(rate(operator_sdk_reconciliations_failed_total{service_name=\"josdk\"}[5m])) by (exception)",
 "legendFormat": "{{exception}}",
 "range": true,
 "refId": "A"
@@ -877,7 +877,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_controllers_execution_reconcile_success_total{job=\"webpage-operator\"}[5m])) by (type)",
+"expr": "sum(rate(operator_sdk_controllers_execution_reconcile_success_total{service_name=\"josdk\"}[5m])) by (type)",
 "legendFormat": "Success - {{type}}",
 "range": true,
 "refId": "A"
@@ -888,7 +888,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_controllers_execution_reconcile_failure_total{job=\"webpage-operator\"}[5m])) by (exception)",
+"expr": "sum(rate(operator_sdk_controllers_execution_reconcile_failure_total{service_name=\"josdk\"}[5m])) by (exception)",
 "legendFormat": "Failure - {{exception}}",
 "range": true,
 "refId": "B"
@@ -978,7 +978,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_events_delete_total{job=\"webpage-operator\"}[5m])) by (kind, version)",
+"expr": "sum(rate(operator_sdk_events_delete_total{service_name=\"josdk\"}[5m])) by (kind, version)",
 "legendFormat": "{{kind}} ({{version}})",
 "range": true,
 "refId": "A"
@@ -1068,7 +1068,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_reconciliations_started_total{job=\"webpage-operator\", operator_sdk_reconciliations_retries_last=\"true\"}[5m]))",
+"expr": "sum(rate(operator_sdk_reconciliations_started_total{service_name=\"josdk\", operator_sdk_reconciliations_retries_last=\"true\"}[5m]))",
 "legendFormat": "Last Retry Attempts",
 "range": true,
 "refId": "A"
@@ -1079,7 +1079,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "sum(rate(otel_operator_sdk_reconciliations_started_total{job=\"webpage-operator\", operator_sdk_reconciliations_retries_last=\"false\"}[5m]))",
+"expr": "sum(rate(operator_sdk_reconciliations_started_total{service_name=\"josdk\", operator_sdk_reconciliations_retries_last=\"false\"}[5m]))",
 "legendFormat": "Retries (Not Last)",
 "range": true,
 "refId": "B"
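
Every hunk in this dashboard file applies the same two substitutions: the `otel_` metric prefix is dropped, and the `job="webpage-operator"` selector becomes `service_name="josdk"`. The sketch below shows that rewrite as a small function, which may help when migrating custom panels or alert rules written against the old names. The function name `migrate_expr` is hypothetical. It deliberately does not cover the additional renames in the JVM dashboard diff of this commit, such as `otel_jvm_threads_live_threads` becoming `jvm_threads_live` or the switch from `_seconds` to `_milliseconds`.

```python
import re

# Both selector values are copied from this commit's diff.
OLD_SELECTOR = 'job="webpage-operator"'
NEW_SELECTOR = 'service_name="josdk"'


def migrate_expr(expr: str) -> str:
    """Apply the two substitutions this diff makes to a PromQL expression."""
    # Drop the otel_ prefix only where it begins a metric name.
    expr = re.sub(r"\botel_(?=[a-z])", "", expr)
    # Swap the scrape-job selector for the service_name label.
    return expr.replace(OLD_SELECTOR, NEW_SELECTOR)
```

Expressions that do not use either old form pass through unchanged, so applying the function a second time is harmless.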

observability/jvm-metrics-dashboard.json

Lines changed: 15 additions & 15 deletions
@@ -106,7 +106,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_jvm_memory_used_bytes{job=\"webpage-operator\"}",
+"expr": "jvm_memory_used_bytes{service_name=\"josdk\"}",
 "legendFormat": "{{area}} - {{id}}",
 "range": true,
 "refId": "A"
@@ -195,7 +195,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_jvm_threads_live_threads{job=\"webpage-operator\"}",
+"expr": "jvm_threads_live{service_name=\"josdk\"}",
 "legendFormat": "Live Threads",
 "range": true,
 "refId": "A"
@@ -206,7 +206,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_jvm_threads_daemon_threads{job=\"webpage-operator\"}",
+"expr": "jvm_threads_daemon_threads{service_name=\"josdk\"}",
 "legendFormat": "Daemon Threads",
 "range": true,
 "refId": "B"
@@ -217,7 +217,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_jvm_threads_peak_threads{job=\"webpage-operator\"}",
+"expr": "jvm_threads_peak_threads{service_name=\"josdk\"}",
 "legendFormat": "Peak Threads",
 "range": true,
 "refId": "C"
@@ -306,7 +306,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "rate(otel_jvm_gc_pause_seconds_sum{job=\"webpage-operator\"}[5m])",
+"expr": "rate(jvm_gc_pause_milliseconds_sum{service_name=\"josdk\"}[5m])",
 "legendFormat": "{{action}} - {{cause}}",
 "range": true,
 "refId": "A"
@@ -395,7 +395,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "rate(otel_jvm_gc_pause_seconds_count{job=\"webpage-operator\"}[5m])",
+"expr": "rate(jvm_gc_pause_milliseconds_count{service_name=\"josdk\"}[5m])",
 "legendFormat": "{{action}} - {{cause}}",
 "range": true,
 "refId": "A"
@@ -453,7 +453,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_system_cpu_usage{job=\"webpage-operator\"}",
+"expr": "system_cpu_usage{service_name=\"josdk\"}",
 "legendFormat": "CPU Usage",
 "range": true,
 "refId": "A"
@@ -511,7 +511,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_jvm_classes_loaded_classes{job=\"webpage-operator\"}",
+"expr": "jvm_classes_loaded{service_name=\"josdk\"}",
 "legendFormat": "Classes Loaded",
 "range": true,
 "refId": "A"
@@ -540,7 +540,7 @@
 }
 ]
 },
-"unit": "s"
+"unit": "ms"
 },
 "overrides": []
 },
@@ -569,7 +569,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_process_uptime_seconds{job=\"webpage-operator\"}",
+"expr": "process_uptime_milliseconds{service_name=\"josdk\"}",
 "legendFormat": "Uptime",
 "range": true,
 "refId": "A"
@@ -627,7 +627,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_system_cpu_count{job=\"webpage-operator\"}",
+"expr": "system_cpu_count{service_name=\"josdk\"}",
 "legendFormat": "CPU Count",
 "range": true,
 "refId": "A"
@@ -716,7 +716,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "rate(otel_jvm_gc_memory_allocated_bytes_total{job=\"webpage-operator\"}[5m])",
+"expr": "rate(jvm_gc_memory_allocated_bytes_total{service_name=\"josdk\"}[5m])",
 "legendFormat": "Allocated",
 "range": true,
 "refId": "A"
@@ -727,7 +727,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "rate(otel_jvm_gc_memory_promoted_bytes_total{job=\"webpage-operator\"}[5m])",
+"expr": "rate(jvm_gc_memory_promoted_bytes_total{service_name=\"josdk\"}[5m])",
 "legendFormat": "Promoted",
 "range": true,
 "refId": "B"
@@ -816,7 +816,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_jvm_memory_max_bytes{job=\"webpage-operator\", area=\"heap\"}",
+"expr": "jvm_memory_max_bytes{service_name=\"josdk\", area=\"heap\"}",
 "legendFormat": "Max Heap",
 "range": true,
 "refId": "A"
@@ -827,7 +827,7 @@
 "uid": "prometheus"
 },
 "editorMode": "code",
-"expr": "otel_jvm_memory_committed_bytes{job=\"webpage-operator\", area=\"heap\"}",
+"expr": "jvm_memory_committed_bytes{service_name=\"josdk\", area=\"heap\"}",
 "legendFormat": "Committed Heap",
 "range": true,
 "refId": "B"
