-
Notifications
You must be signed in to change notification settings - Fork 259
Integrate GCP and Anthos GCP prometheus query #489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
WalkthroughAdds GCP Managed Prometheus and Anthos support: new GCP/Anthos CPU and memory metric loaders and metrics services, loader-selection logic based on URL and a new CLI/config flag, adjusted pod-discovery logging, tests, docs, .gitignore additions, and a GCP quick-test script. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as CLI / Config
participant Loader as PrometheusLoader
participant Service as MetricsService
participant PromAPI as Prometheus API
CLI->>Loader: run_strategy(prom_url?, gcp_anthos)
Loader->>Loader: inspect URL for monitoring.googleapis.com
alt URL contains monitoring.googleapis.com
Loader->>Loader: check gcp_anthos flag
alt gcp_anthos == true
Loader->>Service: select AnthosMetricsService
else
Loader->>Service: select GcpManagedPrometheusMetricsService
end
else
Loader->>Service: select PrometheusMetricsService (or auto-detect)
end
Service->>Service: gather_data(object, LoaderClass, period, step)
alt LoaderClass is PercentileCPULoader
Service->>Service: construct percentile-specific loader (set _percentile)
end
Service->>PromAPI: execute PromQL (GCP/Anthos metric names + label_replace)
PromAPI-->>Service: return metrics
Service-->>Loader: PodsTimeData
Loader-->>CLI: strategy result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (14)
test_gcp_quick.sh (1)
4-4: Validate .env file existence before sourcing.The script sources
.envwithout checking if it exists, which could cause the script to fail with a cryptic error.Apply this diff to add validation:
set -e +if [ ! -f .env ]; then + echo -e "${RED}Error: .env file not found${NC}" + exit 1 +fi source .envrobusta_krr/core/integrations/prometheus/metrics/gcp/__init__.py (1)
13-20: Optional: Sort all alphabetically.The static analyzer suggests applying isort-style sorting to
__all__for consistency, though this is purely a style preference.Apply this diff if you want to follow the convention:
__all__ = [ + "GcpCPUAmountLoader", "GcpCPULoader", - "GcpPercentileCPULoader", - "GcpCPUAmountLoader", - "GcpMemoryLoader", "GcpMaxMemoryLoader", "GcpMemoryAmountLoader", + "GcpMemoryLoader", + "GcpPercentileCPULoader", ]docs/gcp-managed-prometheus-integration.md (1)
34-41: Add language specifiers to fenced code blocks.The static analyzer flags two fenced code blocks without language specifiers. While not critical, adding language hints improves syntax highlighting and readability.
Apply this diff:
-``` +```text https://monitoring.googleapis.com/v1/projects/{PROJECT_ID}/location/global/prometheusFor example:
-+text
https://monitoring.googleapis.com/v1/projects/sicraweb-evo-dev/location/global/prometheustests/test_anthos_loaders.py (1)
52-66: Consider adding tests for empty‑pod selector behavior.All Anthos loader tests use
sample_objectwith a non‑emptypodslist. Since the loaders build apod_nameregex fromobject.pods(with wildcard fallbacks in some implementations), adding a case withpods=[]and asserting the resultingpod_nameselector (e.g.".*") would lock in the intended empty‑pod behavior and guard against regressions when pod discovery returns no pods.robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py (1)
21-33: AlignAnthosCPULoaderpod selector with other loaders for empty‑pod cases.Here
pods_selectoris just the join of pod names; ifobject.podsis empty, the query ends up withpod_name=~""and returns no data, whereas the percentile and amount loaders already use anor ".*"fallback. To keep Anthos CPU loaders consistent and match the documented empty‑pod fallback, consider:- def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: - pods_selector = "|".join(pod.name for pod in object.pods) + def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: + pods_selector = "|".join(pod.name for pod in object.pods) or ".*"This preserves current behavior when pods are present and avoids surprising “no data” when pod discovery returns an empty list.
tests/test_gcp_loaders.py (1)
51-66: Optionally add coverage for empty‑pod queries.All tests build
sample_k8s_objectwith two pods, so thepod_nameregex is never exercised for thepods=[]case. Given several GCP loaders use anor ".*"fallback for empty pod lists, a small test withpods=[]asserting the resultingpod_nameselector (e.g.".*") would help pin down the intended behavior and catch regressions in pod‑discovery edge cases.robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py (1)
24-39: Consider wildcard fallback forGcpMemoryLoaderpod selector.
GcpMemoryLoaderbuildspods_selectorfromobject.podswithout a fallback, so an empty pod list yieldspod_name=~""and no data. Other loaders in this module (GcpMaxMemoryLoader,GcpMemoryAmountLoader) and the CPU side already useor ".*"to handle empty‑pod cases more gracefully. To make behavior consistent:- def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: - pods_selector = "|".join(pod.name for pod in object.pods) + def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: + pods_selector = "|".join(pod.name for pod in object.pods) or ".*"This keeps current behavior when pods are present while avoiding silent “no data” when the pod list is empty.
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/__init__.py (1)
21-28: Align__all__with Ruff’s ordering expectationsRuff flags
__all__as unsorted (RUF022). If you want a clean lint run, consider reordering entries in a stable/alphabetical way, e.g.:-__all__ = [ - "AnthosCPULoader", - "AnthosPercentileCPULoader", - "AnthosCPUAmountLoader", - "AnthosMemoryLoader", - "AnthosMaxMemoryLoader", - "AnthosMemoryAmountLoader", -] +__all__ = [ + "AnthosCPUAmountLoader", + "AnthosCPULoader", + "AnthosMaxMemoryLoader", + "AnthosMemoryAmountLoader", + "AnthosMemoryLoader", + "AnthosPercentileCPULoader", +]robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py (1)
19-37: Use empty‑pod fallback and mark unused duration/step explicitlyRuff is correct that
durationandstepare unused here (ARG002), andpods_selectorbehaves differently from the other Anthos loaders when there are no pods.If this loader is meant to be an instant query (no
[duration]window), you can keep the signature but:
- Apply the
".*"fallback for an empty pod list, for consistency.- Explicitly discard
duration/stepto satisfy lint and make intent clear.Example:
def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: - pods_selector = "|".join(pod.name for pod in object.pods) + pods_selector = "|".join(pod.name for pod in object.pods) or ".*" cluster_label = self.get_prometheus_cluster_label() + # duration/step are unused for this instant query but kept for the common interface + del duration, steprobusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py (2)
45-54: AnnotateLOADER_MAPPINGas aClassVarfor RuffRuff’s RUF012 suggestion makes sense here:
LOADER_MAPPINGis a shared configuration, not per‑instance state.You can make this explicit:
-from typing import Optional, Dict, Any, List +from typing import Optional, Dict, Any, List, ClassVar @@ - LOADER_MAPPING: Dict[str, Optional[type[PrometheusMetric]]] = { + LOADER_MAPPING: ClassVar[Dict[str, Optional[type[PrometheusMetric]]]] = {
84-96: Handle unused parameters and drop unnecessary f‑string inload_pods
objectandperiodare unused (ARG002), and the debug log uses an f‑string without placeholders (F541).You can keep the signature while satisfying lint:
async def load_pods(self, object: K8sObjectData, period: timedelta) -> List[PodData]: @@ - logger.debug(f"Anthos: Using Kubernetes API for pod discovery (kube-state-metrics not available)") - return [] + # Parameters are unused here; kept to satisfy the abstract interface. + del object, period + logger.debug("Anthos: Using Kubernetes API for pod discovery (kube-state-metrics not available)") + return []robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py (2)
21-37: Explicitly discard unuseddurationinGcpCPULoader.get_query
durationis part of the common interface but unused in this query (onlystepis used inside therate()window). Ruff’s ARG002 is accurate.You can keep behavior and silence the warning:
def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: pods_selector = "|".join(pod.name for pod in object.pods) cluster_label = self.get_prometheus_cluster_label() + # duration is unused for this loader; kept for interface compatibility + del duration
48-89: Consider marking percentile CPU loaders as range queriesIf your
PrometheusMetricmachinery usesquery_typeto decide between/queryand/query_range, the percentile loader should likely behave like a range query (similar to the base CPU loader).You can make that explicit on the inner class:
class _GcpPercentileCPULoader(PrometheusMetric): - # Store percentile as class attribute for later retrieval - _percentile = percentile + # Store percentile as class attribute for later retrieval + _percentile = percentile + query_type: QueryType = QueryType.QueryRangeThis keeps the existing factory pattern but aligns intent with the use of
quantile_over_time([...:{step}]).robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py (1)
42-52: Clean up minor lint issues in GCP metrics serviceA few small tweaks will satisfy Ruff/Flake8 without changing behavior:
- Unused
api_clientin discovery (ARG002)def find_metrics_url(self, *, api_client: Optional[ApiClient] = None) -> Optional[str]: @@ - logger.debug("GCP Managed Prometheus auto-discovery not supported. Use --prometheus-url flag.") - return None + # api_client is unused here; kept for interface compatibility + del api_client + logger.debug("GCP Managed Prometheus auto-discovery not supported. Use --prometheus-url flag.") + return None
- Annotate
LOADER_MAPPINGas aClassVar(RUF012)-from typing import Optional, Dict, Any +from typing import Optional, Dict, Any, ClassVar @@ - LOADER_MAPPING = { + LOADER_MAPPING: ClassVar[Dict[str, Optional[type[PrometheusMetric]]]] = {
- Remove unnecessary f‑string in logger (F541)
- logger.info(f"Using GCP metric naming: kubernetes.io/container/cpu/core_usage_time and kubernetes.io/container/memory/used_bytes") + logger.info("Using GCP metric naming: kubernetes.io/container/cpu/core_usage_time and kubernetes.io/container/memory/used_bytes")
- (Optional) Long
MetricsNotFoundmessage (TRY003)The long inline error message is very readable; if you want to satisfy TRY003 strictly, consider extracting it into a helper or constant, but it’s not functionally problematic.
Also applies to: 68-76, 88-88, 101-108
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
.gitignore(1 hunks)CHANGES_GCP.md(1 hunks)docs/gcp-managed-prometheus-integration.md(1 hunks)robusta_krr/core/integrations/prometheus/loader.py(2 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/README.md(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/__init__.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/__init__.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py(1 hunks)robusta_krr/core/models/config.py(1 hunks)robusta_krr/core/runner.py(1 hunks)robusta_krr/main.py(2 hunks)test_gcp_quick.sh(1 hunks)tests/test_anthos_loaders.py(1 hunks)tests/test_gcp_loaders.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
robusta_krr/core/integrations/prometheus/loader.py (3)
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py (1)
GcpManagedPrometheusMetricsService(54-170)robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py (1)
AnthosMetricsService(31-145)robusta_krr/core/integrations/prometheus/metrics_service/prometheus_metrics_service.py (1)
PrometheusMetricsService(53-404)
robusta_krr/core/runner.py (1)
robusta_krr/core/models/objects.py (1)
add_warning(60-61)
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py (6)
robusta_krr/utils/service_discovery.py (1)
MetricsServiceDiscovery(89-92)robusta_krr/core/integrations/prometheus/metrics/base.py (1)
PrometheusMetric(40-261)robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py (1)
GcpCPULoader(13-45)robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py (1)
GcpMemoryLoader(16-45)robusta_krr/core/integrations/prometheus/metrics_service/prometheus_metrics_service.py (1)
PrometheusMetricsService(53-404)robusta_krr/core/integrations/prometheus/loader.py (1)
gather_data(112-136)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/__init__.py (2)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py (3)
AnthosCPULoader(13-41)AnthosPercentileCPULoader(44-84)AnthosCPUAmountLoader(87-113)robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py (3)
AnthosMemoryLoader(12-37)AnthosMaxMemoryLoader(40-68)AnthosMemoryAmountLoader(71-97)
tests/test_gcp_loaders.py (5)
robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py (6)
GcpCPULoader(13-45)GcpPercentileCPULoader(48-89)GcpCPUAmountLoader(92-118)get_query(21-45)get_query(60-84)get_query(97-118)robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py (6)
GcpMemoryLoader(16-45)GcpMaxMemoryLoader(48-74)GcpMemoryAmountLoader(77-103)get_query(24-45)get_query(53-74)get_query(82-103)robusta_krr/core/models/objects.py (2)
K8sObjectData(38-107)PodData(14-19)robusta_krr/core/models/allocations.py (1)
ResourceAllocations(52-106)robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py (2)
name(157-159)GcpManagedPrometheusMetricsService(54-170)
🪛 Flake8 (7.3.0)
robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py
[error] 95-95: f-string is missing placeholders
(F541)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py
[error] 8-8: 'typing.Optional' imported but unused
(F401)
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py
[error] 88-88: f-string is missing placeholders
(F541)
🪛 markdownlint-cli2 (0.18.1)
docs/gcp-managed-prometheus-integration.md
34-34: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
39-39: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
CHANGES_GCP.md
47-47: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
52-52: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
62-62: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
338-338: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
347-347: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
560-560: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
610-610: Multiple headings with the same content
(MD024, no-duplicate-heading)
629-629: Multiple headings with the same content
(MD024, no-duplicate-heading)
639-639: Multiple headings with the same content
(MD024, no-duplicate-heading)
657-657: Multiple headings with the same content
(MD024, no-duplicate-heading)
🪛 Ruff (0.14.5)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py
19-19: Unused method argument: duration
(ARG002)
19-19: Unused method argument: step
(ARG002)
robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py
45-54: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
84-84: Unused method argument: object
(ARG002)
84-84: Unused method argument: period
(ARG002)
95-95: f-string without any placeholders
Remove extraneous f prefix
(F541)
robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py
24-24: Unused method argument: duration
(ARG002)
24-24: Unused method argument: step
(ARG002)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py
21-21: Unused method argument: duration
(ARG002)
53-53: Avoid specifying long messages outside the exception class
(TRY003)
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py
42-42: Unused method argument: api_client
(ARG002)
68-76: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
88-88: f-string without any placeholders
Remove extraneous f prefix
(F541)
101-108: Avoid specifying long messages outside the exception class
(TRY003)
robusta_krr/core/integrations/prometheus/metrics/gcp/__init__.py
13-20: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/__init__.py
21-28: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py
21-21: Unused method argument: duration
(ARG002)
54-54: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (14)
.gitignore (1)
6-7: LGTM!The new ignore patterns appropriately exclude test artifacts and demo files from version control.
test_gcp_quick.sh (1)
73-85: LGTM!The krr.py invocation correctly constructs the command with conditional flags and proper parameter passing. The use of variables for dynamic flag construction is clean and maintainable.
robusta_krr/core/integrations/prometheus/loader.py (2)
20-21: LGTM!The imports for GCP and Anthos metrics services are correctly added and align with the new service classes introduced in the PR.
56-69: LGTM!The service selection logic correctly detects GCP Managed Prometheus URLs and chooses between Anthos and GCP services based on the
gcp_anthosconfiguration flag. The fallback to standard PrometheusMetricsService for non-GCP URLs maintains backward compatibility.robusta_krr/core/models/config.py (1)
52-52: LGTM!The
gcp_anthosconfiguration field is well-defined with a sensible default (False) and clear description. This aligns with the CLI flag and enables Anthos-specific metric loading.robusta_krr/core/runner.py (2)
357-360: Logging level change is appropriate.Downgrading from
logger.warningtologger.debugfor Prometheus pod discovery fallback is appropriate, especially for Anthos environments where Kubernetes API pod discovery is the expected behavior.
353-360: The review comment is incorrect and should be disregarded."NoPrometheusPods" is a valid member of the
PodWarningLiteral type defined atrobusta_krr/core/models/objects.pylines 31-35. In Python's type system,Literal["NoPrometheusPods", ...]accepts direct string literals matching the defined values. The code atrobusta_krr/core/runner.py:356correctly passes"NoPrometheusPods"as a string literal toobject.add_warning(), which expects aPodWarningtype. This is not a type error—type checkers like mypy understand this pattern. The reviewer appears to have confused Literal types with Enum types (which requireEnum.VALUEsyntax).Likely an incorrect or invalid review comment.
robusta_krr/core/integrations/prometheus/metrics/gcp/__init__.py (1)
1-20: LGTM!The package initialization correctly exports all GCP metric loaders with a clear docstring explaining the GCP-specific metric naming conventions.
robusta_krr/main.py (2)
204-209: LGTM!The
--gcp-anthosCLI flag is well-defined with appropriate defaults and clear help text. It's correctly placed in the "Prometheus GCP Settings" help panel.
389-389: LGTM!The
gcp_anthosparameter is correctly passed from the CLI to the Config constructor, enabling the feature throughout the application.docs/gcp-managed-prometheus-integration.md (1)
1-232: Comprehensive documentation!The documentation provides clear guidance on GCP Managed Prometheus integration, including authentication, URL patterns, usage examples, architecture overview, limitations, and troubleshooting. This will greatly help users adopting the new GCP/Anthos support.
robusta_krr/core/integrations/prometheus/metrics/gcp/README.md (1)
1-129: GCP/Anthos loader documentation looks consistent and complete.The metric mappings, UTF‑8 PromQL example, label renaming, loader list, limitations, and service usage all align with the implementations and tests in this PR. No changes needed from my side.
robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py (1)
31-42: Remove the "Memory aggregation" bullet from the docstring – both GCP and Anthos use identical aggregation functions.The docstring contains a self-contradictory and misleading bullet point. However, the suggested fix in the review is incorrect: GCP does not use
maxfor memory aggregation. Both GcpMaxMemoryLoader and AnthosMaxMemoryLoader use identicalmax_over_time(max(...))functions. The actual differences between GCP and Anthos are only:
- Metric prefix:
kubernetes.io/container/*vskubernetes.io/anthos/container/*- Monitored resource label:
k8s_container(same for both)The "Memory aggregation" line should either be removed entirely or accurately describe that both use the same aggregation (not a difference).
Likely an incorrect or invalid review comment.
CHANGES_GCP.md (1)
529-529: Consolidate duplicate headings.Multiple headings appear twice in the document (after line 529):
- "### ✅ GCP Cloud Support" (lines 13, 529)
- "🚀 Quick Test" (lines 284, 569)
- "🔍 Technical Highlights" (lines 271, 629)
- "🐛 Debugging" (lines 328, 639)
- "📋 Changelog" (lines 478, 657)
Since lines 529–667 are duplicates to be removed, this resolves the duplication automatically.
Also applies to: 610-610, 629-629, 639-639, 657-657
| ``` | ||
| ============================== 75 passed in 5.20s ============================== | ||
| ``` | ||
|
|
||
| ### Test Breakdown | ||
| ``` | ||
| ✅ 75/75 tests passing | ||
| • 10 GCP Cloud tests (new) | ||
| • 10 Anthos tests (new) | ||
| • 55 existing KRR tests | ||
| ✅ No broken tests | ||
| ✅ Production-ready | ||
| ``` | ||
|
|
||
| ### GCP Cloud Tests | ||
| ``` | ||
| tests/test_gcp_loaders.py::TestGcpCPULoader::test_cpu_loader_query_syntax PASSED | ||
| tests/test_gcp_loaders.py::TestGcpCPULoader::test_cpu_loader_with_cluster_label PASSED | ||
| tests/test_gcp_loaders.py::TestGcpCPULoader::test_percentile_cpu_loader_factory PASSED | ||
| tests/test_gcp_loaders.py::TestGcpCPULoader::test_percentile_cpu_loader_invalid_percentile PASSED | ||
| tests/test_gcp_loaders.py::TestGcpCPULoader::test_cpu_amount_loader_query PASSED | ||
| tests/test_gcp_loaders.py::TestGcpMemoryLoader::test_memory_loader_query_syntax PASSED | ||
| tests/test_gcp_loaders.py::TestGcpMemoryLoader::test_max_memory_loader_query PASSED | ||
| tests/test_gcp_loaders.py::TestGcpMemoryLoader::test_memory_amount_loader_query PASSED | ||
| tests/test_gcp_loaders.py::TestQuerySyntaxValidation::test_no_syntax_errors_in_queries PASSED | ||
| tests/test_gcp_loaders.py::TestGcpMetricsService::test_loader_mapping PASSED | ||
| ``` | ||
|
|
||
| ### Anthos Tests | ||
| ``` | ||
| tests/test_anthos_loaders.py::TestAnthosCPULoader::test_cpu_loader_uses_anthos_metric PASSED | ||
| tests/test_anthos_loaders.py::TestAnthosCPULoader::test_cpu_loader_with_cluster_label PASSED | ||
| tests/test_anthos_loaders.py::TestAnthosCPULoader::test_percentile_cpu_loader_factory PASSED | ||
| tests/test_anthos_loaders.py::TestAnthosCPULoader::test_percentile_cpu_loader_invalid_percentile PASSED | ||
| tests/test_anthos_loaders.py::TestAnthosCPULoader::test_cpu_amount_loader_query PASSED | ||
| tests/test_anthos_loaders.py::TestAnthosMemoryLoader::test_memory_loader_uses_anthos_metric PASSED | ||
| tests/test_anthos_loaders.py::TestAnthosMemoryLoader::test_max_memory_loader_query PASSED | ||
| tests/test_anthos_loaders.py::TestAnthosMemoryLoader::test_memory_amount_loader_query PASSED | ||
| tests/test_anthos_loaders.py::TestQuerySyntaxValidation::test_no_syntax_errors_in_queries PASSED | ||
| tests/test_anthos_loaders.py::TestAnthosMetricsService::test_loader_mapping PASSED | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add language specifiers to fenced code blocks.
Several code blocks lack language identifiers, which impacts readability and syntax highlighting. Apply this pattern to all code blocks:
-```
+```bash
# Your code here
-```
+```Affected line ranges: 47, 52, 62, 76, 338, 347, 560. For shell commands use bash, for Python use python, for PromQL use promql.
Also applies to: 338-372, 560-590
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
47-47: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
52-52: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
62-62: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In CHANGES_GCP.md around lines 47-87 and the broader affected ranges 338-372 and
560-590, several fenced code blocks are missing language specifiers; update each
fenced block in those ranges to include the appropriate language marker (use
bash for console/shell/test-runner output blocks and any shell commands, python
for Python code snippets/tests, and promql for PromQL queries) so that each ```
becomes ```bash, ```python, or ```promql as appropriate throughout the listed
line ranges.
| ### ✅ GCP Cloud Support | ||
|
|
||
| | File | Type | Changes | | ||
| |------|------|---------| | ||
| | `robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py` | Fixed | • Saved `_percentile` as class attribute<br>• Fixed `cluster_label` UTF-8 syntax | | ||
| | `robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py` | Fixed | • Fixed `cluster_label` UTF-8 syntax | | ||
| | `robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py` | Enhanced | • Removed regex parsing<br>• Explicit `MaxOOMKilledMemoryLoader` handling<br>• Detailed logging | | ||
| | `tests/test_gcp_loaders.py` | New | • 10 unit tests for GCP loaders | | ||
|
|
||
| ### ✅ Anthos Support (New) | ||
|
|
||
| | File | Type | Purpose | | ||
| |------|------|---------| | ||
| | `robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py` | New | • CPU loaders for Anthos metrics<br>• Uses `kubernetes.io/anthos/container/*` | | ||
| | `robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py` | New | • Memory loaders for Anthos<br>• Uses `max_over_time()` aggregation | | ||
| | `robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py` | New | • Service orchestrator for Anthos<br>• Kubernetes API pod discovery | | ||
| | `robusta_krr/core/models/config.py` | Modified | • Added `gcp_anthos: bool` field | | ||
| | `robusta_krr/main.py` | Modified | • Added `--gcp-anthos` CLI flag | | ||
| | `robusta_krr/core/runner.py` | Modified | • Changed pod discovery fallback to DEBUG level | | ||
| | `tests/test_anthos_loaders.py` | New | • 10 unit tests for Anthos loaders | | ||
|
|
||
| ### 📚 Documentation | ||
|
|
||
| | File | Type | Content | | ||
| |------|------|---------| | ||
| | `docs/gcp-managed-prometheus-integration.md` | Updated | • Complete GCP & Anthos integration guide | | ||
| | `ANTHOS_IMPLEMENTATION.md` | New | • Detailed Anthos architecture & usage | | ||
| | `robusta_krr/core/integrations/prometheus/metrics/gcp/README.md` | Updated | • GCP loaders documentation | | ||
|
|
||
| ## 🧪 Test Status | ||
|
|
||
| ``` | ||
| ✅ 75/75 tests passing | ||
| • 10 GCP Cloud tests | ||
| • 10 Anthos tests | ||
| • 55 existing KRR tests | ||
| ✅ No broken tests | ||
| ✅ Production-ready | ||
| ``` | ||
|
|
||
| ## 🚀 Quick Test | ||
|
|
||
| ### Unit Tests | ||
| ```bash | ||
| # All tests | ||
| poetry run pytest tests/ -v | ||
|
|
||
| # GCP Cloud tests only | ||
| poetry run pytest tests/test_gcp_loaders.py -v | ||
|
|
||
| # Anthos tests only | ||
| poetry run pytest tests/test_anthos_loaders.py -v | ||
| ``` | ||
|
|
||
| ### Integration Tests (requires GCP access) | ||
| ```bash | ||
| # GCP Cloud cluster | ||
| ./test_gcp_quick.sh infra-contabilita | ||
|
|
||
| # Anthos cluster | ||
| ./test_gcp_quick.sh gke-connect | ||
| ``` | ||
|
|
||
| ## 📖 Key Features | ||
|
|
||
| ### GCP Cloud (kubernetes.io/container/*) | ||
| - ✅ Auto-detected from `monitoring.googleapis.com` URL | ||
| - ✅ UTF-8 PromQL syntax with quoted labels | ||
| - ✅ Label renaming: `pod_name`→`pod`, `container_name`→`container` | ||
| - ✅ All metric types: CPU (rate, percentile, amount), Memory (current, max, amount) | ||
| - ✅ Cluster label support for multi-cluster projects | ||
| - ⚠️ MaxOOMKilledMemoryLoader not supported (returns empty data) | ||
|
|
||
| ### Anthos (kubernetes.io/anthos/container/*) | ||
| - ✅ Enabled via `--gcp-anthos` flag | ||
| - ✅ Dedicated loaders for Anthos-specific metrics | ||
| - ✅ Uses `max_over_time()` for memory (Anthos convention) | ||
| - ✅ Kubernetes API pod discovery (no kube-state-metrics) | ||
| - ✅ Label renaming same as GCP Cloud | ||
| - ⚠️ No cluster summary metrics (expected for Anthos) | ||
|
|
||
| ## 🎯 Usage Examples | ||
|
|
||
| ### GCP Cloud | ||
| ```bash | ||
| krr simple \ | ||
| --prometheus-url="https://monitoring.googleapis.com/v1/projects/PROJECT_ID/location/global/prometheus" \ | ||
| --prometheus-auth-header="Bearer $(gcloud auth print-access-token)" \ | ||
| --namespace=your-namespace | ||
| ``` | ||
|
|
||
| ### Anthos | ||
| ```bash | ||
| krr simple \ | ||
| --prometheus-url="https://monitoring.googleapis.com/v1/projects/PROJECT_ID/location/global/prometheus" \ | ||
| --prometheus-auth-header="Bearer $(gcloud auth print-access-token)" \ | ||
| --gcp-anthos \ | ||
| --namespace=your-namespace | ||
| ``` | ||
|
|
||
| ## 🔍 Technical Highlights | ||
|
|
||
| | Feature | GCP Cloud | Anthos | Implementation | | ||
| |---------|-----------|--------|----------------| | ||
| | **Metrics** | `kubernetes.io/container/*` | `kubernetes.io/anthos/container/*` | Separate loader classes | | ||
| | **Pod Discovery** | Prometheus (kube-state-metrics) | Kubernetes API only | `load_pods()` override | | ||
| | **Memory Aggregation** | `max_over_time()` | `max_over_time()` | Different query templates | | ||
| | **Label Format** | `pod_name`, `container_name` | `pod_name`, `container_name` | Same `label_replace()` logic | | ||
| | **Auto-detection** | URL-based | Requires `--gcp-anthos` flag | Loader selection in service | | ||
|
|
||
| ## 🐛 Debugging | ||
|
|
||
| ### Enable Debug Logging | ||
| ```bash | ||
| krr simple --log-level=debug --gcp-anthos ... | ||
| ``` | ||
|
|
||
| ### Test Prometheus Connectivity | ||
| ```bash | ||
| # GCP Cloud | ||
| curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \ | ||
| "https://monitoring.googleapis.com/v1/projects/PROJECT_ID/location/global/prometheus/api/v1/query?query=sum(rate({\"__name__\"=\"kubernetes.io/container/cpu/core_usage_time\"}[5m]))" | ||
|
|
||
| # Anthos | ||
| curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \ | ||
| "https://monitoring.googleapis.com/v1/projects/PROJECT_ID/location/global/prometheus/api/v1/query?query=sum(rate({\"__name__\"=\"kubernetes.io/anthos/container/cpu/core_usage_time\"}[5m]))" | ||
| ``` | ||
|
|
||
| ## 📋 Changelog | ||
|
|
||
| **2025-11-20** - Complete GCP & Anthos implementation | ||
| - ✅ Fixed GCP Cloud loaders (percentile attribute, cluster label, UTF-8 syntax) | ||
| - ✅ Implemented full Anthos support with dedicated loaders | ||
| - ✅ Added `--gcp-anthos` CLI flag | ||
| - ✅ Created comprehensive test suites (20 new tests) | ||
| - ✅ Updated all documentation to English | ||
| - ✅ Changed pod discovery fallback logging to DEBUG level | ||
| - ✅ All 75 tests passing | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove substantial content duplication.
Lines 529-667 duplicate content from earlier sections (file tables, test results, quick test guide, technical highlights, debugging, and changelog). This appears to be accidental and should be removed to maintain document clarity and avoid confusion.
The file should logically conclude at Line 527 with the support section.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
560-560: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
610-610: Multiple headings with the same content
(MD024, no-duplicate-heading)
629-629: Multiple headings with the same content
(MD024, no-duplicate-heading)
639-639: Multiple headings with the same content
(MD024, no-duplicate-heading)
657-657: Multiple headings with the same content
(MD024, no-duplicate-heading)
🤖 Prompt for AI Agents
In CHANGES_GCP.md around lines 529 to 667 there is a duplicated block (file
tables, test results, quick test guide, technical highlights, debugging, and
changelog) that repeats earlier content; remove the entire duplicated section
(delete lines 529–667 inclusive) so the document logically concludes at line
527, ensure the file ends cleanly with a single trailing newline and that any
preceding references (e.g., links or TOC) remain valid after removal.
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (2)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py (1)
8-115: Drop unusedOptionalimport; optional consistency tweaks
Optionalis still unused at Line 9; this will keep tripping F401. You can safely remove it:-import logging -from typing import Optional - -from robusta_krr.core.models.objects import K8sObjectData -from ...base import PrometheusMetric +import logging + +from robusta_krr.core.models.objects import K8sObjectData +from ...base import PrometheusMetric
- For
AnthosCPULoader.get_query, consider mirroring other loaders’ behavior:- def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: - pods_selector = "|".join(pod.name for pod in object.pods) + def get_query(self, object: K8sObjectData, _duration: str, step: str) -> str: + pods_selector = "|".join(pod.name for pod in object.pods) or ".*"This both avoids empty
pod_name=~""selectors and quiets Ruff’s unused-argument warning.robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py (1)
1-50: Fix Anthos memory docstrings and align pods selector behaviorTwo issues here:
- The module and
AnthosMemoryLoaderdocstrings talk about “max_over_time aggregation” and “different from GKE's max_over_time”, but the implementation uses a simplemax(...)(same aggregation pattern as the GCP loaders). The real difference is the metric namespace (kubernetes.io/anthos/container/*vskubernetes.io/container/*). Consider updating the text, e.g.:- Anthos uses kubernetes.io/anthos/container/* metrics with max_over_time - aggregation for memory (different from GKE's max_over_time). + Anthos uses kubernetes.io/anthos/container/* metrics for memory, + with the same max/max_over_time aggregation patterns as GKE, but + a different metric namespace. @@ - Uses max_over_time aggregation as per Anthos convention. + Uses `max(...)` over kubernetes.io/anthos/container/memory/used_bytes, + matching the GKE aggregation but on the Anthos metric namespace.
- As with other base loaders,
AnthosMemoryLoader.get_querycould be made more robust and lint-clean:- def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: - pods_selector = "|".join(pod.name for pod in object.pods) + def get_query(self, object: K8sObjectData, _duration: str, _step: str) -> str: + pods_selector = "|".join(pod.name for pod in object.pods) or ".*"
🧹 Nitpick comments (2)
robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py (1)
18-111: Percentile factory wiring looks correct; consider minor consistency tweaks
The
GcpPercentileCPULoaderfactory correctly:
- Validates the percentile.
- Stores it on
_percentilesoGcpManagedPrometheusMetricsServicecan introspect it.- Renames
__name__/__qualname__to"PercentileCPULoader"so the loader name stays consistent across backends.For
GcpCPULoader.get_query, you might want to mirror the pod-selector robustness from the percentile/amount loaders:- def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: - pods_selector = "|".join(pod.name for pod in object.pods) + def get_query(self, object: K8sObjectData, _duration: str, step: str) -> str: + pods_selector = "|".join(pod.name for pod in object.pods) or ".*"This keeps behavior sane if
object.podsis empty and silences Ruff’s ARG002 on the unuseddurationparameter.robusta_krr/strategies/simple.py (1)
26-107: CPU percentile logging behavior looks good; tiny flag-name nitThe
_cpu_percentile_loggedguard and_log_cpu_percentile_usage()hook inmetricsare a clean way to surface backend/percentile configuration without log spam, and using globalsettingsforprometheus_url/gcp_anthosis consistent with the rest of the codebase.Minor nit: the log message says
flag --cpu-percentile, while the description string below still shows--cpu_percentile. It may be worth aligning those flag names to avoid confusing users.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
robusta_krr/core/integrations/prometheus/metrics/cpu.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py(1 hunks)robusta_krr/strategies/simple.py(3 hunks)test_gcp_quick.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test_gcp_quick.sh
🧰 Additional context used
🧬 Code graph analysis (3)
robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py (2)
robusta_krr/core/models/objects.py (1)
K8sObjectData(38-107)robusta_krr/core/integrations/prometheus/metrics/base.py (2)
PrometheusMetric(40-261)QueryType(27-29)
robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py (3)
robusta_krr/core/models/objects.py (1)
K8sObjectData(38-107)robusta_krr/core/integrations/prometheus/metrics/base.py (1)
PrometheusMetric(40-261)robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py (3)
get_query(26-58)get_query(73-106)get_query(119-148)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py (2)
robusta_krr/core/integrations/prometheus/metrics/base.py (1)
PrometheusMetric(40-261)robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py (3)
get_query(29-58)get_query(66-95)get_query(103-132)
🪛 Flake8 (7.3.0)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py
[error] 9-9: 'typing.Optional' imported but unused
(F401)
robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py
[error] 95-95: f-string is missing placeholders
(F541)
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py
[error] 89-89: f-string is missing placeholders
(F541)
🪛 Ruff (0.14.5)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py
26-26: Unused method argument: duration
(ARG002)
66-66: Avoid specifying long messages outside the exception class
(TRY003)
robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py
45-54: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
84-84: Unused method argument: object
(ARG002)
84-84: Unused method argument: period
(ARG002)
95-95: f-string without any placeholders
Remove extraneous f prefix
(F541)
robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py
26-26: Unused method argument: duration
(ARG002)
67-67: Avoid specifying long messages outside the exception class
(TRY003)
robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py
29-29: Unused method argument: duration
(ARG002)
29-29: Unused method argument: step
(ARG002)
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py
42-42: Unused method argument: api_client
(ARG002)
68-76: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
89-89: f-string without any placeholders
Remove extraneous f prefix
(F541)
102-109: Avoid specifying long messages outside the exception class
(TRY003)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py
24-24: Unused method argument: duration
(ARG002)
24-24: Unused method argument: step
(ARG002)
🔇 Additional comments (1)
robusta_krr/core/integrations/prometheus/metrics/cpu.py (1)
30-41: Expose_percentileon loader class for backend mapping (LGTM)Storing the percentile on the inner
PercentileCPULoaderclass is exactly what the GCP/Anthos metrics services need to derive the correct quantile; no further changes required here.
robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py
Show resolved
Hide resolved
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py (1)
21-58: GCP memory loaders align with CPU loaders and resolve earlier lint/edge casesThe memory loaders correctly use
kubernetes.io/container/memory/used_bytes, apply the pods selector fallback, and perform the same label renaming as the CPU loaders. Switching to_duration/_stepinGcpMemoryLoader.get_queryand addingor ".*"on the selector addresses the prior unused-argument and empty-pod cases cleanly.Also applies to: 61-96, 98-132
🧹 Nitpick comments (4)
robusta_krr/strategies/simple.py (1)
1-1: CPU percentile logging and backend detection look solidThe module logger, import of global
settings, one-time_cpu_percentile_loggedguard, and backend detection viaprometheus_url/gcp_anthosare straightforward and safe, and the hook frommetricsensures the log is emitted lazily on first use.Minor optional: if
metricscan be accessed from multiple threads, there’s a tiny race on_cpu_percentile_loggedthat could lead to duplicate log lines only; if that matters, you could guard it with a simple lock or move the log to a process‑level place.Also applies to: 26-27, 29-29, 83-107
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py (1)
56-105: Percentile and amount loaders are consistent; percentile guard is fineThe percentile factory and CPU amount loader correctly mirror the GCP implementations (pods selector fallback,
[duration:step]subquery, label_replace chain, and percentile validation). The ValueError message on invalid percentiles is clear; Ruff’s TRY003 here is purely stylistic and doesn’t need action unless you want to introduce a shared validation helper for both GCP and Anthos.Also applies to: 108-142
robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py (1)
31-52: Anthos metrics service mapping and overrides look correctThe Anthos loader mapping,
load_podsoverride (forcing Kubernetes API discovery), and thegather_datainterception (including the percentile factory handling and the direct call toPrometheusMetricsService.gather_datato avoid double-mapping) all look coherent and in line with the GCP service. Returning{}for unsupported loaders likeMaxOOMKilledMemoryLoaderis a reasonable fallback given the lack of kube-state-metrics on Anthos.Also applies to: 74-97, 98-150
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py (1)
34-52: GCP metrics service mapping and connection checks look correct; minor logging tweak optionalThe discovery stub, loader mapping (including special handling for
PercentileCPULoadervia the_percentileattribute), and thegather_dataoverride all look consistent with the rest of the Prometheus integration and should give predictable behavior on GCP. Thecheck_connectionwrapper aroundMetricsNotFoundis functionally fine; if you want richer diagnostics and to satisfy Ruff’s TRY400, you could switch thelogger.error(...)in the except block tologger.exception(...)to capture the stack trace when connection fails.Also applies to: 54-92, 94-116, 117-167, 168-182
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics_service/anthos_metrics_service.py(1 hunks)robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py(1 hunks)robusta_krr/strategies/simple.py(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/memory.py
🧰 Additional context used
🧬 Code graph analysis (3)
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py (6)
robusta_krr/utils/service_discovery.py (1)
MetricsServiceDiscovery(89-92)robusta_krr/core/integrations/prometheus/metrics/base.py (1)
PrometheusMetric(40-261)robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py (1)
GcpCPULoader(18-58)robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py (3)
GcpMemoryLoader(21-58)GcpMaxMemoryLoader(61-95)GcpMemoryAmountLoader(98-132)robusta_krr/core/integrations/prometheus/metrics_service/prometheus_metrics_service.py (1)
PrometheusMetricsService(53-404)robusta_krr/core/integrations/prometheus/loader.py (1)
gather_data(112-136)
robusta_krr/strategies/simple.py (1)
robusta_krr/core/integrations/prometheus/metrics/base.py (1)
PrometheusMetric(40-261)
robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py (5)
robusta_krr/core/models/objects.py (1)
K8sObjectData(38-107)robusta_krr/core/integrations/prometheus/metrics/base.py (2)
PrometheusMetric(40-261)QueryType(27-29)robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py (3)
get_query(25-53)get_query(70-103)get_query(113-142)robusta_krr/core/integrations/prometheus/metrics/gcp/memory.py (3)
get_query(29-58)get_query(66-95)get_query(103-132)robusta_krr/core/integrations/prometheus/metrics/cpu.py (3)
get_query(13-27)get_query(41-59)get_query(69-84)
🪛 Ruff (0.14.5)
robusta_krr/core/integrations/prometheus/metrics_service/gcp_metrics_service.py
105-112: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
113-115: Avoid specifying long messages outside the exception class
(TRY003)
robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py
65-65: Avoid specifying long messages outside the exception class
(TRY003)
robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py
67-67: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (3)
robusta_krr/strategies/simple.py (1)
128-128: CLI flag name in description is consistentUpdating the example to
--cpu-percentilealigns with the logging message and the intended CLI flag style; no further changes needed here.robusta_krr/core/integrations/prometheus/metrics/gcp/anthos/cpu.py (1)
17-53: AnthosCPULoader query construction looks solidPods selector fallback, cluster label injection, and label renaming (pod_name/container_name → pod/container) all mirror the GCP loaders and existing CPU loader patterns. I don’t see any functional issues here; this should behave as expected on Anthos.
robusta_krr/core/integrations/prometheus/metrics/gcp/cpu.py (1)
18-59: GCP CPU loaders and percentile factory wiring look goodCPU usage, percentile, and amount loaders all follow the expected GCP metric/label conventions, include the pods selector fallback, and correctly rename
pod_name/container_namefor compatibility. The percentile factory’s validation and__name__/__qualname__override align with the LOADER_MAPPING‑by‑name pattern in the metrics service, so the mapping should behave correctly.Also applies to: 61-111, 114-148
GCP Managed Prometheus & Anthos - Implementation Guide
📋 Executive Summary
This implementation extends KRR (Kubernetes Resource Recommender) to support GCP Managed Prometheus and Anthos Managed Prometheus, which use different naming conventions compared to standard Prometheus.
Baseline Commit:
c3e1c481d9e887c4a0b23bac0ccfd10a9629faabStatus: ✅ PRODUCTION READY
Date: November 2024
Test Coverage: 20 new GCP/Anthos tests (100% passing)
🎯 Problem Solved
Before Implementation
KRR only supported standard Prometheus with metrics like:
container_cpu_usage_seconds_totalcontainer_memory_working_set_bytesGCP Managed Prometheus Uses Different Metrics:
container_cpu_usage_seconds_totalkubernetes.io/container/cpu/core_usage_timecontainer_memory_working_set_byteskubernetes.io/container/memory/used_bytesnamespacenamespace_namepodpod_namecontainercontainer_nameAnthos Adds Another Layer:
kubernetes.io/anthos/container/*instead ofkubernetes.io/container/*🏗️ Solution Architecture
Pattern: Factory + Inheritance + Dynamic Mapping
Key Problem Solved: Factory Pattern
The main problem was with
PercentileCPULoaderwhich is a factory function:Result: Warning "No GCP mapping found for _GcpPercentileCPULoader"
Result: Correct mapping, no warnings!
📂 Files Created/Modified
✅ New GCP Cloud Files (6 loaders)
✅ New Anthos Files (6 loaders)
✅ Metrics Services
🔧 Modified Files
robusta_krr/core/integrations/prometheus/loader.py--gcp-anthosflagrobusta_krr/core/models/config.pygcp_anthos: boolfieldrobusta_krr/main.py--gcp-anthosCLI flag📚 Documentation
docs/gcp-managed-prometheus-integration.mdCHANGES_GCP.mdtest_gcp_quick.sh🧪 Test Suite
tests/test_gcp_loaders.pytests/test_anthos_loaders.pyTotal: 84 tests passing ✅ (64 existing + 20 new)
🧬 Technical Details
1️⃣ GCP Managed Prometheus Metrics
Problem: GCP doesn't use standard Prometheus metrics (
container_cpu_usage_seconds_total,container_memory_working_set_bytes).Solution: Mapping from standard metrics → GCP metrics
container_cpu_usage_seconds_totalkubernetes.io/container/cpu/core_usage_timecontainer_memory_working_set_byteskubernetes.io/container/memory/used_bytes2️⃣ Anthos Differences
Anthos adds another layer of difference:
kubernetes.io/container/cpu/core_usage_timekubernetes.io/anthos/container/cpu/core_usage_timekubernetes.io/container/memory/used_byteskubernetes.io/anthos/container/memory/used_bytes3️⃣ UTF-8 PromQL Syntax
GCP requires quoted syntax:
4️⃣ Label Renaming
podpod_namelabel_replace(..., "pod", "$1", "pod_name", "(.*)")containercontainer_namelabel_replace(..., "container", "$1", "container_name", "(.*)")namespacenamespace_namenamespace_name5️⃣ PercentileCPULoader Factory Pattern
Problem: Factory function returns inner class
_GcpPercentileCPULoader→ warning "No GCP mapping for _GcpPercentileCPULoader"Solution: Override
__name__and__qualname__Now
GcpPercentileCPULoader(0.95).__name__ == "PercentileCPULoader"✅🐛 Bug Fixes
Bug 1: Empty Pod Selector
Problem: When
object.pods = [], generates"pod_name"=~""→ invalid PromQL queryApplied to: All 16 loaders (10 GCP + 6 Anthos)
Bug 2: Double Mapping in Anthos
Problem: Anthos inherited
gather_data()from GCP:Result: Anthos → Base class (skips GCP) → No warnings ✅
Bug 3: Duplicate LOADER_MAPPING
Problem: Both GCP and Anthos had duplicate mappings
Solution: Clean separation with DRY principle
🔍 Technical Implementation Details
1. UTF-8 PromQL Syntax (GCP prefered)
Standard Prometheus:
GCP Managed Prometheus:
2. Label Renaming with
label_replace()All GCP/Anthos loaders use
label_replace()for compatibility:Why? Existing KRR code expects
podandcontainerlabels, notpod_nameandcontainer_name.3. Empty Pod Selector Fallback
Problem: When
object.podsis empty, PromQL query generated:Solution: Added fallback in all loaders:
Now generates:
4. Double Mapping Prevention (Anthos)
Initial Problem:
Solution:
This skips the GCP layer and avoids double mapping!
🚀 How It Works (Flow Diagram)
GCP Cloud Flow
Anthos Flow
📊 Feature Comparison
--gcp-anthoskubernetes.io/container/*kubernetes.io/anthos/container/*load_pods()override{}get_cluster_summary()overridemax_over_time()max_over_time()pod_name→podpod_name→pod__name__override__name__overrideNone🧪 Tests Executed
Unit Tests (20 total)
Integration Tests
GCP Cloud (GKE):
Anthos (On-Prem):
🐛 Bug Fixes Applied
1. Factory Pattern Naming Issue
Problem:
_GcpPercentileCPULoaderinstead ofPercentileCPULoaderFix: Added
__name__and__qualname__overrideFiles:
cpu.py(GCP and Anthos)2. Double Mapping in Anthos
Problem: Anthos called
super().gather_data()→ double mappingFix: Calls
PrometheusMetricsService.gather_data()directlyFile:
anthos_metrics_service.py3. Empty Pod Selector
Problem:
"pod_name"=~""generates invalid queryFix:
or ".*"fallback in all 16 loadersFiles: All CPU and Memory loaders (GCP + Anthos)
4. Missing Cluster Label UTF-8
Problem: Cluster label not properly quoted
Fix: Used
{cluster_label}with UTF-8 syntaxFiles: All loaders
📈 Change Statistics
git diff --stat c3e1c481d9e887c4a0b23bac0ccfd10a9629faab..HEAD # Output: 19 files changed, 2428 insertions(+), 5 deletions(-)Breakdown:
✅ Completion Checklist
--gcp-anthosCLI flag🎓 Lessons Learned
1. Factory Pattern in Python
When using factories that create classes dynamically, always override
__name__:2. Inheritance and Method Resolution
When you have 3 levels of inheritance:
And Child needs to skip Parent, use:
3. Empty Collections Need Fallbacks
When building regex from list:
4. DRY Principle with Inheritance
Child class should override, not duplicate parent data:
🔗 Related Resources
🚀 Usage
GCP Cloud (GKE)
krr simple \ --prometheus-url https://monitoring.googleapis.com/v1/projects/PROJECT/location/global/prometheus \ --prometheus-auth-header "$(gcloud auth print-access-token)" \ --namespace NAMESPACEAnthos (On-Prem)
krr simple \ --prometheus-url https://your-anthos-prometheus.com \ --gcp-anthos \ --prometheus-auth-header "$(gcloud auth print-access-token)" \ --namespace NAMESPACE📝 Conclusion
This implementation successfully extends KRR to support both GCP Managed Prometheus and Anthos Managed Prometheus, with:
Status: ✅ COMPLETE AND TESTED