GPU Operator dcgm-exporter metrics not scraped (ServiceMonitor-only); no GPU dashboard ships #64

Description

@stxkxs

Low priority / latent. addons/accelerators/gpu-operator/values.yaml enables dcgmExporter with serviceMonitor.enabled: true but sets no prometheus.io/scrape annotation. The ServiceMonitor is inert in prod because no prometheus-operator runs there, so dcgm GPU metrics (:9400) never reach AMP. No GPU dashboard ships today either (grep for gpu/dcgm/nvidia returns 0 matches), and gpu-operator is cluster-label-gated (opt-in), so this only matters once a GPU dashboard is added on a GPU cluster.

Fix when adding a GPU dashboard: add podAnnotations for prometheus.io/scrape, prometheus.io/port (9400), and prometheus.io/path to the dcgmExporter, mirroring the addon-scrape PR.
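
For reference, the proposed fix might look roughly like the fragment below. This is only a sketch: the issue names podAnnotations on the dcgmExporter, but whether the chart version in use actually passes a `dcgmExporter.podAnnotations` key through to the exporter pods is an assumption and should be checked against the chart. The `/metrics` path is also assumed (it is dcgm-exporter's usual default), and the annotation values should match whatever the addon-scrape PR used.

```yaml
# addons/accelerators/gpu-operator/values.yaml (sketch, not verified against the chart)
dcgmExporter:
  enabled: true
  serviceMonitor:
    enabled: true   # inert in prod: no prometheus-operator consumes it
  # Assumption: the chart forwards podAnnotations to the dcgm-exporter pods.
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9400"       # dcgm-exporter metrics port from this issue
    prometheus.io/path: "/metrics"   # assumed default path
```

Annotation values are quoted because Kubernetes annotations must be strings; an unquoted `true` or `9400` would be parsed as a boolean or integer.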
