| Status | |
| --- | --- |
| Stability | beta: metrics |
| Distributions | contrib |
| Issues | |
| Code Owners | @aabmass, @dashpole, @jsuereth, @punya, @damemi, @psx95 |
This exporter can be used to send metrics (including trace exemplars) to Google Cloud Managed Service for Prometheus. It is one of several supported approaches for sending metrics to that service.
The following configuration options are supported:
- `project` (optional): GCP project identifier.
- `user_agent` (optional): Override the user agent string sent on requests to Cloud Monitoring (currently only applies to metrics). Specify `{{version}}` to include the application version number. Defaults to `opentelemetry-collector-contrib {{version}}`.
- `metric` (optional): Configuration for sending metrics to Cloud Monitoring.
  - `endpoint` (optional): Endpoint where metric data is going to be sent to. Replaces `endpoint`.
  - `compression` (optional): Compression format for metrics gRPC requests. Supported values: [`gzip`]. Defaults to no compression.
  - `grpc_pool_size` (optional): Sets the size of the connection pool in the GCP client. Defaults to a single connection.
  - `use_insecure` (optional): If true, disables gRPC client transport security. Only applies if `endpoint` is not "".
  - `add_metric_suffixes` (default=`true`): Add type and unit suffixes to metrics.
  - `extra_metrics_config` (optional): Enable or disable additional metrics.
    - `enable_target_info` (default=`true`): Add a `target_info` metric based on the resource.
    - `enable_scope_info` (default=`true`): Add an `otel_scope_info` metric and `scope_name`/`scope_version` attributes to all other metrics.
  - `resource_filters` (optional): Provides a list of filters to match resource attributes which will be included in metric labels.
    - `prefix` (optional): Match resource attribute keys by prefix.
    - `regex` (optional): Match resource attribute keys by regex.
- `sending_queue` (optional): Configuration for how to buffer metrics before sending.
  - `enabled` (default = `true`)
  - `num_consumers` (default = 10): Number of consumers that dequeue batches; ignored if `enabled` is `false`.
  - `queue_size` (default = 1000): Maximum number of batches kept in memory before dropping data; ignored if `enabled` is `false`. Users should calculate this as `num_seconds * requests_per_second`, where:
    - `num_seconds` is the number of seconds to buffer in case of a backend outage
    - `requests_per_second` is the average number of requests per second.

Note: The `sending_queue` is provided (and documented) by the Exporter Helper.
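For example, a minimal sketch of an exporter configuration exercising several of these options (the project ID and option values are illustrative, not recommendations):

```yaml
exporters:
  googlemanagedprometheus:
    project: my-gcp-project  # illustrative GCP project identifier
    metric:
      compression: gzip
      add_metric_suffixes: true
      extra_metrics_config:
        enable_target_info: true
        enable_scope_info: false
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 1000
```

The full example below scrapes Kubernetes pods with the prometheus receiver and applies the processors discussed later in this document: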
```yaml
receivers:
prometheus:
config:
scrape_configs:
# Add your prometheus scrape configuration here.
# Using kubernetes_sd_configs with namespaced resources (e.g. pod)
# ensures the namespace is set on your metrics.
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
              action: replace
              target_label: __metrics_path__
              regex: (.+)
            # This regex also matches pods whose __address__ has no declared
            # container port, so the annotation port is still applied.
            - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
processors:
batch:
# batch metrics before sending to reduce API usage
send_batch_max_size: 200
send_batch_size: 200
timeout: 5s
memory_limiter:
# drop metrics if memory usage gets too high
check_interval: 1s
limit_percentage: 65
spike_limit_percentage: 20
resourcedetection:
# detect cluster name and location
detectors: [gcp]
timeout: 10s
transform:
# "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
# metrics containing these labels will be rejected. Prefix them with exported_ to prevent this.
metric_statements:
- context: datapoint
statements:
- set(attributes["exported_location"], attributes["location"])
- delete_key(attributes, "location")
- set(attributes["exported_cluster"], attributes["cluster"])
- delete_key(attributes, "cluster")
- set(attributes["exported_namespace"], attributes["namespace"])
- delete_key(attributes, "namespace")
- set(attributes["exported_job"], attributes["job"])
- delete_key(attributes, "job")
- set(attributes["exported_instance"], attributes["instance"])
- delete_key(attributes, "instance")
- set(attributes["exported_project_id"], attributes["project_id"])
- delete_key(attributes, "project_id")
exporters:
googlemanagedprometheus:
service:
pipelines:
metrics:
        receivers: [prometheus]
        processors: [batch, memory_limiter, transform, resourcedetection]
        exporters: [googlemanagedprometheus]
```
The Google Managed Prometheus exporter maps metrics to the `prometheus_target` monitored resource. The logic for mapping to monitored resources is designed to be used with the prometheus receiver, but can be used with other receivers as well. To avoid collisions (i.e. "duplicate timeseries encountered" errors), you need to ensure the `prometheus_target` resource uniquely identifies the source of metrics. The exporter uses the following resource attributes to determine the monitored resource:
- location: [`location`, `cloud.availability_zone`, `cloud.region`]
- cluster: [`cluster`, `k8s.cluster.name`]
- namespace: [`namespace`, `k8s.namespace.name`]
- job: [`service.name` + `service.namespace`]
- instance: [`service.instance.id`]
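As an illustration (all attribute values here are hypothetical), a metric with the following resource attributes would map to the `prometheus_target` labels shown:

```yaml
# OpenTelemetry resource attributes (hypothetical values):
#   cloud.region:        us-central1
#   k8s.cluster.name:    my-cluster
#   k8s.namespace.name:  default
#   service.name:        kubernetes-pods
#   service.instance.id: 10.0.0.1:8080
#
# Resulting prometheus_target monitored resource:
#   location:  us-central1
#   cluster:   my-cluster
#   namespace: default
#   job:       kubernetes-pods
#   instance:  10.0.0.1:8080
```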
In the configuration above, `cloud.availability_zone`, `cloud.region`, and `k8s.cluster.name` are detected using the `resourcedetection` processor with the `gcp` detector. The prometheus receiver sets `service.name` to the configured `job_name`, and `service.instance.id` is set to the scrape target's `instance`. The prometheus receiver sets `k8s.namespace.name` when using `role: pod`.
In GMP, the above attributes are used to identify the `prometheus_target` monitored resource. As such, it is recommended to avoid writing metric or resource labels that match these keys. Doing so can cause errors when exporting metrics to GMP or when trying to query from GMP. So, the recommended way to set them is with the resourcedetection processor.
If you still need to set `location`, `cluster`, or `namespace` labels (such as when running in non-GCP environments), you can do so with the resource processor like so:
```yaml
processors:
  resource:
    attributes:
      - key: "location"
        value: "us-east1"
        action: upsert
```
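If you need to set all three reserved labels (for example, outside GCP), the same pattern extends; a sketch with illustrative values:

```yaml
processors:
  resource:
    attributes:
      - key: "location"
        value: "us-east1"
        action: upsert
      - key: "cluster"
        value: "on-prem-cluster"  # illustrative value
        action: upsert
      - key: "namespace"
        value: "my-namespace"  # illustrative value
        action: upsert
```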
The transform processor in the example configuration above copies the `location` metric attribute to a new `exported_location` attribute, then deletes the original `location`. It is recommended to use the `exported_*` prefix, which is consistent with GMP's behavior.
You can also use the groupbyattrs processor to move metric labels to resource labels. This is useful in situations where, for example, an exporter monitors multiple namespaces (with each namespace exported as a metric label). One such example is kube-state-metrics. Using `groupbyattrs` will promote that label to a resource label and associate those metrics with the new resource. For example:
```yaml
processors:
  groupbyattrs:
    keys:
      - namespace
      - cluster
      - location
```
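With this configuration, a hypothetical kube-state-metrics data point such as `kube_pod_info{namespace="foo"}` would have its `namespace` label promoted to a resource attribute, which the exporter then maps to the `namespace` of the `prometheus_target` monitored resource.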
The following feature gate is available:

- `exporter.googlemanagedprometheus.intToDouble` (default=`false`): Change all metric data point types to double to prevent `Value type for metric <metric name> conflicts with the existing value type` errors. Enable it with `--feature-gates=exporter.googlemanagedprometheus.intToDouble`.
Error: `Value type for metric <metric name> conflicts with the existing value type`
Google Managed Service for Prometheus (and Google Cloud Monitoring) have fixed value types (INT and DOUBLE) for metrics. Once a metric has been written as an INT or DOUBLE, attempting to write the other type will fail with the error above. This commonly occurs when a metric's value type has changed, or when a mix of INT and DOUBLE for the same metric are being written to the same project. The recommended way to fix this is to convert all metrics to DOUBLE to prevent collisions using the `exporter.googlemanagedprometheus.intToDouble` feature gate, documented above.
Once you enable the feature gate, you will likely see new errors indicating type collisions, as some existing metrics will be changed from int to double. To fix this, you need to delete the metric descriptor. This will delete all existing data for the metric, but will allow it to be written as a double going forward. The simplest way to do this is by using the "Try this method" tab in the API reference for `DeleteMetricDescriptor`.
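If you prefer the command line, a hedged sketch of the same API call with `curl` (`PROJECT_ID` and the descriptor name are placeholders; metrics ingested through GMP typically have descriptors named `prometheus.googleapis.com/<metric_name>/<kind>`):

```sh
# Deletes the descriptor (and all data) for a hypothetical metric "my_metric" of kind gauge.
curl -X DELETE \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/metricDescriptors/prometheus.googleapis.com/my_metric/gauge"
```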
Error: `One or more points were written more frequently than the maximum sampling period configured for the metric.`
Google Managed Service for Prometheus (and Google Cloud Monitoring) limit the rate at which points can be written to one point every 5 seconds. If you try to write points more frequently, you will encounter the error above. If you know that you aren't writing points more frequently than every 5 seconds, this can be a symptom of the Timeseries Collision problem below.
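If you really are writing points more frequently than every 5 seconds, check the scrape interval in your receiver configuration; a minimal sketch raising the prometheus receiver's global scrape interval (the value is illustrative):

```yaml
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 60s  # keep well above the 5s minimum sampling period
```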
Error: `Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.`

Error: `Points must be written in order. One or more of the points specified had an older start time than the most recent point.`
The errors above, and sometimes the `points were written more frequently than the maximum sampling period` error, can indicate that two metric datapoints are being written without any resource or metric attributes that distinguish them from each other. We refer to this as a "Timeseries Collision".
`Duplicate TimeSeries encountered` is the clearest indication of a timeseries collision. It means that two timeseries in a single request had identical monitored resource and metric labels.
`Points must be written in order` often indicates that two different collectors are writing the same timeseries, since they can race to deliver the same metric with slightly different timestamps. If the later timestamp is delivered first, it triggers this error. The duplicates don't appear in the same request, so they don't trigger the `Duplicate TimeSeries encountered` error, but they do still collide.
`points were written more frequently than the maximum sampling period` also often indicates that two different collectors are writing the same timeseries, but happens when the first timestamp is delivered first and the later timestamp is delivered second. In this case, the points are in order, but are rejected because they are too close together.
There are three main root causes for timeseries collisions:
- Resource attributes don't distinguish applications.
- Resource attributes are dropped by the exporter.
- Metric data point attributes don't distinguish timeseries (very rare).
The most common reason is (1), which means that it can be fixed by adding resource information. If you are running on GCP, you can use the resourcedetection processor with the `gcp` detector. If you are running on Kubernetes (including GKE), we recommend also using the `k8sattributes` processor to at least add `k8s.namespace.name` and `k8s.pod.name`. Finally, it is important to make sure `service.name` and `service.instance.id` are set by applications in a way that uniquely identifies each instance.
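A minimal sketch of those processors together (assuming a Kubernetes environment where the collector has the RBAC permissions the `k8sattributes` processor needs):

```yaml
processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
```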
The next most common reason is (2), which means that the exporter's mapping logic from the OpenTelemetry resource to Google Cloud's `prometheus_target` monitored resource didn't preserve a resource attribute that was needed to distinguish timeseries. This can be mitigated by adding resource attributes as metric labels using the `resource_filters` configuration in the exporter:
```yaml
exporters:
  googlemanagedprometheus:
    metric:
      resource_filters:
        - regex: ".*"
```
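Matching every resource attribute with `".*"` can add many metric labels and increase cardinality. A narrower sketch using the `prefix` filter documented above (the prefix is illustrative):

```yaml
exporters:
  googlemanagedprometheus:
    metric:
      resource_filters:
        - prefix: "k8s."
```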
If you need to troubleshoot errors further, start by filtering down to a single metric from the error message using the `filter` or `transform` processors, and using the `debug` exporter with `detailed` verbosity:
```yaml
processors:
  filter:
    error_mode: ignore
    metrics:
      metric:
        - 'name != "problematic.metric.name"'
exporters:
  debug:
    verbosity: detailed
```
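To apply these temporarily, a sketch of a debugging pipeline wiring the `filter` processor to the `debug` exporter:

```yaml
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [filter]
      exporters: [debug]
```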
That can help identify which metric sources are colliding, so you know which applications or metrics need additional attributes to distinguish them from one another.