Metrics are still received after Grafana Agent stops #643

Open
acsgn opened this issue Oct 18, 2024 · 0 comments
acsgn commented Oct 18, 2024

Bug Description

Hi,
While testing for a customer, I noticed that metrics from machine deployments continue to be received after the Grafana Agent has already stopped, or after the machine itself is down. The metrics only disappear about 4 minutes later, which in our use case makes alerts on them much less useful.
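This tail of "ghost" samples is consistent with Prometheus's instant-query lookback window (`--query.lookback-delta`, default 5m): when the agent is stopped abruptly, no staleness markers are remote-written, so instant queries keep returning the last received sample until it ages out of the window. A minimal Python sketch of that lookup rule follows — the function and timings are illustrative, not Prometheus code:

```python
def instant_query_value(samples, query_time, lookback=300.0):
    """Return the value an instant query would see at query_time.

    samples: list of (timestamp, value) pairs, sorted by timestamp.
    Mimics the lookback rule: the most recent sample within the
    window is returned; without a staleness marker, a dead series
    keeps "existing" until its last sample falls out of the window.
    """
    for ts, value in reversed(samples):
        if ts <= query_time and query_time - ts <= lookback:
            return value
    return None  # series finally looks stale


# Agent stopped right after the sample at t=1000.
samples = [(940.0, 1.0), (1000.0, 1.0)]
print(instant_query_value(samples, 1200.0))  # -> 1.0 (still "received")
print(instant_query_value(samples, 1400.0))  # -> None (aged out)
```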

To Reproduce

multipass launch --cpus 4 --memory 8G --disk 30G --name cos-test 22.04
multipass shell cos-test

HOST_IP=$(hostname -I | cut -d ' ' -f 1)

lxd init --auto
lxc network set lxdbr0 ipv6.address none

sudo snap install microk8s --channel 1.30-strict
sudo microk8s enable hostpath-storage
sudo microk8s enable metallb:$HOST_IP-$HOST_IP

sudo snap install juju
mkdir -p ~/.local/share
juju bootstrap localhost overlord
sudo microk8s config | juju add-k8s k8s --controller overlord

juju add-model zookeeper localhost
juju add-model cos k8s

juju deploy -m zookeeper zookeeper
juju deploy -m zookeeper grafana-agent
juju relate -m zookeeper zookeeper grafana-agent

juju deploy -m cos cos-lite --trust
juju offer cos.grafana:grafana-dashboard
juju offer cos.loki:logging
juju offer cos.prometheus:receive-remote-write

juju consume -m zookeeper cos.prometheus
juju consume -m zookeeper cos.loki
juju consume -m zookeeper cos.grafana
juju relate -m zookeeper grafana-agent grafana
juju relate -m zookeeper grafana-agent loki
juju relate -m zookeeper grafana-agent prometheus

juju run -m cos grafana/leader get-admin-password
## Collect metrics for a while and browse Grafana before proceeding

juju ssh -m zookeeper grafana-agent/leader
date && sudo snap stop grafana-agent
exit

## Go back to Grafana and observe that the metrics are still "received" for 4 minutes after the service stops
## You can also try stopping the LXC container; both cause the same ghost metrics
## You can use the Explore tab and query the following metrics with Prometheus as the data source:
## zookeeper_QuorumSize OR up{juju_application="zookeeper"}

exit
multipass stop cos-test
multipass delete --purge cos-test
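If the goal is timely alerting on agent/machine liveness, one possible workaround is to alert on the age of the last received sample rather than on the series value itself, using the standard PromQL `timestamp()` function. A sketch — the 90s threshold is an illustrative assumption, to be tuned to the scrape interval:

```promql
# Illustrative alert expression, not shipped by any charm: fires when the
# newest sample for a zookeeper series is older than 90 seconds, even though
# the series is still returned within Prometheus's lookback window.
(time() - timestamp(up{juju_application="zookeeper"})) > 90
```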

Environment

The reproduction steps use Multipass; I encountered the same behavior on a local machine, on GCP, and on AWS. All snaps and charms use latest/stable.

Relevant log output

No logs are available

Additional context

No response
