v-agent is a daemon that gathers metrics and writes them to a compatible remote_write endpoint.
It acts like Prometheus: it scrapes metrics, relabels them, and pushes them to a remote_write-compatible endpoint. Internally, we use Mimir and write our metrics there, but other implementations should work.
All metrics created by v-agent itself are prefixed with v_. Scraped metrics are not modified other than the addition of labels.
Every metric will have all labels in labels_config added to it. The following are special labels:
- hostname: Pulled automatically. Set with the HOSTNAME environment variable or os.Hostname()
- subid: The subscription ID for the underlying service. Can be set in config.yaml. If it's not set, an attempt is made to pull it from the metadata API.
- vpsid: The VPS ID. Can be set in config.yaml. If it's not set, an attempt is made to pull it from the metadata API.
The above labels are added to ensure that the metric is unique. A lack of uniqueness can result in metrics getting overwritten/clobbered.
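The label handling described above can be sketched in Go. This is a minimal illustration, not v-agent's actual code: the function names resolveHostname and addLabels are hypothetical, and both the precedence (HOSTNAME first, then os.Hostname()) and the rule that configured labels win on a name clash are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// resolveHostname returns the HOSTNAME environment variable when it is set and
// non-empty, and otherwise calls the fallback (os.Hostname in main).
// The order of precedence is an assumption, not confirmed behavior.
func resolveHostname(getenv func(string) string, fallback func() (string, error)) (string, error) {
	if h := getenv("HOSTNAME"); h != "" {
		return h, nil
	}
	return fallback()
}

// addLabels returns a new map holding the metric's own labels plus every
// configured label. Configured labels win on a name clash (assumed policy).
func addLabels(metric, configured map[string]string) map[string]string {
	out := make(map[string]string, len(metric)+len(configured))
	for k, v := range metric {
		out[k] = v
	}
	for k, v := range configured {
		out[k] = v
	}
	return out
}

func main() {
	host, err := resolveHostname(os.Getenv, os.Hostname)
	if err != nil {
		fmt.Fprintln(os.Stderr, "hostname lookup failed:", err)
		os.Exit(1)
	}

	// Labels that would come from labels_config (values are illustrative).
	configured := map[string]string{"hostname": host, "product": "vke"}
	// Labels already present on a scraped metric.
	scraped := map[string]string{"mode": "user"}

	fmt.Println(addLabels(scraped, configured))
}
```

Passing the environment lookup and the fallback as functions keeps the resolution logic easy to test without touching the real environment.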
System metrics that are collected:
- CPU utilization: system, user, steal, utilization, etc.
- Memory utilization: cached, buffered, utilization, etc.
- Load average: 1, 5, 15, and tasks
- Disk stats: writes/reads, etc.
- Filesystem stats: bytes, inodes, utilization
- NIC: bytes, packets, errors, etc.
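As an illustration of how one of these system metrics might be gathered, the sketch below parses a line in the Linux /proc/loadavg format into the 1, 5, and 15 minute load averages and the task counts. That v-agent reads /proc/loadavg is an assumption; the LoadAvg type and parseLoadAvg function are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// LoadAvg holds the values found in a /proc/loadavg-style line.
type LoadAvg struct {
	Load1, Load5, Load15     float64
	RunningTasks, TotalTasks int
}

// parseLoadAvg parses a line such as "0.20 0.18 0.12 1/80 11206".
// The first three fields are load averages; the fourth is running/total tasks.
func parseLoadAvg(line string) (LoadAvg, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		return LoadAvg{}, fmt.Errorf("expected at least 4 fields, got %d", len(fields))
	}

	var la LoadAvg
	var err error
	targets := []*float64{&la.Load1, &la.Load5, &la.Load15}
	for i, t := range targets {
		if *t, err = strconv.ParseFloat(fields[i], 64); err != nil {
			return LoadAvg{}, fmt.Errorf("load field %d: %w", i, err)
		}
	}

	parts := strings.SplitN(fields[3], "/", 2)
	if len(parts) != 2 {
		return LoadAvg{}, fmt.Errorf("malformed tasks field %q", fields[3])
	}
	if la.RunningTasks, err = strconv.Atoi(parts[0]); err != nil {
		return LoadAvg{}, fmt.Errorf("running tasks: %w", err)
	}
	if la.TotalTasks, err = strconv.Atoi(parts[1]); err != nil {
		return LoadAvg{}, fmt.Errorf("total tasks: %w", err)
	}
	return la, nil
}

func main() {
	// A sample line; on a Linux host this would come from /proc/loadavg.
	la, err := parseLoadAvg("0.20 0.18 0.12 1/80 11206")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", la)
}
```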
Kubernetes:
- v_kube_apiserver_healthy: 0 if healthy or 1 if not healthy, based on the response from the kube-apiserver /healthz endpoint.
- Every metric from /metrics
Etcd:
- v_etcd_healthy: 0 if healthy or 1 if not healthy, based on the response from the etcd /health endpoint.
- Every metric from /metrics
Konnectivity:
- v_konnectivity_healthy: 0 if healthy or 1 if not healthy, based on the response from the konnectivity /healthz endpoint.
- Every metric from /metrics
HAProxy:
- v_haproxy_healthy: 0 if healthy or 1 if not healthy, based on the response from the /metrics endpoint.
- Every metric from /metrics
Ceph:
- v_ceph_healthy: Not implemented yet.
- Every metric from /metrics
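The v_*_healthy metrics above follow one convention: 0 means healthy and 1 means not healthy. A minimal Go sketch of such a probe is shown here. It assumes that any 2xx response counts as healthy and that a failed request counts as unhealthy; neither rule is confirmed by v-agent, and healthValue and probe are hypothetical names. A local test server stands in for a real service.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// healthValue maps an HTTP status code to the metric value:
// 0 for healthy (assumed to mean any 2xx status), 1 otherwise.
func healthValue(statusCode int) int {
	if statusCode >= 200 && statusCode < 300 {
		return 0
	}
	return 1
}

// probe issues a GET request and returns the health metric value.
// A request that fails outright is reported as unhealthy (assumption).
func probe(client *http.Client, url string) int {
	resp, err := client.Get(url)
	if err != nil {
		return 1
	}
	defer resp.Body.Close()
	return healthValue(resp.StatusCode)
}

func main() {
	// Stand-in server: /healthz answers 200, everything else answers 503.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/healthz" {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	fmt.Println("healthy endpoint:", probe(client, srv.URL+"/healthz"))
	fmt.Println("failing endpoint:", probe(client, srv.URL+"/broken"))
}
```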
Configuration is through config.yaml, sample:
debug: true # debug output
interval: 60 # interval to scrape metrics
endpoint: https://endpoint... # remote endpoint
basic_auth_user: "" # basic auth user
basic_auth_pass: "" # basic auth pass
check_vendor: false # when true, vendor must be "Vultr"; set to false otherwise
labels_config: # any labels below will be added to all metrics
  hostname: "" # empty string uses local hostname, unset (nil) doesn't use, non-empty string uses specified label
  subid: "" # empty string pulls from userdata, unset (nil) doesn't use, non-empty string uses specified label
  vpsid: "" # empty string pulls from userdata, unset (nil) doesn't use, non-empty string uses specified label
  product: vke # unset (nil) doesn't use, non-empty string uses specified label. Note: This label is used to determine subid for vke/vlb/vfs
  any: any # any key/value label
probes_api:
  listen: 0.0.0.0
  port: 7091
metrics_config:
  agent:
    load_avg:
      enabled: true
    cpu:
      enabled: true
    memory:
      enabled: true
    nic:
      enabled: true
    disk_stats:
      enabled: true
      filter: "sr0" # regex
    file_system:
      enabled: true
    kubernetes:
      enabled: true
      endpoint: https://localhost:6443
      kubeconfig: /var/lib/kubernetes/admin.kubeconfig
    konnectivity:
      enabled: true
      metrics_endpoint: http://localhost:8133 # /metrics
      health_endpoint: http://localhost:8092 # /healthz
    etcd:
      enabled: true
      cacert: /var/lib/kubernetes/ca.pem
      cert: /var/lib/kubernetes/kubernetes.pem
      key: /var/lib/kubernetes/kubernetes-key.pem
      endpoint: https://10.1.96.3:2379 # /metrics
    nginx_vts:
      enabled: false
      endpoint: http://localhost:9001 # /metrics
    v_cdn_agent:
      enabled: false
      endpoint: http://localhost:9093 # /metrics
    haproxy:
      enabled: false
      endpoint: http://localhost:8404 # /metrics
    ceph:
      enabled: false
      endpoint: http://localhost:9283 # /metrics
    v_dns:
      enabled: false
      endpoint: http://localhost:9053 # /metrics
    smart:
      enabled: false
      block_devices: # must exist, if not set, block devices are used from /sys/block/ (except for dmX and loopX)
        - /dev/sda
  kubernetes:
    pods: # v-agent must be running inside k8s for this to work
      enabled: false
      namespaces:
        - rook-ceph
        - default
    dcgm: # v-agent must be running inside k8s for this to work
      enabled: false
      namespace: gpu-operator # namespace
      endpoint: nvidia-dcgm-exporter # name of the endpoint: k get endpoints

Currently, v-agent is not 100% compatible with Kubernetes, that is to say, with running inside k8s and scraping k8s metrics. Right now it's largely an agent used to scrape metrics for services.
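The comments on hostname, subid, and vpsid in the sample config distinguish three cases: unset (nil), empty string, and non-empty string. One way to model this in Go is a pointer to a string, as sketched below. This is an assumption about how such a setting could be represented, not a description of v-agent's internals; resolveLabel is a hypothetical helper, and the auto function stands in for the local hostname or metadata lookup.

```go
package main

import "fmt"

// resolveLabel applies the three-state rule from the config comments:
//   - nil pointer (key unset): the label is not used
//   - empty string: the value is looked up automatically via auto
//   - non-empty string: the configured value is used as-is
//
// The boolean result reports whether the label should be added at all.
func resolveLabel(configured *string, auto func() (string, error)) (string, bool, error) {
	if configured == nil {
		return "", false, nil
	}
	if *configured != "" {
		return *configured, true, nil
	}
	value, err := auto()
	if err != nil {
		return "", false, err
	}
	return value, true, nil
}

func main() {
	// Hypothetical stand-in for a metadata or hostname lookup.
	auto := func() (string, error) { return "looked-up-value", nil }

	empty := ""
	explicit := "my-subid"

	cases := map[string]*string{
		"unset":    nil,
		"empty":    &empty,
		"explicit": &explicit,
	}
	for _, name := range []string{"unset", "empty", "explicit"} {
		value, use, err := resolveLabel(cases[name], auto)
		fmt.Printf("%s: value=%q use=%v err=%v\n", name, value, use, err)
	}
}
```

A pointer is used because a plain string cannot tell an absent key apart from a key set to the empty string.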
Note: The agent must be built with cgo disabled; otherwise GLIBC errors will be thrown:
CGO_ENABLED=0 go build -o v-agent cmd/v-agent/main.go