feat: expose operator metrics #928

Merged
merged 39 commits into from
Aug 1, 2023
Commits
17f065f
upgrade golangci-lint
jaideepr97 Apr 18, 2023
a72c846
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 Apr 19, 2023
b63520b
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 May 1, 2023
57b99fa
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 May 4, 2023
bbd373b
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 May 4, 2023
ef1549b
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 May 9, 2023
3dc6561
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 May 10, 2023
16f85f6
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 May 12, 2023
238fa7c
wip: add new metrics
jaideepr97 May 25, 2023
f43dcb5
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 May 25, 2023
caf2b9b
add new metrics server
jaideepr97 May 26, 2023
3a79f5f
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 May 26, 2023
94f4a78
update service/servicemonitor
jaideepr97 May 30, 2023
e5f5637
Merge branch 'master' of github.com:argoproj-labs/argocd-operator int…
jaideepr97 May 30, 2023
85c170b
inject all metrics into default registry, remove extra metrics server…
jaideepr97 Jun 7, 2023
c858fba
Merge branch 'master' of github.com:argoproj-labs/argocd-operator int…
jaideepr97 Jun 7, 2023
98cfb73
revert change to service in 0.6.0 bundle folder
jaideepr97 Jun 7, 2023
f7bed3f
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 Jun 7, 2023
9306ef2
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 Jun 8, 2023
4b28b5f
address lint issue
jaideepr97 Jun 9, 2023
0d4d2db
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 Jun 15, 2023
9382d65
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 Jun 29, 2023
d29ce02
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 Jul 3, 2023
a884695
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 Jul 6, 2023
390d426
fix merge conflicts
jaideepr97 Jul 7, 2023
455a82c
Merge branch 'master' of github.com:argoproj-labs/argocd-operator int…
jaideepr97 Jul 20, 2023
93516ee
manifest fixes
jaideepr97 Jul 20, 2023
c9e95d1
remove servicemonitor from bundle
jaideepr97 Jul 20, 2023
31335e5
Merge branch 'master' of github.com:argoproj-labs/argocd-operator
jaideepr97 Jul 20, 2023
b94d1f8
fix manifest issues
jaideepr97 Jul 20, 2023
21d1573
remove kube-rbac-proxy container
jaideepr97 Jul 25, 2023
b9cba3d
resolve merge conflicts
jaideepr97 Jul 25, 2023
e7b16ed
add total reconciliations counter, change bucket sizes for histogram
jaideepr97 Jul 26, 2023
bbb5bde
add kuttl test and docs
jaideepr97 Jul 26, 2023
00d4fd9
undo manifest changes
jaideepr97 Jul 26, 2023
9dec2ca
generate manifests with old operator-sdk version
jaideepr97 Jul 26, 2023
98b1036
change ci workflow to run operator on cluster instead of locally
jaideepr97 Jul 27, 2023
30292f4
revert changes to github actions workflows; remove operator-metrics e…
jaideepr97 Jul 27, 2023
3f3b277
remove extra metrics server and move reconciliation count outside of …
jaideepr97 Aug 1, 2023
inject all metrics into default registry, remove extra metrics server exposed at 8085

Signed-off-by: Jaideep Rao <jaideep.r97@gmail.com>
jaideepr97 committed Jun 7, 2023
commit 85c170b2fe960b135d87f6253f27afd9d66c03e0
@@ -0,0 +1,17 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
control-plane: argocd-operator
name: argocd-operator-controller-manager-metrics-monitor
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
path: /metrics
port: https
scheme: https
tlsConfig:
insecureSkipVerify: true
selector:
matchLabels:
control-plane: argocd-operator
@@ -7,12 +7,9 @@ metadata:
name: argocd-operator-controller-manager-metrics-service
spec:
ports:
- name: default-metrics
port: 8080
targetPort: 8080
- name: custom-metrics
port: 8085
targetPort: 8085
- name: https
port: 8443
targetPort: https
selector:
control-plane: argocd-operator
status:
9 changes: 6 additions & 3 deletions bundle/manifests/argocd-operator.clusterserviceversion.yaml
@@ -549,8 +549,8 @@ spec:
- urn:alm:descriptor:com.tectonic.ui:fieldGroup:RBAC
- urn:alm:descriptor:com.tectonic.ui:text
- description: 'Policy is CSV containing user-defined RBAC policies and role
definitions. Policy rules are in the form: p, subject, resource, action,
object, effect Role definitions and bindings are in the form: g, subject,
definitions. Policy rules are in the form: p, subject, resource, action,
object, effect Role definitions and bindings are in the form: g, subject,
inherited-subject See https://github.com/argoproj/argo-cd/blob/master/docs/operator-manual/rbac.md
for additional information.'
displayName: Policy
@@ -1004,6 +1004,7 @@ spec:
- monitoring.coreos.com
resources:
- prometheuses
- prometheusrules
- servicemonitors
verbs:
- '*'
@@ -1067,7 +1068,9 @@ spec:
- create
serviceAccountName: argocd-operator-controller-manager
deployments:
- name: argocd-operator-controller-manager
- label:
control-plane: argocd-operator
name: argocd-operator-controller-manager
spec:
replicas: 1
selector:
7 changes: 2 additions & 5 deletions common/defaults.go
@@ -299,11 +299,8 @@ vs-ssh.visualstudio.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7Hr1oTWqNqOlzGJOf
github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
`
// OperatorDefaultMetricsPort is the port that is used to expose default controller-runtime metrics for the operator pod.
OperatorDefaultMetricsPort = 8080

// OperatorDefaultMetricsPort is the port that is used to expose custom metrics implemented by the Argo CD Operator.
OperatorCustomMetricsPort = 8085
// OperatorMetricsPort is the port that is used to expose default controller-runtime metrics for the operator pod.
OperatorMetricsPort = 8080
)

// DefaultLabels returns the default set of labels for controllers.
2 changes: 1 addition & 1 deletion config/default/kustomization.yaml
@@ -22,7 +22,7 @@ bases:
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER'. 'WEBHOOK' components are required.
#- ../certmanager
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus
- ../prometheus

patchesStrategicMerge:
# Protect the /metrics endpoint by putting it behind auth.
10 changes: 1 addition & 9 deletions config/prometheus/monitor.yaml
@@ -10,15 +10,7 @@ metadata:
spec:
endpoints:
- path: /metrics
interval: 30s
port: default-metrics
scheme: https
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
tlsConfig:
insecureSkipVerify: true
- path: /metrics
interval: 10s
port: cr-metrics
port: https
scheme: https
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
tlsConfig:
9 changes: 3 additions & 6 deletions config/rbac/auth_proxy_service.yaml
@@ -7,11 +7,8 @@ metadata:
namespace: system
spec:
ports:
- name: default-metrics
port: 8080
targetPort: 8080
- name: custom-metrics
port: 8085
targetPort: 8085
- name: https
port: 8443
targetPort: https
selector:
control-plane: argocd-operator
1 change: 1 addition & 0 deletions config/rbac/role.yaml
@@ -99,6 +99,7 @@ rules:
- monitoring.coreos.com
resources:
- prometheuses
- prometheusrules
- servicemonitors
verbs:
- '*'
2 changes: 1 addition & 1 deletion controllers/argocd/argocd_controller.go
@@ -64,7 +64,7 @@ var ActiveInstanceMap = make(map[string]string)
//+kubebuilder:rbac:groups=batch,resources=cronjobs;jobs,verbs=*
//+kubebuilder:rbac:groups=config.openshift.io,resources=clusterversions,verbs=get;list;watch
//+kubebuilder:rbac:groups=networking.k8s.io,resources=ingresses,verbs=*
//+kubebuilder:rbac:groups=monitoring.coreos.com,resources=prometheuses;servicemonitors,verbs=*
//+kubebuilder:rbac:groups=monitoring.coreos.com,resources=prometheuses;servicemonitors;prometheusrules,verbs=*
//+kubebuilder:rbac:groups=route.openshift.io,resources=routes;routes/custom-host,verbs=*
//+kubebuilder:rbac:groups=argoproj.io,resources=applications;appprojects,verbs=*
//+kubebuilder:rbac:groups=rbac.authorization.k8s.io,resources=*,verbs=*
14 changes: 9 additions & 5 deletions controllers/argocd/metrics.go
@@ -5,20 +5,20 @@ import (
"net/http"

"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
"github.com/prometheus/client_golang/prometheus/promhttp"
"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
ActiveInstancesByPhase = promauto.NewGaugeVec(
ActiveInstancesByPhase = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "active_argocd_instances_by_phase",
Help: "Number of active argocd instances by phase",
},
[]string{"phase"},
)

ActiveInstancesTotal = promauto.NewGauge(
ActiveInstancesTotal = prometheus.NewGauge(
prometheus.GaugeOpts{
Name: "active_argocd_instances_total",
Help: "Total number of active argocd instances",
@@ -27,8 +27,8 @@ var (

// ReconcileTime is a prometheus metric which keeps track of the duration
// of reconciliations for a given instance
ReconcileTime = promauto.NewHistogramVec(prometheus.HistogramOpts{
Name: "controller_runtime_reconcile_time_seconds",
ReconcileTime = prometheus.NewHistogramVec(prometheus.HistogramOpts{
Name: "controller_runtime_reconcile_time_seconds_per_instance",
Help: "Length of time per reconciliation per instance",
Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60},
@@ -45,3 +45,7 @@ func StartMetricsServer(port int) chan error {
}()
return errCh
}

func init() {
metrics.Registry.MustRegister(ActiveInstancesTotal, ActiveInstancesByPhase, ReconcileTime)
}
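
Note on the hunk above: it swaps the `promauto` constructors for plain `prometheus.New*` constructors and registers the collectors explicitly with `metrics.Registry` from controller-runtime. A likely reading is that `promauto` registers with the Prometheus client's default registry, whereas the manager serves `/metrics` from its own registry, so the custom collectors never appeared on the manager's endpoint. The sketch below models that situation with two toy registries using only the standard library. `registry`, `mustRegister`, and `gather` are hypothetical stand-ins for illustration, not the `client_golang` or controller-runtime API.

```go
package main

import (
	"fmt"
	"sort"
)

// registry is a toy stand-in for a Prometheus registry: a set of metric names.
type registry struct {
	metrics map[string]bool
}

func newRegistry() *registry {
	return &registry{metrics: map[string]bool{}}
}

// mustRegister adds metric names and panics on a duplicate, mirroring the
// fail-fast behaviour of a MustRegister-style call.
func (r *registry) mustRegister(names ...string) {
	for _, n := range names {
		if r.metrics[n] {
			panic("duplicate metric: " + n)
		}
		r.metrics[n] = true
	}
}

// gather returns, in sorted order, the names an endpoint backed by r would expose.
func (r *registry) gather() []string {
	out := make([]string, 0, len(r.metrics))
	for n := range r.metrics {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	defaultRegistry := newRegistry() // stand-in for the client library's default registry
	managerRegistry := newRegistry() // stand-in for the registry the manager serves from

	// Before: the custom metric lands in the default registry only,
	// so the manager's endpoint has nothing custom to expose.
	defaultRegistry.mustRegister("active_argocd_instances_total")
	fmt.Println(len(managerRegistry.gather()))

	// After: the custom metrics are registered where the endpoint reads from.
	managerRegistry.mustRegister(
		"active_argocd_instances_total",
		"active_argocd_instances_by_phase",
		"controller_runtime_reconcile_time_seconds_per_instance",
	)
	fmt.Println(managerRegistry.gather())
}
```

In the toy model the first `gather` call returns an empty list and the second returns all three names, which matches the intent of moving registration into `init()` so a single endpoint can serve both the built-in and the custom metrics.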
@@ -0,0 +1,17 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
control-plane: argocd-operator
name: argocd-operator-controller-manager-metrics-monitor
spec:
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
path: /metrics
port: https
scheme: https
tlsConfig:
insecureSkipVerify: true
selector:
matchLabels:
control-plane: argocd-operator
@@ -549,8 +549,8 @@ spec:
- urn:alm:descriptor:com.tectonic.ui:fieldGroup:RBAC
- urn:alm:descriptor:com.tectonic.ui:text
- description: 'Policy is CSV containing user-defined RBAC policies and role
definitions. Policy rules are in the form: p, subject, resource, action,
object, effect Role definitions and bindings are in the form: g, subject,
definitions. Policy rules are in the form: p, subject, resource, action,
object, effect Role definitions and bindings are in the form: g, subject,
inherited-subject See https://github.com/argoproj/argo-cd/blob/master/docs/operator-manual/rbac.md
for additional information.'
displayName: Policy
@@ -1004,6 +1004,7 @@ spec:
- monitoring.coreos.com
resources:
- prometheuses
- prometheusrules
- servicemonitors
verbs:
- '*'
@@ -1067,7 +1068,9 @@ spec:
- create
serviceAccountName: argocd-operator-controller-manager
deployments:
- name: argocd-operator-controller-manager
- label:
control-plane: argocd-operator
name: argocd-operator-controller-manager
spec:
replicas: 1
selector:
12 changes: 1 addition & 11 deletions main.go
@@ -77,7 +77,7 @@ func main() {
var metricsAddr string
var enableLeaderElection bool
var probeAddr string
flag.StringVar(&metricsAddr, "metrics-bind-address", fmt.Sprintf(":%d", common.OperatorDefaultMetricsPort), "The address the metric endpoint binds to.")
flag.StringVar(&metricsAddr, "metrics-bind-address", fmt.Sprintf(":%d", common.OperatorMetricsPort), "The address the metric endpoint binds to.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "leader-elect", false,
"Enable leader election for controller manager. "+
@@ -202,16 +202,6 @@ func main() {
os.Exit(1)
}

// start a new metrics server at a different port to serve custom implemented metrics
// This step is being taken as a workaround since custom metrics were not getting registered
// within the same default registry as is being used out of the box
go func() {
msErrCh := argocd.StartMetricsServer(common.OperatorCustomMetricsPort)
if err = <-msErrCh; err != nil {
setupLog.Error(err, "metrics server exited with error: %v", err)
}
}()

setupLog.Info("starting manager")
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
setupLog.Error(err, "problem running manager")
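
Note on the `main.go` change: with the second server removed, the manager binds a single metrics address whose default is built from `common.OperatorMetricsPort`. A minimal sketch of how that default and a command-line override interact, using only the standard library. `operatorMetricsPort` and `parseMetricsAddr` are hypothetical names that stand in for the constant in `common/defaults.go` and the flag setup in `main.go`.

```go
package main

import (
	"flag"
	"fmt"
)

// operatorMetricsPort mirrors the value of common.OperatorMetricsPort in the diff.
const operatorMetricsPort = 8080

// parseMetricsAddr parses args with a private FlagSet and returns the
// resulting metrics bind address.
func parseMetricsAddr(args []string) (string, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	var metricsAddr string
	fs.StringVar(&metricsAddr, "metrics-bind-address",
		fmt.Sprintf(":%d", operatorMetricsPort),
		"The address the metric endpoint binds to.")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return metricsAddr, nil
}

func main() {
	// No flags: the default derived from the port constant is used.
	def, err := parseMetricsAddr(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(def) // prints ":8080"

	// An explicit flag overrides the default.
	custom, err := parseMetricsAddr([]string{"--metrics-bind-address=:9090"})
	if err != nil {
		panic(err)
	}
	fmt.Println(custom) // prints ":9090"
}
```

A private `FlagSet` is used here instead of the package-level flags so the sketch can be called more than once without redefining the flag.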