Prometheus based metrics and monitoring for KFServing models (kubeflo…
…w#1276)

* initial readme

* readme blurb

* readme blurb

* anchors

* renaming

* renaming

* install prometheus operator

* install prometheus operator

* install prometheus operator

* kfserving prefixes

* prom services

* readme

* readme

* readme

* prom operator samples

* v1 cluster role and binding

* images

* readme

* Access prometheus metrics

* Access prometheus metrics

* minimal prometheus setup

* fixed prom queries

* fixed prom queries

* fixed typos

* restored kfserving jupyter notebook

* instructions for kustomizing install namespace
sriumcp authored Jan 8, 2021
1 parent da3e7be commit c884755
Showing 11 changed files with 160 additions and 0 deletions.
78 changes: 78 additions & 0 deletions docs/samples/metrics-and-monitoring/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# Metrics and Monitoring

> Getting started with Prometheus-based monitoring of KFServing models.

# Table of Contents
1. [Install Prometheus](#install-prometheus)
2. [Access Prometheus Metrics](#access-prometheus-metrics)
3. [Metrics-driven experiments and progressive delivery](#metrics-driven-experiments-and-progressive-delivery)
4. [Removal](#removal)

## Install Prometheus

**Prerequisites:** Kubernetes cluster and [Kustomize v3](https://kubectl.docs.kubernetes.io/installation/kustomize/).

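Install Prometheus using the Prometheus Operator. The commands below install the operator, wait for its custom resource definitions to be established, and then create the Prometheus instance.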

```shell
cd kfserving
kustomize build docs/samples/metrics-and-monitoring/prometheus-operator | kubectl apply -f -
kubectl wait --for condition=established --timeout=120s crd/prometheuses.monitoring.coreos.com
kubectl wait --for condition=established --timeout=120s crd/servicemonitors.monitoring.coreos.com
kustomize build docs/samples/metrics-and-monitoring/prometheus | kubectl apply -f -
```
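
To confirm that the install succeeded, a minimal sketch such as the following lists the resources that should now exist. It assumes the default `kfserving-monitoring` namespace; the function name `check_monitoring_install` is hypothetical and not part of KFServing.

```shell
# Hypothetical helper: list the monitoring resources created by the install steps.
# Pass a different namespace as the first argument if you customized it.
check_monitoring_install() {
  local ns="${1:-kfserving-monitoring}"
  # Operator and Prometheus server pods; all should reach the Running state.
  kubectl get pods -n "$ns"
  # Custom resources applied from the prometheus/ kustomization.
  kubectl get prometheuses.monitoring.coreos.com -n "$ns"
  kubectl get servicemonitors.monitoring.coreos.com -n "$ns"
}

# Usage:
#   check_monitoring_install                 # default namespace
#   check_monitoring_install my-monitoring   # customized namespace
```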

> Note: The above steps install Kubernetes resource objects in the `kfserving-monitoring` namespace. The namespace can be changed through Kustomize. To install under a different namespace, say `my-monitoring`, change `kfserving-monitoring` to `my-monitoring` in the following three files: a) `prometheus-operator/namespace.yaml`, b) `prometheus-operator/kustomization.yaml`, and c) `prometheus/kustomization.yaml`.

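
The namespace change described in the note can be scripted. The following is a sketch only: the function name `rename_monitoring_namespace` and its optional second argument are hypothetical, and it assumes you run it from the root of the kfserving repository before running `kustomize build`.

```shell
# Hypothetical helper: rewrite the install namespace in the three files named above.
# $1: new namespace, $2: optional base directory of the sample.
rename_monitoring_namespace() {
  local new_ns="$1"
  local base="${2:-docs/samples/metrics-and-monitoring}"
  local f
  [ -n "$new_ns" ] || { echo "usage: rename_monitoring_namespace NEW_NAMESPACE" >&2; return 1; }
  for f in \
    "$base/prometheus-operator/namespace.yaml" \
    "$base/prometheus-operator/kustomization.yaml" \
    "$base/prometheus/kustomization.yaml"; do
    # Write to a temporary file first; `sed -i` behaves differently on GNU and BSD.
    sed "s/kfserving-monitoring/${new_ns}/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Usage: rename_monitoring_namespace my-monitoring
```

If you change the namespace, use the new name in later commands that pass `-n kfserving-monitoring`, such as the port-forward step.
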
## Access Prometheus Metrics
In this section, we will use a v1beta1 InferenceService sample to demonstrate how to access Prometheus metrics that are automatically generated by [Knative's queue-proxy container](https://knative.dev) for your KFServing models.

1. `kubectl create ns kfserving-test`
2. `cd docs/samples/v1beta1/sklearn`
3. `kubectl apply -f sklearn.yaml -n kfserving-test`
4. If you are using a Minikube-based cluster, run `minikube tunnel` in a separate terminal and supply your password if prompted.
5. In a separate terminal, follow [these instructions](https://github.com/kubeflow/kfserving/blob/master/README.md#determine-the-ingress-ip-and-ports) to find and set your ingress host, ingress port, and service hostname (`INGRESS_HOST`, `INGRESS_PORT`, and `SERVICE_HOSTNAME`). Then send prediction requests in a loop to the `sklearn-iris` model you created in step 3:
```shell
# Send one prediction request roughly every 0.3 seconds; press Ctrl+C to stop.
while clear; do
  curl -v \
    -H "Host: ${SERVICE_HOSTNAME}" \
    -d @./iris-input.json \
    "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/sklearn-iris/infer"
  sleep 0.3
done
```
6. In a separate terminal, port-forward the Prometheus service.
```shell
kubectl port-forward service/prometheus-operated -n kfserving-monitoring 9090:9090
```
7. Open the Prometheus UI in your browser at http://localhost:9090.
8. Query the number of prediction requests sent to the sklearn model over the last 60 seconds. You can use the following query in the Prometheus UI:

```
sum(increase(revision_app_request_latencies_count{service_name=~"sklearn-iris-predictor-default"}[60s]))
```

You should see a response similar to the following.

![Request count](requestcount.png)

9. Query the mean latency of prediction requests served by the same model over the last 60 seconds. You can use the following query in the Prometheus UI:

```
sum(increase(revision_app_request_latencies_sum{service_name=~"sklearn-iris-predictor-default"}[60s]))/sum(increase(revision_app_request_latencies_count{service_name=~"sklearn-iris-predictor-default"}[60s]))
```

You should see a response similar to the following.

![Request latency](requestlatency.png)
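
The queries in steps 8 and 9 can also be run from a terminal against the Prometheus HTTP API (`GET /api/v1/query`) instead of the UI. The sketch below assumes the port-forward from step 6 is still running; the function names and the `PROM_URL` variable are hypothetical and are introduced only for illustration.

```shell
# Hypothetical helpers that rebuild the two PromQL queries from steps 8 and 9.
# $1: Knative service name, $2: time window such as 60s.
promql_request_count() {
  printf 'sum(increase(revision_app_request_latencies_count{service_name=~"%s"}[%s]))' "$1" "$2"
}

promql_mean_latency() {
  # Sum of latencies divided by the number of requests over the same window.
  printf 'sum(increase(revision_app_request_latencies_sum{service_name=~"%s"}[%s]))/%s' \
    "$1" "$2" "$(promql_request_count "$1" "$2")"
}

# Send an instant query to Prometheus and print the JSON response.
# PROM_URL defaults to the port-forwarded address from step 6.
prom_query() {
  curl -s -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Usage:
#   prom_query "$(promql_request_count sklearn-iris-predictor-default 60s)"
#   prom_query "$(promql_mean_latency sklearn-iris-predictor-default 60s)"
```

A successful response is a JSON document with `"status": "success"`; the computed value appears under `data.result`.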

## Metrics-driven experiments and progressive delivery
See [iter8-kfserving](https://github.com/iter8-tools/iter8-kfserving).

## Removal
Remove Prometheus and Prometheus Operator as follows.
```shell
cd kfserving
kustomize build docs/samples/metrics-and-monitoring/prometheus | kubectl delete -f -
kustomize build docs/samples/metrics-and-monitoring/prometheus-operator | kubectl delete -f -
```
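
The commands above leave the sample model from the "Access Prometheus Metrics" section in place. A minimal cleanup sketch, assuming the `kfserving-test` namespace and the `sklearn-iris` model used earlier (the function name `remove_test_resources` is hypothetical):

```shell
# Hypothetical helper: delete the sample InferenceService and its test namespace.
remove_test_resources() {
  local ns="${1:-kfserving-test}"
  # --ignore-not-found keeps the helper safe to run more than once.
  kubectl delete inferenceservice sklearn-iris -n "$ns" --ignore-not-found
  kubectl delete namespace "$ns" --ignore-not-found
}

# Usage: remove_test_resources
```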
@@ -0,0 +1,5 @@
namePrefix: kfserving-
namespace: kfserving-monitoring
resources:
- github.com/prometheus-operator/prometheus-operator?ref=v0.44.1
- namespace.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
  name: kfserving-monitoring
24 changes: 24 additions & 0 deletions docs/samples/metrics-and-monitoring/prometheus/clusterrole.yaml
@@ -0,0 +1,24 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kfserving-monitoring
@@ -0,0 +1,13 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitoring
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      networking.internal.knative.dev/serviceType: Private
  endpoints:
  - port: http-usermetric
    interval: 15s
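
The ServiceMonitor above selects Services in any namespace that carry the label `networking.internal.knative.dev/serviceType: Private` and scrapes their `http-usermetric` port every 15 seconds. To see which Services currently match that selector, a sketch such as the following can help (the function name `list_scrape_targets` is hypothetical):

```shell
# Hypothetical helper: list Services matched by the ServiceMonitor's label selector.
list_scrape_targets() {
  kubectl get services --all-namespaces \
    -l networking.internal.knative.dev/serviceType=Private
}

# Usage: list_scrape_targets
```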
@@ -0,0 +1,8 @@
namePrefix: kfserving-
namespace: kfserving-monitoring
resources:
- clusterrole.yaml
- clusterrolebinding.yaml
- prometheus.yaml
- serviceaccount.yaml
- kfserving-service-monitor.yaml
12 changes: 12 additions & 0 deletions docs/samples/metrics-and-monitoring/prometheus/prometheus.yaml
@@ -0,0 +1,12 @@
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: kfserving-prometheus
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
@@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus