kubequery is an Osquery extension that provides SQL-based analytics for Kubernetes clusters.
kubequery will be packaged as a Docker image available from Docker Hub. It is expected to be deployed as a Kubernetes Deployment, one per cluster. A sample deployment template is available here.
The kubequery table schema is available here.
Go 1.17 and make are required to build kubequery. Run: make
The container image for the master branch will be available on Docker Hub:
docker pull uptycs/kubequery:latest
For production, tagged container images should be used instead of latest.
kubequery-template.yaml is a template that creates the following Kubernetes resources:
kubequery is the Namespace that serves as the placeholder for all namespaced resources.
kubequery-sa is the ServiceAccount associated with the kubequery deployment pod specification. The container uses the service account token to authenticate with the API server.
kubequery-clusterrole is a ClusterRole that allows get and list operations on all resources in the following API groups:
- "" (core)
- admissionregistration.k8s.io
- apps
- autoscaling
- batch
- networking.k8s.io
- policy
- rbac.authorization.k8s.io
- storage.k8s.io
kubequery-clusterrolebinding is a ClusterRoleBinding that binds the cluster role to the service account.
kubequery-config is a ConfigMap that will be mounted inside the container as a directory. The contents of this config map should be similar to /etc/osquery. For example, kubequery.flags, kubequery.conf, etc. should be part of this config map.
kubequery is the Deployment that creates one replica pod. The container launched as part of the pod runs as a non-root user.
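As a rough illustration of the ClusterRole described above, the following Python sketch builds an equivalent manifest as a dictionary and prints it as JSON. The exact layout of the rule in kubequery-template.yaml is an assumption here (a single rule with a wildcard resource); only the role name, the verbs, and the API groups come from the description above.

```python
import json

# API groups listed above; the empty string is the core API group.
API_GROUPS = [
    "",
    "admissionregistration.k8s.io",
    "apps",
    "autoscaling",
    "batch",
    "networking.k8s.io",
    "policy",
    "rbac.authorization.k8s.io",
    "storage.k8s.io",
]


def build_cluster_role(name="kubequery-clusterrole"):
    """Return a read-only ClusterRole manifest (assumed shape, not the shipped template)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                "apiGroups": list(API_GROUPS),
                "resources": ["*"],        # all resources in each group
                "verbs": ["get", "list"],  # read-only access
            }
        ],
    }


print(json.dumps(build_cluster_role(), indent=2))
```

Because the role only grants get and list, the service account cannot modify or watch cluster state through this role.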
By default, pod resource requests and limits are set to 500m of CPU (half a core) and 200MB of memory. The kubequery.yaml file should be tweaked to suit your needs before applying:
kubectl apply -f kubequery.yaml
Check the status of the pod using the following command. The pod should be in the Running status.
kubectl get pods -n kubequery
Validate that the installation was successful by executing:
kubectl exec -it $(kubectl get pods -n kubequery -o jsonpath='{.items[0].metadata.name}') -n kubequery -- kubequeryi '.tables'
This should produce output similar to the following:
=> kubernetes_api_resources
=> kubernetes_cluster_role_binding_subjects
=> kubernetes_cluster_role_policy_rule
=> kubernetes_config_maps
=> kubernetes_cron_jobs
=> kubernetes_csi_drivers
=> kubernetes_csi_node_drivers
=> kubernetes_daemon_set_containers
...
Queries can be run using kubequeryi on the deployed container:
kubectl exec -it $(kubectl get pods -n kubequery -o jsonpath='{.items[0].metadata.name}') -n kubequery -- kubequeryi --line 'SELECT * FROM kubernetes_pods'
Pod logs can be viewed using:
kubectl logs $(kubectl get pods -n kubequery -o jsonpath='{.items[0].metadata.name}') -n kubequery
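For scripted use, the same kubectl commands can be wrapped in a small helper. The sketch below is one possible way to do that in Python; the function names are hypothetical, and it assumes kubectl is on the PATH and already configured for the target cluster. The -it flags are dropped because a script usually has no TTY attached.

```python
import subprocess

NAMESPACE = "kubequery"


def pod_name_argv(namespace=NAMESPACE):
    """Command that prints the name of the first pod in the namespace."""
    return [
        "kubectl", "get", "pods", "-n", namespace,
        "-o", "jsonpath={.items[0].metadata.name}",
    ]


def query_argv(pod, sql, namespace=NAMESPACE):
    """Command that runs one SQL statement through kubequeryi in the pod."""
    return [
        "kubectl", "exec", pod, "-n", namespace,
        "--", "kubequeryi", "--line", sql,
    ]


def run_query(sql, namespace=NAMESPACE):
    """Look up the pod, run the query, and return the raw text output."""
    pod = subprocess.run(
        pod_name_argv(namespace), check=True, capture_output=True, text=True
    ).stdout.strip()
    result = subprocess.run(
        query_argv(pod, sql, namespace), check=True, capture_output=True, text=True
    )
    return result.stdout


# Example (requires a cluster with kubequery deployed):
# print(run_query("SELECT * FROM kubernetes_pods"))
```

Passing the arguments as a list avoids shell quoting problems when the SQL statement contains quotes.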
Helm must be installed to use the charts. Please refer to Helm's documentation to get started.
Once Helm has been set up correctly, add the repo as follows:
helm repo add uptycs https://uptycs.github.io/kubequery
If you had already added this repo earlier, run helm repo update to retrieve the latest versions of the packages. You can then run helm search repo uptycs to see the charts.
To install the kubequery chart:
helm install my-kubequery uptycs/kubequery
To uninstall the chart:
helm delete my-kubequery
No. kubequery should be deployed as a Kubernetes Deployment, which means there will be one kubequery Pod running per Kubernetes cluster. Osquery should be deployed to every node in the cluster. Querying most Osquery tables from an ephemeral pod does not provide much value. The kubequery container image also runs as a non-root user, which means most Osquery tables will return either an error or partial data.
Normalizing nested JSON data like Kubernetes API responses would create an explosion of tables, so some of the columns in the kubernetes tables are left as JSON. Data is eventually processed by SQLite within Osquery, and SQLite has very good JSON support.
For example, if run_as_user in the kubernetes_pod_security_policies table looks like the following:
{"rule": "MustRunAsNonRoot"}
To get the value of rule, the following query can be used:
SELECT value AS 'rule'
FROM kubernetes_pod_security_policies, json_tree(kubernetes_pod_security_policies.run_as_user)
WHERE key = 'rule';
+------------------+
| rule             |
+------------------+
| MustRunAsNonRoot |
+------------------+
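The json_tree query above can be tried locally without a cluster. The following Python sketch recreates a one-row stand-in for the table in an in-memory SQLite database; the name column and its value are made up for illustration, and it assumes the SQLite build bundled with Python includes the JSON functions (true for most current builds).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the real table; only run_as_user mirrors the example above.
conn.execute(
    "CREATE TABLE kubernetes_pod_security_policies (name TEXT, run_as_user TEXT)"
)
conn.execute(
    "INSERT INTO kubernetes_pod_security_policies VALUES (?, ?)",
    ("example-policy", '{"rule": "MustRunAsNonRoot"}'),
)

# Same query as above: walk the JSON document and keep the 'rule' key.
tree_rows = conn.execute(
    """
    SELECT value AS 'rule'
    FROM kubernetes_pod_security_policies,
         json_tree(kubernetes_pod_security_policies.run_as_user)
    WHERE key = 'rule'
    """
).fetchall()

# Alternative for a known path: json_extract reads a single field directly.
extract_rows = conn.execute(
    "SELECT json_extract(run_as_user, '$.rule') FROM kubernetes_pod_security_policies"
).fetchall()

print(tree_rows)     # [('MustRunAsNonRoot',)]
print(extract_rows)  # [('MustRunAsNonRoot',)]
```

json_tree is convenient when the key may appear at any depth, while json_extract is simpler when the path is fixed.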
json_each can be used to explode JSON array types. For example, if volumes in the kubernetes_pod_security_policies table looks like the following:
{"volumes": ["configMap","emptyDir","projected","secret","downwardAPI","persistentVolumeClaim"]}
To get a separate row for each volume, the following query can be used:
SELECT value
FROM kubernetes_pod_security_policies, json_each(kubernetes_pod_security_policies.volumes);
+-----------------------+
| value                 |
+-----------------------+
| configMap             |
| emptyDir              |
| projected             |
| secret                |
| downwardAPI           |
| persistentVolumeClaim |
+-----------------------+
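A local sketch of the json_each behavior, again using an in-memory SQLite database from Python rather than a live kubequery pod. It covers two possible shapes for the column, since the stored form is an assumption here: a bare JSON array, and an object that wraps the array under a volumes key. In the wrapped case json_each needs a path argument to reach the array.

```python
import sqlite3

VOLUMES = [
    "configMap", "emptyDir", "projected",
    "secret", "downwardAPI", "persistentVolumeClaim",
]

conn = sqlite3.connect(":memory:")
# Stand-in table with one column per assumed storage shape.
conn.execute("CREATE TABLE psp (volumes_array TEXT, volumes_object TEXT)")
conn.execute(
    "INSERT INTO psp VALUES (?, ?)",
    (
        '["configMap","emptyDir","projected","secret","downwardAPI","persistentVolumeClaim"]',
        '{"volumes": ["configMap","emptyDir","projected","secret","downwardAPI","persistentVolumeClaim"]}',
    ),
)

# Bare array: json_each yields one row per element.
from_array = [
    row[0]
    for row in conn.execute("SELECT value FROM psp, json_each(psp.volumes_array)")
]

# Wrapped object: pass a path so json_each iterates the nested array.
from_object = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM psp, json_each(psp.volumes_object, '$.volumes')"
    )
]

print(from_array)
print(from_object)
```

Without the path argument, json_each on the wrapped object would return a single row whose value is the whole array.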
Osquery loggers, such as the TLS and Kafka loggers, can be used to export scheduled query data to remote fleet management or security analytics platforms. Lambda-like functions can be applied to rows of streaming data in these platforms. These lambda functions can extract the necessary fields from embedded JSON to detect compliance issues or security concerns. If tables were normalized and streamed on different schedules, it would not be trivial to JOIN across tables and trigger events or alerts.
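To make the lambda-style processing concrete, here is a minimal Python sketch of a function applied to each streamed row. The row shape (a dictionary with name and run_as_user keys) and the compliance rule being checked are assumptions chosen for illustration; real platforms define their own event format and policies.

```python
import json

# Hypothetical policy: every pod security policy must enforce this rule.
REQUIRED_RULE = "MustRunAsNonRoot"


def extract_rule(row):
    """Return the 'rule' field from the embedded run_as_user JSON, or None."""
    try:
        doc = json.loads(row.get("run_as_user") or "{}")
    except json.JSONDecodeError:
        return None
    return doc.get("rule") if isinstance(doc, dict) else None


def find_violations(rows):
    """Yield one finding per row whose rule differs from the required one."""
    for row in rows:
        rule = extract_rule(row)
        if rule != REQUIRED_RULE:
            yield {"name": row.get("name"), "rule": rule}


# Example rows as they might arrive from a scheduled query (made-up data).
sample_rows = [
    {"name": "restricted", "run_as_user": '{"rule": "MustRunAsNonRoot"}'},
    {"name": "privileged", "run_as_user": '{"rule": "RunAsAny"}'},
]
print(list(find_violations(sample_rows)))
```

Because the JSON stays embedded in the row, the check needs only the single event it receives and no JOIN against other streams.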