Merge pull request #448 from redhatrises/gke_doc
feat: add gke autopilot docs
redhatrises authored Oct 26, 2023
2 parents 1fa50b6 + d80629d commit b1c6b77
Showing 9 changed files with 206 additions and 13 deletions.
3 changes: 0 additions & 3 deletions docs/deployment/azure/README.md
@@ -111,6 +111,3 @@ Delete the Falcon Operator deployment by running:
```sh
kubectl delete -f https://github.com/crowdstrike/falcon-operator/releases/latest/download/falcon-operator.yaml
```



1 change: 0 additions & 1 deletion docs/deployment/eks-fargate/README.md
@@ -172,4 +172,3 @@ Using `aws`, `eksctl`, and `kubectl` command-line tools, perform the following s
```sh
kubectl create -f ./my-falcon-container.yaml
```
3 changes: 0 additions & 3 deletions docs/deployment/eks/README.md
@@ -92,6 +92,3 @@ Delete the Falcon Operator deployment by running:
```sh
kubectl delete -f https://github.com/crowdstrike/falcon-operator/releases/latest/download/falcon-operator.yaml
```



3 changes: 0 additions & 3 deletions docs/deployment/generic/README.md
@@ -71,6 +71,3 @@ Delete the Falcon Operator deployment by running:
```sh
kubectl delete -f https://github.com/crowdstrike/falcon-operator/releases/latest/download/falcon-operator.yaml
```



97 changes: 97 additions & 0 deletions docs/deployment/gke/README.md
@@ -118,7 +118,104 @@ Delete the Falcon Operator deployment by running:
kubectl delete -f https://github.com/crowdstrike/falcon-operator/releases/latest/download/falcon-operator.yaml
```

## GKE Autopilot configuration

### Setting the PriorityClass

When you enable GKE Autopilot deployment in the Falcon Operator, the operator creates a new PriorityClass to ensure that the sensor DaemonSet takes priority over other application pods. Depending on available cluster resources, some application pods may be evicted or pushed back in the scheduling queue to make room for the sensor pods. You can either let the operator deploy its own PriorityClass or specify an existing one.
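
If you prefer to manage the PriorityClass yourself, a minimal manifest could look like the following sketch. The name and priority value here are illustrative assumptions, not operator defaults; the name is what you would later reference in `node.priorityClass.name`.

```yaml
# Hypothetical PriorityClass for the Falcon node sensor (name and value are examples).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: falcon-sensor-priority
value: 1000000          # higher values are scheduled ahead of lower-priority pods
globalDefault: false    # only pods that reference this class receive this priority
description: "Priority class for the CrowdStrike Falcon node sensor on GKE Autopilot."
```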

### Configuring the resource usage

GKE Autopilot enforces minimum and maximum values for the total resources requested by your deployment configuration and adjusts the requests to stay within that range. Autopilot lets you set requests but not limits: any limits you set are mutated to match the request values.

```yaml
resources:
  requests:
    cpu: "250m"
  limits:
    cpu: "<mutates to match requests>"
```

To understand how GKE Autopilot adjusts limits, and the minimum and maximum resource requests, see [Google Cloud documentation: Minimum and maximum resource requests](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests#min-max-requests).

Falcon sensor resource usage depends on the application workloads: the more events the sensor observes, the more resources it needs. The default memory and CPU values are defined only to ensure a successful sensor deployment. Consider adjusting the sensor memory and CPU within the range allowed by GKE Autopilot so that the sensor runs reliably under your workloads.

> [!WARNING]
> If you set the requests or limits too low, you can potentially cause the sensor deployment to fail or cause a loss of cloud events.

If the sensor fails to start, it is likely because the application workload requires more resources. In that case, raise the resource requests to a higher value within the acceptable GKE Autopilot min/max range.
If the resource allocation turns out to be higher than the application workloads need, lower the resource requests accordingly.
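
As an illustration only, a lightly loaded cluster might run with requests reduced toward the Autopilot minimums. The exact values below are assumptions and use the same `node.resources.requests` fields shown in the full example later in this section:

```yaml
# Hypothetical reduced requests for a lightly loaded cluster.
# Values must stay at or above the GKE Autopilot minimums (250m CPU / 500Mi memory).
node:
  resources:
    requests:
      cpu: 500m
      memory: 768Mi
```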

You can retrieve a snapshot of your resource usage with `kubectl top`, or with other resource monitoring tools such as Datadog or Prometheus. For example, the following command shows CPU and memory usage for pods in the `falcon-system` namespace:

```shell
kubectl top pod -n falcon-system
NAME                                   CPU(cores)   MEMORY(bytes)
falcon-helm-falcon-node-sensor-slsmg   498m         223Mi
```

The sensor resource limits are only applied when `backend: bpf` is set, which is a requirement for GKE Autopilot.

### Enabling GKE Autopilot

To enable GKE Autopilot support and deploy the sensor in user mode, configure the following settings:

1. Set the backend to run in user mode.
```yaml
node:
  backend: bpf
```

2. Enable GKE Autopilot.
```yaml
node:
  gke:
    autopilot: true
```

3. Optionally, provide the name of an existing PriorityClass; otherwise, the operator creates one for you.
```yaml
node:
  priorityClass:
    name: my_custom_priorityclass
```

4. Based on your workload requirements, set the requests and limits. The default values for GKE Autopilot are `750m` CPU and `1.5Gi` memory. The minimum allowed values are `250m` CPU and `500Mi` memory:
```yaml
node:
  resources:
    requests:
      cpu: 750m
      memory: 1.5Gi
```
> [!WARNING]
> If you set the requests or limits too low, you can potentially cause the sensor deployment to fail or cause a loss of cloud events.

Add the following toleration so that the sensor deploys correctly on GKE Autopilot:

```yaml
- effect: NoSchedule
  key: kubernetes.io/arch
  operator: Equal
  value: amd64
```

Putting it all together, a complete example node sensor configuration for GKE Autopilot could look like the following:

```yaml
node:
  backend: bpf
  gke:
    autopilot: true
  resources:
    requests:
      cpu: 750m
      memory: 1.5Gi
  tolerations:
    - effect: NoSchedule
      operator: Equal
      key: kubernetes.io/arch
      value: amd64
```
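
For context, these `node:` settings belong under the `spec` of the FalconNodeSensor custom resource that the operator watches. A hedged sketch of how the snippet above might be embedded in that resource is shown below; the `apiVersion`, `kind`, and metadata name are assumptions drawn from the operator's other deployment guides, not part of this change:

```yaml
apiVersion: falcon.crowdstrike.com/v1alpha1   # assumed API group/version
kind: FalconNodeSensor
metadata:
  name: falcon-node-sensor
spec:
  node:
    backend: bpf
    gke:
      autopilot: true
    resources:
      requests:
        cpu: 750m
        memory: 1.5Gi
    tolerations:
      - effect: NoSchedule
        operator: Equal
        key: kubernetes.io/arch
        value: amd64
```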

## GKE Node Upgrades

5 changes: 3 additions & 2 deletions docs/src/deployment/README.md.tmpl
@@ -98,5 +98,6 @@ Delete the Falcon Operator deployment by running:
{{ .KubeCmd }} delete -f https://github.com/crowdstrike/falcon-operator/releases/latest/download/falcon-operator.yaml
```

{{ template "eksiam.tmpl" . }}
{{ template "gkenode.tmpl" . }}
{{- template "eksiam.tmpl" . }}
{{- template "gkeautopilot.tmpl" . }}
{{- template "gkenode.tmpl" . }}
3 changes: 2 additions & 1 deletion docs/src/templates/eksiam.tmpl
@@ -1,4 +1,5 @@
{{ if eq .Distro "eks-fargate" -}}
{{ if eq .Distro "eks-fargate" }}

## Configuring IAM Role to allow ECR Access on EKS Fargate

When the Falcon Container Injector is installed on EKS Fargate, the following error message may appear in the injector logs:
102 changes: 102 additions & 0 deletions docs/src/templates/gkeautopilot.tmpl
@@ -0,0 +1,102 @@
{{ if eq .Distro "gke" }}

## GKE Autopilot configuration

### Setting the PriorityClass

When you enable GKE Autopilot deployment in the Falcon Operator, the operator creates a new PriorityClass to ensure that the sensor DaemonSet takes priority over other application pods. Depending on available cluster resources, some application pods may be evicted or pushed back in the scheduling queue to make room for the sensor pods. You can either let the operator deploy its own PriorityClass or specify an existing one.

### Configuring the resource usage

GKE Autopilot enforces minimum and maximum values for the total resources requested by your deployment configuration and adjusts the requests to stay within that range. Autopilot lets you set requests but not limits: any limits you set are mutated to match the request values.

```yaml
resources:
  requests:
    cpu: "250m"
  limits:
    cpu: "<mutates to match requests>"
```

To understand how GKE Autopilot adjusts limits, and the minimum and maximum resource requests, see [Google Cloud documentation: Minimum and maximum resource requests](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests#min-max-requests).

Falcon sensor resource usage depends on the application workloads: the more events the sensor observes, the more resources it needs. The default memory and CPU values are defined only to ensure a successful sensor deployment. Consider adjusting the sensor memory and CPU within the range allowed by GKE Autopilot so that the sensor runs reliably under your workloads.

> [!WARNING]
> If you set the requests or limits too low, you can potentially cause the sensor deployment to fail or cause a loss of cloud events.

If the sensor fails to start, it is likely because the application workload requires more resources. In that case, raise the resource requests to a higher value within the acceptable GKE Autopilot min/max range.
If the resource allocation turns out to be higher than the application workloads need, lower the resource requests accordingly.

You can retrieve a snapshot of your resource usage with `kubectl top`, or with other resource monitoring tools such as Datadog or Prometheus. For example, the following command shows CPU and memory usage for pods in the `falcon-system` namespace:

```shell
kubectl top pod -n falcon-system
NAME                                   CPU(cores)   MEMORY(bytes)
falcon-helm-falcon-node-sensor-slsmg   498m         223Mi
```

The sensor resource limits are only applied when `backend: bpf` is set, which is a requirement for GKE Autopilot.

### Enabling GKE Autopilot

To enable GKE Autopilot support and deploy the sensor in user mode, configure the following settings:

1. Set the backend to run in user mode.
```yaml
node:
  backend: bpf
```

2. Enable GKE Autopilot.
```yaml
node:
  gke:
    autopilot: true
```

3. Optionally, provide the name of an existing PriorityClass; otherwise, the operator creates one for you.
```yaml
node:
  priorityClass:
    name: my_custom_priorityclass
```

4. Based on your workload requirements, set the requests and limits. The default values for GKE Autopilot are `750m` CPU and `1.5Gi` memory. The minimum allowed values are `250m` CPU and `500Mi` memory:
```yaml
node:
  resources:
    requests:
      cpu: 750m
      memory: 1.5Gi
```
> [!WARNING]
> If you set the requests or limits too low, you can potentially cause the sensor deployment to fail or cause a loss of cloud events.

Add the following toleration so that the sensor deploys correctly on GKE Autopilot:

```yaml
- effect: NoSchedule
  key: kubernetes.io/arch
  operator: Equal
  value: amd64
```

Putting it all together, a complete example node sensor configuration for GKE Autopilot could look like the following:

```yaml
node:
  backend: bpf
  gke:
    autopilot: true
  resources:
    requests:
      cpu: 750m
      memory: 1.5Gi
  tolerations:
    - effect: NoSchedule
      operator: Equal
      key: kubernetes.io/arch
      value: amd64
```

{{- end -}}
2 changes: 2 additions & 0 deletions docs/src/templates/gkenode.tmpl
@@ -1,4 +1,5 @@
{{ if eq .Distro "gke" }}

## GKE Node Upgrades

If the sidecar sensor has been deployed to your GKE cluster, you will want to explicitly exclude the kube-public, kube-system, falcon-operator, and falcon-system namespaces from CrowdStrike Falcon monitoring by labeling them.
@@ -82,4 +83,5 @@ Using both `gcloud` and `{{ .KubeCmd }}` command-line tools, perform the followi
```sh
{{ .KubeCmd }} create -f ./my-falcon-container.yaml
```

{{- end -}}
