Rework the OSS deployment docs to center around Helm. #39537

Merged
merged 27 commits into from
Jul 12, 2024
3ccc89a
Rework the OSS deployment docs to center around Helm.
bgroff Jun 17, 2024
1978164
Merge branch 'master' into bgroff/deploy-documentation-refactor
Adorism Jul 4, 2024
f6ac87a
edits to deployment overview, sidebar nav, and a redirect
Adorism Jul 5, 2024
f836f8e
update wording on example usecases for Pyairbyte
Adorism Jul 5, 2024
75f2c86
Merge branch 'master' into bgroff/deploy-documentation-refactor
Adorism Jul 5, 2024
f8535a2
Merge branch 'master' into bgroff/deploy-documentation-refactor
Adorism Jul 5, 2024
a02a5a8
Merge branch 'master' into bgroff/deploy-documentation-refactor
Adorism Jul 9, 2024
5868d91
edits to deployment reference overview, database, and ingress
Adorism Jul 9, 2024
9f7cd70
Merge branch 'master' into bgroff/deploy-documentation-refactor
Adorism Jul 9, 2024
5476ddc
comment out some navigation to enable incremental additions of deploy…
Adorism Jul 9, 2024
84a5b9d
fix build errors and update wording in secrets guide
Adorism Jul 9, 2024
7822ffb
Merge branch 'master' into bgroff/deploy-documentation-refactor
Adorism Jul 9, 2024
eaec7c2
Update the documentation to address some of the feedback given.
bgroff Jul 10, 2024
5cc18d0
Fix link
bgroff Jul 10, 2024
f4e74ed
Take two on link fixing.
bgroff Jul 10, 2024
4314147
Removed all references to Enterprise. Added a note in the Ingress sec…
bgroff Jul 12, 2024
492a7ad
Merge branch 'master' into bgroff/deploy-documentation-refactor
bgroff Jul 12, 2024
76ddf22
Revert this file.
bgroff Jul 12, 2024
2a61a1f
Merge branch 'bgroff/deploy-documentation-refactor' of github.com:air…
bgroff Jul 12, 2024
e285f0c
Add section on adding the help repo.
bgroff Jul 12, 2024
85b9d92
Add section for AWS.
bgroff Jul 12, 2024
96abe19
Fix typo.
bgroff Jul 12, 2024
75cff9a
wording
nataliekwong Jul 12, 2024
f999f62
wording integrations
nataliekwong Jul 12, 2024
13c4d17
typing
nataliekwong Jul 12, 2024
c160c6e
Merge branch 'master' into bgroff/deploy-documentation-refactor
bgroff Jul 12, 2024
4f22582
Merge branch 'master' into bgroff/deploy-documentation-refactor
bgroff Jul 12, 2024
Rework the OSS deployment docs to center around Helm.
bgroff committed Jun 17, 2024
commit 3ccc89aff1e9753eadd141874050dfcab15ac102
171 changes: 171 additions & 0 deletions docs/deploying-airbyte/deploying-airbyte.md
@@ -0,0 +1,171 @@

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Deploying Airbyte

The Airbyte platform is a sophisticated data integration platform that is built to handle large
amounts of data movement. If you are looking for a more streamlined way to run Airbyte connectors,
you can visit the [PyAirbyte](#using-airbyte/pyairbyte/getting-started) documentation. If you are
looking to quickly deploy Airbyte on your local machine, you can visit the
[Quickstart](#deploying-airbyte/quickstart) guide.

## Understanding the Airbyte Deployment

Airbyte is a platform that is built to be deployed in a cloud environment. The platform has been
built on top of Kubernetes. The recommended way of deploying Airbyte is to use Helm and the
documented Helm chart values. The Helm chart is available in the Airbyte repository here: #TODO
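
As a rough sketch of what a Helm-based install could look like, the script below adds a chart repository, refreshes it, and installs a release with a custom values file. The repository URL is a placeholder, since this page does not yet link the chart location; the repo alias `airbyte`, the chart reference `airbyte/airbyte`, the release name, and the `values.yaml` file name are also assumptions. The hypothetical `run` helper only prints each command unless `DRY_RUN=0` is set, so nothing is changed by default.

```sh
#!/bin/sh
# Sketch only: every default below is an assumption, not confirmed by this page.
HELM_REPO_URL="${HELM_REPO_URL:-https://charts.example.invalid}"  # placeholder URL
RELEASE_NAME="${RELEASE_NAME:-airbyte}"    # assumed release name
NAMESPACE="${NAMESPACE:-airbyte}"          # namespace used by the kubectl examples
VALUES_FILE="${VALUES_FILE:-values.yaml}"  # assumed values file name

# Print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run helm repo add airbyte "$HELM_REPO_URL"
run helm repo update
run helm install "$RELEASE_NAME" airbyte/airbyte \
  --namespace "$NAMESPACE" --create-namespace \
  --values "$VALUES_FILE"
```

Reviewing the printed commands before setting `DRY_RUN=0` makes it easier to confirm the namespace and values file are the ones you intend to use.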

The [Infrastructure](#deploying-airbyte/infrastructure) section describes Airbyte's recommended
way to set up the needed cloud infrastructure for each supported platform. These guides will help you
set up the necessary infrastructure for deploying Airbyte, but you are not required to follow them:
Airbyte tries to be as flexible as possible to fit into your existing infrastructure.

## Integrations

The Airbyte platform has been built to integrate into your cloud infrastructure. You can
configure various components of the platform to suit your needs. This includes an object store,
such as S3 or GCS, for storing logs and state, a database for externalizing state, and a secret
manager for keeping your secrets secure. Their configuration is described in the
[Integrations](#deploying-airbyte/integrations) section, which explains for each integration
why you would want to configure it and how to do so.

## Preconfiguring Kubernetes Secrets

Airbyte pulls sensitive configuration values out of a Kubernetes secret. The secret should look like the examples shown below.

While you can set the name of the secret to whatever you prefer, you will need to set that name in various places in your `values.yaml` file. For this reason we suggest that you keep the name `airbyte-config-secrets` unless you have a reason to change it.


<details>
<summary>airbyte-config-secrets</summary>

<Tabs>
<TabItem value="S3" label="S3" default>

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: airbyte-config-secrets
type: Opaque
stringData:
  # Enterprise License Key
  license-key: ## e.g. xxxxx.yyyyy.zzzzz

  # Database Secrets
  database-host: ## e.g. database.internal
  database-port: ## e.g. 5432
  database-name: ## e.g. airbyte
  database-user: ## e.g. airbyte
  database-password: ## e.g. password

  # Instance Admin
  instance-admin-email: ## e.g. admin@company.example
  instance-admin-password: ## e.g. password

  # SSO OIDC Credentials
  client-id: ## e.g. e83bbc57-1991-417f-8203-3affb47636cf
  client-secret: ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

  # AWS S3 Secrets
  s3-access-key-id: ## e.g. AKIAIOSFODNN7EXAMPLE
  s3-secret-access-key: ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

  # AWS Secret Manager
  aws-secret-manager-access-key-id: ## e.g. AKIAIOSFODNN7EXAMPLE
  aws-secret-manager-secret-access-key: ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

```

You can also use `kubectl` to create the secret directly from the CLI:

```sh
kubectl create secret generic airbyte-config-secrets \
--from-literal=license-key='' \
--from-literal=database-host='' \
--from-literal=database-port='' \
--from-literal=database-name='' \
--from-literal=database-user='' \
--from-literal=database-password='' \
--from-literal=instance-admin-email='' \
--from-literal=instance-admin-password='' \
--from-literal=s3-access-key-id='' \
--from-literal=s3-secret-access-key='' \
--from-literal=aws-secret-manager-access-key-id='' \
--from-literal=aws-secret-manager-secret-access-key='' \
--namespace airbyte
```
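
Once the secret exists, Kubernetes stores its values base64-encoded under `data`; `stringData` is only a convenience for writing them in plain text. The sketch below uses the standard `base64` utility and the example value `airbyte` to show the round trip you would perform when inspecting a stored value by hand. The function names are hypothetical helpers, not part of any Airbyte or Kubernetes tooling.

```sh
#!/bin/sh
# Encode a plain-text value the way it appears under `data` in a stored Secret.
# `tr -d '\n'` removes the line wrapping some base64 implementations add.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a value copied from the `data` section of a stored Secret.
decode_secret_value() {
  printf '%s' "$1" | base64 --decode
}

encoded=$(encode_secret_value 'airbyte')
echo "$encoded"                          # prints YWlyYnl0ZQ==
decode_secret_value "$encoded"; echo     # prints airbyte
```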


</TabItem>
<TabItem value="GCS" label="GCS">

First, create a new file `gcp.json` containing the credentials JSON blob for the service account you are looking to assume.


```yaml
apiVersion: v1
kind: Secret
metadata:
  name: airbyte-config-secrets
type: Opaque
stringData:
  # Enterprise License Key
  license-key: ## e.g. xxxxx.yyyyy.zzzzz

  # Database Secrets
  database-host: ## e.g. database.internal
  database-port: ## e.g. 5432
  database-name: ## e.g. airbyte
  database-user: ## e.g. airbyte
  database-password: ## e.g. password

  # Instance Admin Credentials
  instance-admin-email: ## e.g. admin@company.example
  instance-admin-password: ## e.g. password

  # SSO OIDC Credentials
  client-id: ## e.g. e83bbc57-1991-417f-8203-3affb47636cf
  client-secret: ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

  # GCP Secrets
  gcp.json: <CREDENTIALS_JSON_BLOB>
```

You can also use `kubectl` to create the secret directly from the `gcp.json` file:

```sh
kubectl create secret generic airbyte-config-secrets \
--from-literal=license-key='' \
--from-literal=database-host='' \
--from-literal=database-port='' \
--from-literal=database-name='' \
--from-literal=database-user='' \
--from-literal=database-password='' \
--from-literal=instance-admin-email='' \
--from-literal=instance-admin-password='' \
--from-file=gcp.json \
--namespace airbyte
```

</TabItem>
</Tabs>
</details>
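
Because the key names in the secret must match what `values.yaml` references, a quick pre-flight check on the manifest can catch typos before it is applied. The following sketch assumes the manifest was saved to a local file (the name `airbyte-config-secrets.yaml` in the usage comment is hypothetical) and simply greps for each expected key; it does not validate the YAML structure. The `missing_secret_keys` helper is illustrative only.

```sh
#!/bin/sh
# Print every expected key that does not appear as "key:" at the start of a
# (possibly indented) line in the manifest. Prints nothing when all are present.
missing_secret_keys() {
  manifest="$1"; shift
  for key in "$@"; do
    grep -Eq "^[[:space:]]*${key}:" "$manifest" || echo "$key"
  done
}

# Usage (file name is an assumption):
# missing_secret_keys airbyte-config-secrets.yaml \
#   database-host database-port database-name database-user database-password
```

An empty result means every listed key was found; any key printed should be added to the secret or corrected before deploying.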


## Tools

### Required Tools

Helm

Kubectl

### Optional Tools

K9s

Stern

1 change: 1 addition & 0 deletions docs/deploying-airbyte/infrastructure/aws.md
@@ -0,0 +1 @@
# Amazon Web Services (AWS)
1 change: 1 addition & 0 deletions docs/deploying-airbyte/infrastructure/azure.md
@@ -0,0 +1 @@
# Microsoft Azure
1 change: 1 addition & 0 deletions docs/deploying-airbyte/infrastructure/gcp.md
@@ -0,0 +1 @@
# Google Cloud Platform (GCP)
44 changes: 44 additions & 0 deletions docs/deploying-airbyte/integrations/database.md
@@ -0,0 +1,44 @@
# External Database

For Self-Managed Enterprise deployments, we recommend using a dedicated database instance (such as AWS RDS or GCP Cloud SQL) for better reliability and backups, instead of the default internal Postgres database (`airbyte/db`) that Airbyte spins up within the Kubernetes cluster.

We assume in the following that you've already configured a Postgres instance:

<details open>
<summary>External database setup steps</summary>

Add external database details to your `values.yaml` file. This disables the default internal Postgres database (`airbyte/db`) and configures the external Postgres database. You can supply each of the values below either through the `airbyte-config-secrets` secret or by setting it directly here. The database password is the exception: it must be set in `airbyte-config-secrets`. Here is an example configuration:

```yaml
postgresql:
  enabled: false

global:
  database:
    # -- Secret name where database credentials are stored
    secretName: "" # e.g. "airbyte-config-secrets"

    # -- The database host
    host: ""
    # -- The key within `secretName` where host is stored
    #hostSecretKey: "" # e.g. "database-host"

    # -- The database port
    port: ""
    # -- The key within `secretName` where port is stored
    #portSecretKey: "" # e.g. "database-port"

    # -- The database name
    database: ""
    # -- The key within `secretName` where the database name is stored
    #databaseSecretKey: "" # e.g. "database-name"

    # -- The database user
    user: ""
    # -- The key within `secretName` where the user is stored
    #userSecretKey: "" # e.g. "database-user"

    # -- The key within `secretName` where password is stored
    passwordSecretKey: "" # e.g. "database-password"
```

</details>
111 changes: 111 additions & 0 deletions docs/deploying-airbyte/integrations/ingress.md
@@ -0,0 +1,111 @@


import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Ingress

To access the Airbyte UI, you will need to manually attach an ingress configuration to your deployment. The following is a slimmed-down definition of an ingress resource you could use for Self-Managed Enterprise:

<details open>
<summary>Ingress configuration setup steps</summary>
<Tabs>
<TabItem value="NGINX" label="NGINX">

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: # ingress name, example: enterprise-demo
  annotations:
    ingress.kubernetes.io/ssl-redirect: "false"
spec:
  ingressClassName: nginx
  rules:
    - host: # host, example: enterprise-demo.airbyte.com
      http:
        paths:
          - backend:
              service:
                # format is ${RELEASE_NAME}-airbyte-webapp-svc
                name: airbyte-enterprise-airbyte-webapp-svc
                port:
                  number: 80 # service port, example: 8080
            path: /
            pathType: Prefix
          - backend:
              service:
                # format is ${RELEASE_NAME}-airbyte-keycloak-svc
                name: airbyte-enterprise-airbyte-keycloak-svc
                port:
                  number: 8180
            path: /auth
            pathType: Prefix
          - backend:
              service:
                # format is ${RELEASE_NAME}-airbyte-server-svc
                name: airbyte-enterprise-airbyte-server-svc
                port:
                  number: 8001
            path: /api/public
            pathType: Prefix
```
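
The service names in the manifest above are derived from the Helm release name, following the `${RELEASE_NAME}-airbyte-<component>-svc` pattern noted in its comments. A small helper such as the hypothetical `airbyte_svc` function below can print the names to use for a different release; the component list is taken from the manifest, and the default release name matches the example.

```sh
#!/bin/sh
# Build a service name from a Helm release name and a component name.
airbyte_svc() {
  printf '%s-airbyte-%s-svc\n' "$1" "$2"
}

RELEASE_NAME="${RELEASE_NAME:-airbyte-enterprise}"  # release name used in the example
for component in webapp keycloak server; do
  airbyte_svc "$RELEASE_NAME" "$component"
done
```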
</TabItem>
<TabItem value="Amazon ALB" label="Amazon ALB">

If you intend to use an Amazon Application Load Balancer (ALB) for ingress, this ingress definition will be close to what's needed to get up and running:


```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: # ingress name, e.g. enterprise-demo
  annotations:
    # Specifies that the Ingress should use an AWS ALB.
    kubernetes.io/ingress.class: "alb"
    # Redirects HTTP traffic to HTTPS.
    ingress.kubernetes.io/ssl-redirect: "true"
    # Creates an internal ALB, which is only accessible within your VPC or through a VPN.
    alb.ingress.kubernetes.io/scheme: internal
    # Specifies the ARN of the SSL certificate managed by AWS ACM, essential for HTTPS.
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-x:xxxxxxxxx:certificate/xxxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxxxxxx
    # Sets the idle timeout value for the ALB.
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=30
    # [If Applicable] Specifies the VPC subnets and security groups for the ALB
    # alb.ingress.kubernetes.io/subnets: '' # e.g. 'subnet-12345, subnet-67890'
    # alb.ingress.kubernetes.io/security-groups: <SECURITY_GROUP>
spec:
  rules:
    - host: # e.g. enterprise-demo.airbyte.com
      http:
        paths:
          - backend:
              service:
                name: airbyte-enterprise-airbyte-webapp-svc
                port:
                  number: 80
            path: /
            pathType: Prefix
          - backend:
              service:
                name: airbyte-enterprise-airbyte-keycloak-svc
                port:
                  number: 8180
            path: /auth
            pathType: Prefix
          - backend:
              service:
                # format is ${RELEASE_NAME}-airbyte-server-svc
                name: airbyte-enterprise-airbyte-server-svc
                port:
                  number: 8001
            path: /api/public
            pathType: Prefix
```

The ALB controller will use a `ServiceAccount` that requires the [following IAM policy](https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json) to be attached.

</TabItem>
</Tabs>
</details>
1 change: 1 addition & 0 deletions docs/deploying-airbyte/integrations/monitoring.md
@@ -0,0 +1 @@
# Monitoring
55 changes: 55 additions & 0 deletions docs/deploying-airbyte/integrations/secrets.md
@@ -0,0 +1,55 @@

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Secret Management


Airbyte's default behavior is to store encrypted connector secrets on your cluster as Kubernetes secrets. You may <b>optionally</b> store connector secrets in an external secret manager instead, such as AWS Secrets Manager, Google Secret Manager, or HashiCorp Vault. When you create a new connector, its secrets (e.g. OAuth tokens, database passwords) are written to, and later read from, the configured secrets manager.

<details open>
<summary>Configuring external connector secret management</summary>

Modifying the configuration of connector secret storage will cause all <i>existing</i> connectors to fail. You will need to recreate these connectors to ensure they are reading from the appropriate secret store.

<Tabs>
<TabItem label="Amazon" value="Amazon">

If authenticating with credentials, ensure you've already created a Kubernetes secret containing both your AWS Secrets Manager access key ID and secret access key. By default, secrets are expected in the `airbyte-config-secrets` Kubernetes secret, under the `aws-secret-manager-access-key-id` and `aws-secret-manager-secret-access-key` keys. Steps to configure these are in the above [prerequisites](#configure-kubernetes-secrets).

```yaml
secretsManager:
  type: awsSecretManager
  awsSecretManager:
    region: <aws-region>
    authenticationType: credentials ## Use "credentials" or "instanceProfile"
    tags: ## Optional - You may add tags to new secrets created by Airbyte.
      - key: ## e.g. team
        value: ## e.g. deployments
      - key: business-unit
        value: engineering
    kms: ## Optional - ARN for KMS Decryption.
```

Set `authenticationType` to `instanceProfile` if the compute infrastructure running Airbyte has pre-existing permissions (e.g. IAM role) to read and write from AWS Secrets Manager.

To decrypt secrets in the secret manager with AWS KMS, configure the `kms` field, and ensure your Kubernetes cluster has pre-existing permissions to read and decrypt secrets.

</TabItem>
<TabItem label="GCP" value="GCP">

Ensure you've already created a Kubernetes secret containing the credentials blob for the service account to be assumed by the cluster. By default, secrets are expected in the `gcp-cred-secrets` Kubernetes secret, under a `gcp.json` file. Steps to configure these are in the above [prerequisites](#configure-kubernetes-secrets). For simplicity, we recommend provisioning a single service account with access to both GCS and GSM.

```yaml
secretsManager:
  type: googleSecretManager
  storageSecretName: gcp-cred-secrets
  googleSecretManager:
    projectId: <project-id>
    credentialsSecretKey: gcp.json
```

</TabItem>
</Tabs>

</details>