Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Upgrade Spark operator blueprint to EKS 1.30 and fluent-bt changes #558

Merged
merged 3 commits into from
Jun 18, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions analytics/terraform/spark-k8s-operator/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ Checkout the [documentation website](https://awslabs.github.io/data-on-eks/docs/
| <a name="module_eks_blueprints_addons"></a> [eks\_blueprints\_addons](#module\_eks\_blueprints\_addons) | aws-ia/eks-blueprints-addons/aws | ~> 1.2 |
| <a name="module_eks_data_addons"></a> [eks\_data\_addons](#module\_eks\_data\_addons) | aws-ia/eks-data-addons/aws | ~> 1.30 |
| <a name="module_s3_bucket"></a> [s3\_bucket](#module\_s3\_bucket) | terraform-aws-modules/s3-bucket/aws | ~> 3.0 |
| <a name="module_spark_team_a_irsa"></a> [spark\_team\_a\_irsa](#module\_spark\_team\_a\_irsa) | aws-ia/eks-blueprints-addon/aws | ~> 1.0 |
| <a name="module_spark_team_irsa"></a> [spark\_team\_irsa](#module\_spark\_team\_irsa) | aws-ia/eks-blueprints-addon/aws | ~> 1.0 |
| <a name="module_vpc"></a> [vpc](#module\_vpc) | terraform-aws-modules/vpc/aws | ~> 5.0 |
| <a name="module_vpc_endpoints"></a> [vpc\_endpoints](#module\_vpc\_endpoints) | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | ~> 5.0 |
| <a name="module_vpc_endpoints_sg"></a> [vpc\_endpoints\_sg](#module\_vpc\_endpoints\_sg) | terraform-aws-modules/security-group/aws | ~> 5.0 |
Expand All @@ -49,9 +49,9 @@ Checkout the [documentation website](https://awslabs.github.io/data-on-eks/docs/
| [aws_secretsmanager_secret_version.grafana](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/secretsmanager_secret_version) | resource |
| [kubernetes_cluster_role.spark_role](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/cluster_role) | resource |
| [kubernetes_cluster_role_binding.spark_role_binding](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/cluster_role_binding) | resource |
| [kubernetes_namespace_v1.spark_team_a](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/namespace_v1) | resource |
| [kubernetes_secret_v1.spark_team_a](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/secret_v1) | resource |
| [kubernetes_service_account_v1.spark_team_a](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/service_account_v1) | resource |
| [kubernetes_namespace_v1.spark_team](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/namespace_v1) | resource |
| [kubernetes_secret_v1.spark_team](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/secret_v1) | resource |
| [kubernetes_service_account_v1.spark_team](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/service_account_v1) | resource |
| [random_password.grafana](https://registry.terraform.io/providers/hashicorp/random/3.3.2/docs/resources/password) | resource |
| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones) | data source |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
Expand All @@ -67,7 +67,7 @@ Checkout the [documentation website](https://awslabs.github.io/data-on-eks/docs/

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_eks_cluster_version"></a> [eks\_cluster\_version](#input\_eks\_cluster\_version) | EKS Cluster version | `string` | `"1.28"` | no |
| <a name="input_eks_cluster_version"></a> [eks\_cluster\_version](#input\_eks\_cluster\_version) | EKS Cluster version | `string` | `"1.30"` | no |
| <a name="input_eks_data_plane_subnet_secondary_cidr"></a> [eks\_data\_plane\_subnet\_secondary\_cidr](#input\_eks\_data\_plane\_subnet\_secondary\_cidr) | Secondary CIDR blocks. 32766 IPs per Subnet per Subnet/AZ for EKS Node and Pods | `list(string)` | <pre>[<br> "100.64.0.0/17",<br> "100.64.128.0/17"<br>]</pre> | no |
| <a name="input_enable_amazon_prometheus"></a> [enable\_amazon\_prometheus](#input\_enable\_amazon\_prometheus) | Enable AWS Managed Prometheus service | `bool` | `true` | no |
| <a name="input_enable_vpc_endpoints"></a> [enable\_vpc\_endpoints](#input\_enable\_vpc\_endpoints) | Enable VPC Endpoints | `bool` | `false` | no |
Expand Down
332 changes: 161 additions & 171 deletions analytics/terraform/spark-k8s-operator/addons.tf
Original file line number Diff line number Diff line change
@@ -1,173 +1,3 @@
#---------------------------------------------------------------
# IRSA for EBS CSI Driver
#---------------------------------------------------------------
module "ebs_csi_driver_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "~> 5.34"
role_name_prefix = format("%s-%s-", local.name, "ebs-csi-driver")
attach_ebs_csi_policy = true
oidc_providers = {
main = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
}
}
tags = local.tags
}

#---------------------------------------------------------------
# EKS Blueprints Addons
#---------------------------------------------------------------
module "eks_blueprints_addons" {
source = "aws-ia/eks-blueprints-addons/aws"
version = "~> 1.2"

cluster_name = module.eks.cluster_name
cluster_endpoint = module.eks.cluster_endpoint
cluster_version = module.eks.cluster_version
oidc_provider_arn = module.eks.oidc_provider_arn

#---------------------------------------
# Amazon EKS Managed Add-ons
#---------------------------------------
eks_addons = {
aws-ebs-csi-driver = {
service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
}
coredns = {
preserve = true
}
vpc-cni = {
preserve = true
}
kube-proxy = {
preserve = true
}
}

#---------------------------------------
# Kubernetes Add-ons
#---------------------------------------
#---------------------------------------------------------------
# CoreDNS Autoscaler helps to scale for large EKS Clusters
# Further tuning for CoreDNS is to leverage NodeLocal DNSCache -> https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
#---------------------------------------------------------------
enable_cluster_proportional_autoscaler = true
cluster_proportional_autoscaler = {
values = [templatefile("${path.module}/helm-values/coredns-autoscaler-values.yaml", {
target = "deployment/coredns"
})]
description = "Cluster Proportional Autoscaler for CoreDNS Service"
}

#---------------------------------------
# Metrics Server
#---------------------------------------
enable_metrics_server = true
metrics_server = {
values = [templatefile("${path.module}/helm-values/metrics-server-values.yaml", {})]
}

#---------------------------------------
# Cluster Autoscaler
#---------------------------------------
enable_cluster_autoscaler = true
cluster_autoscaler = {
values = [templatefile("${path.module}/helm-values/cluster-autoscaler-values.yaml", {
aws_region = var.region,
eks_cluster_id = module.eks.cluster_name
})]
}

#---------------------------------------
# Karpenter Autoscaler for EKS Cluster
#---------------------------------------
enable_karpenter = true
karpenter_enable_spot_termination = true
karpenter_node = {
iam_role_additional_policies = {
AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
}
karpenter = {
chart_version = "v0.34.0"
repository_username = data.aws_ecrpublic_authorization_token.token.user_name
repository_password = data.aws_ecrpublic_authorization_token.token.password
}

#---------------------------------------
# CloudWatch metrics for EKS
#---------------------------------------
enable_aws_cloudwatch_metrics = true
aws_cloudwatch_metrics = {
values = [templatefile("${path.module}/helm-values/aws-cloudwatch-metrics-values.yaml", {})]
}

#---------------------------------------
# AWS for FluentBit - DaemonSet
#---------------------------------------
enable_aws_for_fluentbit = true
aws_for_fluentbit_cw_log_group = {
use_name_prefix = false
name = "/${local.name}/aws-fluentbit-logs" # Add-on creates this log group
retention_in_days = 30
}
aws_for_fluentbit = {
s3_bucket_arns = [
module.s3_bucket.s3_bucket_arn,
"${module.s3_bucket.s3_bucket_arn}/*"
]
values = [templatefile("${path.module}/helm-values/aws-for-fluentbit-values.yaml", {
region = local.region,
cloudwatch_log_group = "/${local.name}/aws-fluentbit-logs"
s3_bucket_name = module.s3_bucket.s3_bucket_id
cluster_name = module.eks.cluster_name
})]
}

enable_aws_load_balancer_controller = true
aws_load_balancer_controller = {
chart_version = "1.5.4"
}

enable_ingress_nginx = true
ingress_nginx = {
version = "4.5.2"
values = [templatefile("${path.module}/helm-values/nginx-values.yaml", {})]
}

#---------------------------------------
# Prommetheus and Grafana stack
#---------------------------------------
#---------------------------------------------------------------
# Install Kafka Monitoring Stack with Prometheus and Grafana
# 1- Grafana port-forward `kubectl port-forward svc/kube-prometheus-stack-grafana 8080:80 -n kube-prometheus-stack`
# 2- Grafana Admin user: admin
# 3- Get admin user password: `aws secretsmanager get-secret-value --secret-id <output.grafana_secret_name> --region $AWS_REGION --query "SecretString" --output text`
#---------------------------------------------------------------
enable_kube_prometheus_stack = true
kube_prometheus_stack = {
values = [
var.enable_amazon_prometheus ? templatefile("${path.module}/helm-values/kube-prometheus-amp-enable.yaml", {
region = local.region
amp_sa = local.amp_ingest_service_account
amp_irsa = module.amp_ingest_irsa[0].iam_role_arn
amp_remotewrite_url = "https://aps-workspaces.${local.region}.amazonaws.com/workspaces/${aws_prometheus_workspace.amp[0].id}/api/v1/remote_write"
amp_url = "https://aps-workspaces.${local.region}.amazonaws.com/workspaces/${aws_prometheus_workspace.amp[0].id}"
}) : templatefile("${path.module}/helm-values/kube-prometheus.yaml", {})
]
chart_version = "48.1.1"
set_sensitive = [
{
name = "grafana.adminPassword"
value = data.aws_secretsmanager_secret_version.admin_password_version.secret_string
}
],
}

tags = local.tags
}

#---------------------------------------------------------------
# Data on EKS Kubernetes Addons
#---------------------------------------------------------------
Expand Down Expand Up @@ -471,7 +301,8 @@ module "eks_data_addons" {
#---------------------------------------------------------------
enable_spark_operator = true
spark_operator_helm_config = {
values = [templatefile("${path.module}/helm-values/spark-operator-values.yaml", {})]
version = "1.4.2"
values = [templatefile("${path.module}/helm-values/spark-operator-values.yaml", {})]
}

#---------------------------------------------------------------
Expand Down Expand Up @@ -509,6 +340,165 @@ module "eks_data_addons" {

}

#---------------------------------------------------------------
# IRSA for EBS CSI Driver
#---------------------------------------------------------------
module "ebs_csi_driver_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "~> 5.34"
role_name_prefix = format("%s-%s-", local.name, "ebs-csi-driver")
attach_ebs_csi_policy = true
oidc_providers = {
main = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
}
}
tags = local.tags
}

#---------------------------------------------------------------
# EKS Blueprints Addons
#---------------------------------------------------------------
module "eks_blueprints_addons" {
source = "aws-ia/eks-blueprints-addons/aws"
version = "~> 1.2"

cluster_name = module.eks.cluster_name
cluster_endpoint = module.eks.cluster_endpoint
cluster_version = module.eks.cluster_version
oidc_provider_arn = module.eks.oidc_provider_arn

#---------------------------------------
# Amazon EKS Managed Add-ons
#---------------------------------------
eks_addons = {
aws-ebs-csi-driver = {
service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
}
coredns = {
preserve = true
}
vpc-cni = {
preserve = true
}
kube-proxy = {
preserve = true
}
}

#---------------------------------------
# Metrics Server
#---------------------------------------
enable_metrics_server = true
metrics_server = {
values = [templatefile("${path.module}/helm-values/metrics-server-values.yaml", {})]
}

#---------------------------------------
# Cluster Autoscaler
#---------------------------------------
enable_cluster_autoscaler = true
cluster_autoscaler = {
values = [templatefile("${path.module}/helm-values/cluster-autoscaler-values.yaml", {
aws_region = var.region,
eks_cluster_id = module.eks.cluster_name
})]
}

#---------------------------------------
# Karpenter Autoscaler for EKS Cluster
#---------------------------------------
enable_karpenter = true
karpenter_enable_spot_termination = true
karpenter_node = {
iam_role_additional_policies = {
AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
}
karpenter = {
chart_version = "v0.34.0"
repository_username = data.aws_ecrpublic_authorization_token.token.user_name
repository_password = data.aws_ecrpublic_authorization_token.token.password
}

#---------------------------------------
# CloudWatch metrics for EKS
#---------------------------------------
enable_aws_cloudwatch_metrics = true
aws_cloudwatch_metrics = {
values = [templatefile("${path.module}/helm-values/aws-cloudwatch-metrics-values.yaml", {})]
}

#---------------------------------------
# AWS for FluentBit - DaemonSet
#---------------------------------------
enable_aws_for_fluentbit = true
aws_for_fluentbit_cw_log_group = {
use_name_prefix = false
name = "/${local.name}/aws-fluentbit-logs" # Add-on creates this log group
retention_in_days = 30
}
aws_for_fluentbit = {
s3_bucket_arns = [
module.s3_bucket.s3_bucket_arn,
"${module.s3_bucket.s3_bucket_arn}/*"
]
values = [templatefile("${path.module}/helm-values/aws-for-fluentbit-values.yaml", {
region = local.region,
cloudwatch_log_group = "/${local.name}/aws-fluentbit-logs"
s3_bucket_name = module.s3_bucket.s3_bucket_id
cluster_name = module.eks.cluster_name
})]
}

enable_aws_load_balancer_controller = true
aws_load_balancer_controller = {
chart_version = "1.5.4"
set = [{
name = "enableServiceMutatorWebhook"
value = "false"
}]
}

enable_ingress_nginx = true
ingress_nginx = {
version = "4.5.2"
values = [templatefile("${path.module}/helm-values/nginx-values.yaml", {})]
}

#---------------------------------------
# Prommetheus and Grafana stack
#---------------------------------------
#---------------------------------------------------------------
# Install Kafka Monitoring Stack with Prometheus and Grafana
# 1- Grafana port-forward `kubectl port-forward svc/kube-prometheus-stack-grafana 8080:80 -n kube-prometheus-stack`
# 2- Grafana Admin user: admin
# 3- Get admin user password: `aws secretsmanager get-secret-value --secret-id <output.grafana_secret_name> --region $AWS_REGION --query "SecretString" --output text`
#---------------------------------------------------------------
enable_kube_prometheus_stack = true
kube_prometheus_stack = {
values = [
var.enable_amazon_prometheus ? templatefile("${path.module}/helm-values/kube-prometheus-amp-enable.yaml", {
region = local.region
amp_sa = local.amp_ingest_service_account
amp_irsa = module.amp_ingest_irsa[0].iam_role_arn
amp_remotewrite_url = "https://aps-workspaces.${local.region}.amazonaws.com/workspaces/${aws_prometheus_workspace.amp[0].id}/api/v1/remote_write"
amp_url = "https://aps-workspaces.${local.region}.amazonaws.com/workspaces/${aws_prometheus_workspace.amp[0].id}"
}) : templatefile("${path.module}/helm-values/kube-prometheus.yaml", {})
]
chart_version = "48.1.1"
set_sensitive = [
{
name = "grafana.adminPassword"
value = data.aws_secretsmanager_secret_version.admin_password_version.secret_string
}
],
}

tags = local.tags
}

#---------------------------------------------------------------
# S3 bucket for Spark Event Logs and Example Data
#---------------------------------------------------------------
Expand Down
Loading
Loading