Multiple atlantis instances - Installation using Helm chart #3795

Open
bsvartz opened this issue Sep 27, 2023 · 13 comments
Labels
feature New functionality/enhancement Stale

Comments

@bsvartz

bsvartz commented Sep 27, 2023

Hey,

We have 30 cloud clusters (different environments) with ~700 resources.
Managing these environments in parallel (without execution_order_group) with only one Atlantis pod causes timeouts when reaching the APIs used by TF (e.g. Datadog, Timescale, etc.).
In addition, plans and applies get slower with each new environment that we add.

We are installing Atlantis on a GKE cluster using this chart:
https://github.com/runatlantis/helm-charts

Based on the closed issue #1155, I thought I would be able to configure the chart with multiple instances, using Redis and shared disks.

I configured the chart to use Redis with lockingDbType, redis.db, and redis.host, but the Atlantis chart creates a StatefulSet that uses a VolumeClaimTemplate to create one PVC per pod, so I can't share volumes between the containers and pods in the StatefulSet. As a result, each pod works with its own PVC without syncing data, so the .tfstate files, pull request data, etc. are not known to the other pods in the StatefulSet. The pods are not really working together.

I checked the chart for more options and couldn't find any way to share disks between the pods and run multiple Atlantis instances.

Is it even possible? If not, can you please add it to the chart?

Thanks!!!

@bsvartz bsvartz added the feature New functionality/enhancement label Sep 27, 2023
@jamengual
Contributor

@GMartinez-Sisti do you know if this is possible?

@GMartinez-Sisti
Member

This might be possible to achieve if locking is already supported by Atlantis.

There are a few requirements:

  • Be able to use shared storage. We actually have helm-charts#304 ("Move atlantis-data volume to a separate PVC"), which would cover part of this by decoupling the storage from the StatefulSet resource
  • Be able to change accessModes to ReadWriteMany; this is achieved by a simple configuration parameter on the Helm chart
  • Use a storage type that supports multiple clients so we can set ReadWriteMany. AWS EBS or Google Cloud Storage won't work for this; something like AWS EFS or GCP Filestore can be used

Hope it helps!

@bsvartz
Author

bsvartz commented Sep 28, 2023

@GMartinez-Sisti - seems like it's a great first step! I added some comments in runatlantis/helm-charts#304. Also, when is it supposed to be merged?

@jamengual - Are you sure that after sharing the disk, multiple Atlantis instances will know how to work in parallel? Does the Atlantis app support this?

Let's say I have an external load balancer that points to 5 Atlantis pods. If we share the disk, will it show the plan / apply on each pod?

Thanks for the quick response!

@jamengual
Contributor

No, working in parallel will not work well.
Atlantis was not built to have multiple instances.
It has been extended to have external locking and such (you will need to use Redis), and it is possible to run multiple instances that way, but there are some caveats.

You can read some issues from people who have tried this to get an idea.

@GMartinez-Sisti
Member

GMartinez-Sisti commented Sep 28, 2023

Does it work concurrently then? The hooks will only reach one instance at a time, so I assume the instances will work on different PRs, and the Redis lock is there to ensure they don't try to do the same thing?

@jamengual
Contributor

Terraform itself does not support concurrent plans on one system (try it on your computer),
so you will have to work those things out, like the plugin cache for example.
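
To illustrate the plugin cache caveat: Terraform reads the cache location from the `TF_PLUGIN_CACHE_DIR` environment variable, and its documentation notes that the cache directory is not guaranteed to be safe for concurrent `terraform init` calls. Below is a minimal sketch of one possible workaround, assuming each concurrent run gets its own cache directory. The helper name `terraform_env`, the `run_id` argument, and the directory layout are hypothetical, and this is not something Atlantis is known to do for you.

```python
import os
from pathlib import Path


def terraform_env(run_id: str, cache_root: str) -> dict:
    """Return an environment for one Terraform run with a private plugin cache.

    Hypothetical helper: run_id and the cache_root layout are illustrative only.
    """
    cache_dir = Path(cache_root) / run_id
    # Terraform does not create the cache directory itself, so create it here.
    cache_dir.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)
    # TF_PLUGIN_CACHE_DIR is the documented variable for the plugin cache path.
    env["TF_PLUGIN_CACHE_DIR"] = str(cache_dir)
    return env
```

The returned mapping could be passed as `env=` to `subprocess.run(["terraform", "init"], ...)`. The trade-off is that separate caches give up most of the download savings a shared cache provides.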

@GMartinez-Sisti
Member

Regarding Terraform, I'm aware of that; we also use state locking with DynamoDB to ensure no one is working on the same workflows, just in case. I meant regarding Atlantis: since there is a locking feature for Redis, it implies there might be multiple Atlantis servers running, so the server needs to ensure no one is trying to work on the same workflows. Right?

@jamengual
Contributor

Yes, that is for the Atlantis lock, which is still per repo+workspace.

There is a problem with the provider cache in TF for parallel runs (not remote state); that is what I'm referring to.
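
The "per repo+workspace" lock can be pictured with a small sketch. This is an in-memory stand-in for a shared store such as Redis and is not Atlantis's actual implementation; the class name `LockTable` and its methods are hypothetical.

```python
from typing import Dict, Tuple

LockKey = Tuple[str, str]  # (repo, workspace)


class LockTable:
    """In-memory stand-in for a shared lock store (illustrative only)."""

    def __init__(self) -> None:
        # Maps each (repo, workspace) key to the pull request holding it.
        self._held: Dict[LockKey, int] = {}

    def try_lock(self, repo: str, workspace: str, pull: int) -> bool:
        """Take the lock for (repo, workspace) unless another PR holds it."""
        key = (repo, workspace)
        holder = self._held.get(key)
        if holder is not None and holder != pull:
            return False
        self._held[key] = pull
        return True

    def unlock(self, repo: str, workspace: str, pull: int) -> None:
        """Release the lock, but only if this PR is the current holder."""
        key = (repo, workspace)
        if self._held.get(key) == pull:
            del self._held[key]
```

With this model, a second pull request is refused on the same repo and workspace but can proceed on a different workspace. A lock like this only serializes work on the same target; it does nothing about two unrelated runs sharing a provider cache on the same disk.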

@jamengual
Contributor

#1571

@jamengual
Contributor

@GMartinez-Sisti
Member

GMartinez-Sisti commented Sep 28, 2023

#1571

Great read, thanks for sharing. So, now I don't quite understand the initial question you asked. Was it just about sharing the underlying storage while having only one instance? @jamengual

@jamengual
Contributor

The original question for you was about shared volume usage in the Helm chart, which I do not think is implemented, right?

@GMartinez-Sisti
Member

The original question for you was about shared volume usage in the Helm chart, which I do not think is implemented, right?

Correct. Regarding that, my first reply is still valid.

@dosubot dosubot bot added the Stale label Oct 2, 2024