Scheduler helm charts: always create a PVC #7977
Conversation
volumeClaimTemplates:
  - metadata:
      name: dapr-scheduler-data-dir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      {{- if .Values.cluster.storageClassName }}
This could be a problem.
On various cloud providers, persistent volumes are add-ons that customers are charged for.
Is there a way to not have a persistent volume? Could this be made optional, like for the placement service?
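For illustration, one way the claim could be made optional is to gate the whole `volumeClaimTemplates` block on a chart value. This is only a sketch: the `inMemoryStorage` flag is hypothetical and not part of the chart, while `storageClassName` and `storageSize` follow values already referenced in this PR.

```yaml
{{- /* Sketch only: `inMemoryStorage` is a hypothetical flag, not a real chart value. */}}
{{- if not .Values.cluster.inMemoryStorage }}
volumeClaimTemplates:
  - metadata:
      name: dapr-scheduler-data-dir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      {{- if .Values.cluster.storageClassName }}
      storageClassName: {{ .Values.cluster.storageClassName }}
      {{- end }}
      resources:
        requests:
          storage: {{ .Values.cluster.storageSize }}
{{- end }}
```

With the flag set, no PVC would be created and the scheduler would fall back to the pod's ephemeral storage, with the data-loss caveats for non-HA mode raised elsewhere in this thread.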
I agree, we should definitely have the ability to not require a persistent volume claim, and I would personally expect that to be the default, as it was previously. This also changes the prerequisites for installing Dapr: storage class provisioners are not available on all Kubernetes platforms.
This was changed because etcd requires durable storage when running in HA mode, and unfortunately, this is a strong requirement. If a cluster was created in HA mode without a custom storageClass and one of the pods was restarted for any reason, the etcd server couldn't rejoin the cluster because it couldn't find its data directory (which had been created on the pod's ephemeral drive by default).
This problem doesn't occur in single-node (non-HA) clusters: when the data dir is lost, all information about previous state is lost with it, so after a restart the node simply believes it is booting up in a new cluster.
So when running in HA mode, we must have durable storage if we want to preserve the nodes' identities; etcd was designed that way. An etcd node cannot rejoin a cluster under a previously known identity without the matching data-dir.
So while it is technically possible for us to remove the durable storage requirement, it can only work for non-HA, and it opens up a significant risk of data loss that would need to be made abundantly clear in the docs. If a single-node cluster is restarted, all the scheduled jobs and reminders data will be gone along with the pod's ephemeral drive. That is a final and irreversible event, compared to helm chart errors on install or upgrade, so I guess we have to find the right balance between ease of use and reliability.
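For reference, a hedged sketch of the chart values that would satisfy the HA durability requirement. The value paths follow `dapr_scheduler.cluster.*` as used elsewhere in this PR; the class name is an assumption and varies per cloud provider.

```yaml
# Illustrative values.yaml fragment -- not the authoritative chart defaults.
dapr_scheduler:
  cluster:
    # assumption: a cloud-specific durable StorageClass name
    storageClassName: managed-premium
    # this PR experiments with small defaults (1Gi, later 30Mi for CI)
    storageSize: 1Gi
```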
This was changed because etcd requires durable storage when running in HA mode, and unfortunately, this is a strong requirement. If a cluster was created in HA mode without a custom storageClass and one of the pods was restarted for any reason, the etcd server couldn't rejoin the cluster because it couldn't find its data directory
This is one of the (various) reasons why I was strongly advising not using etcd the entire time :)
Aside from being costly on many cloud providers, this seems very fragile too. Ideally, the etcd cluster should have a way to allow nodes to join and leave. Otherwise, what happens if the volume gets lost, even transiently (e.g. the physical node that has the volume goes offline temporarily)?
From an ops standpoint, Dapr cluster admins now need to be concerned with a stateful control plane service too, which must be run within the cluster itself (unlike databases/Redis/etc which many (most?) Dapr users leverage as PaaS services).
The counter argument is that state has become a critical point for Dapr, not only for actors and actor reminders but now also for workflows, jobs, and upcoming planned features like delayed pub/sub. At this point, state needs to become a first-class citizen in Dapr, where the project is in full control of how it's being operated, maintained, and observed, in order to guarantee the most consistent experience in terms of performance, security, and behavior. Guaranteeing consistency across multiple variants of PaaS services, in different clouds, distributions, and technologies, is very difficult; while it can work well for generic APIs like the state store (with more or less success), it's unlikely to work at such a level for the state that underpins Dapr's own APIs. Remaining in full control of managing state is, in my opinion, worth the trade-off of a StatefulSet with a PVC, which is in itself a concept most Kubernetes operators are already familiar with when running stateful workloads.
* Always try to create a PVC
* Better empty check
* Reduce disk size for scheduler; change default storage size of scheduler to 1Gi
* Reduce scheduler storageSize again
* dapr_scheduler.cluster.storageSize=30Mi
* Changing storage size for redis and scheduler
* Reduce volume size for kafka, postgres and rabbitmq
* Try to free up more disk space

Signed-off-by: Elena Kolevska <elena@kolevska.com>
Signed-off-by: Artur Souza <asouza.pro@gmail.com>
Co-authored-by: Artur Souza <asouza.pro@gmail.com>
Signed-off-by: Jake Engelberg <jake@diagrid.io>
The scheduler needs durable storage for all the jobs data, so we must provide a PV for that purpose.
If a storageClass is not provided, the volume is created using the cluster's default storage class.
If the cluster does not have a default storage class, the scheduler pods will not start and the helm install will fail.
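As a sanity check before installing: a cluster's default storage class is the one carrying the `storageclass.kubernetes.io/is-default-class` annotation. A minimal illustrative example (the class name and provisioner here are placeholders, not chart requirements):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard            # hypothetical class name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/no-provisioner  # assumption: placeholder provisioner
```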