Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resize volume by changing pvc size if enabled in config. #958

Merged
merged 17 commits into from
Jul 3, 2020

Conversation

yanchenko-igor
Copy link
Contributor

Hi, we are using rook+ceph as a storage for postgres-operator, it takes care of resizing volumes when we change the new size in pvc, I am not sure if this is a good implementation, but it works, please let me know if it should be done in another way.

@frittentheke
Copy link
Contributor

frittentheke commented May 6, 2020

@yanchenko-igor I started a discussion about switching to Kubernetes (or rather CSI) to be used for volume management in issue #460. I suppose your PR that updates the PVC config is working in that direction - if the CSI driver / config used supports it, this should cause the volume to be resized and the FS to be grown, right?

@yanchenko-igor
Copy link
Contributor Author

@frittentheke yes, I guess in some cases it might require restarting pod, so we might need to add some checks and schedule the restart of the pod, in my case it works without restarting the pod. Could you please test the code in your environment and report if it works for you?

@frittentheke
Copy link
Contributor

frittentheke commented May 6, 2020

@yanchenko-igor I shall do that yes.

But one remark regarding your PR: I do not think resizing the "claim" should be a fallback type of handling things. With CSI being able to resize volumes this should be the default. I would add a switch to the operator to enable PVC resizing or not and then use either and in case of PVC-resizing (CSI handling things) to fully refrain from touching the EBS / AWS API code paths.

@yanchenko-igor
Copy link
Contributor Author

@frittentheke I like the idea with a switch, but I'm not sure what would be the best place for it, do you have any suggestions?

@frittentheke
Copy link
Contributor

@yanchenko-igor I suggest to add an option in the operator config like I did here: 1aef4b2

So maybe a boolean to "enable_storage_resize" in the kubernetes section like here

or even a variable to allow a mode like "ebs" or "pvc" or so to be set. Honestly I believe in the future the CSI / PVC resize approach is the only one left as all storage types will be managed via CSI. So a switch to enable and disable the support for resizing of storage in the operator might be enough.

But that is up the maintainers like @FxKu to decide.

@FxKu FxKu added this to the 1.6 milestone May 7, 2020
@yanchenko-igor yanchenko-igor changed the title Try to resize pvc if resizing pv has failed Resize volume by changing pvc size if enabled in config. May 7, 2020
@ReSearchITEng
Copy link
Contributor

ReSearchITEng commented May 7, 2020

Hi @yanchenko-igor,
like @frittentheke , I would also suggest make it more generic, and instead of enable_pvc_resize, I suggest enable_storage_resize which can have values like: "pvc", "ebs" or "" to disable all pv checks.

In syncVolumes, I would start with this:

	if c.OpConfig.EnableStorageResize == "" {
		c.logger.Infof("storage resize is disabled (enable_storage_resize is empty). Skipping volume sync.")
		return nil
	}

and then check if it's =="pvc" or "ebs" and act accordingly.

  1. I suggest: when ==pvc, I suggest we also change volumesNeedResizing
    When =="pvc" we should not check the PV, but the pvc size. It's important as users don't have pers for PVs, but they do have for PVCs.
    Only when ==ebs we keep volumesNeedResizing in its current format, which checks PVs.

@frittentheke
Copy link
Contributor

frittentheke commented May 8, 2020

Hi @yanchenko-igor,
like @frittentheke , I would also suggest make it more generic, and instead of enable_pvc_resize, I suggest enable_storage_resize which can have values like: "pvc", "ebs" or "" to disable all pv checks.

@ReSearchITEng revisiting my reasoning about CSI being the Kubernetes way to "talk to storage" I think a switch is enough to either enable / allow storage resizing or not in the operator. I strongly believe, which was my reason to open issue #460 in the first place, the communication with the AWS API to resize the EBS volume was necessary and enabled the operator to be more than Kubernetes allowed it to be, but it will ultimately be replaced by CSI and that code path and libraries can be removed.

The alternative of a "flavor" would be opening up to add more API bindings (GCP, Azure, Alibaba, ...) - exactly what CSI and the Kubernetes PVC concept is there to avoid. Kubernetes wants to be your adapter to the resources - including storage.

But again, this is up to the maintainers of this project to decide where they want (and need) to go.

@ReSearchITEng
Copy link
Contributor

Right, @FxKu and others will decide on what they want to support aws this way or not. It's not my main concern here.
What I want to stress is that when we use PVC we should modify volumesNeedResizing and compare PVCs (as opposite to comparing PVs, as volumesNeedResizing is currently working). My conservative proposal is to create another method: volumeClaimsNeedResizing which should be used instead (when pvc is the method).
This is also "healthy" as we compare what we change, as well, equally important, we do not require perms which normally the operator does not have: access to PVs (not even get/list/watch).

@Jan-M
Copy link
Member

Jan-M commented May 27, 2020

I like the request not to make this boolean and I was kind of looking forward to seeing this happen the moment K8s supports this nicely on statefulset level, so one would not really need to manually deal with anything as all of this is just work arounds.

If this is however helpful and needed while K8s moves forward the non boolean version sounds most attractive.

Thanks for everyone putting working into this so far.

@FxKu
Copy link
Member

FxKu commented Jun 12, 2020

Let me check with Jan, if we go this direction. If yes I would ask to add examples in manifests (configmap, default-operator-configuration) and values files of the helm charts. Plus, this options is not yet covered by tests (unit, e2e) and docs.

@ReSearchITEng
Copy link
Contributor

ReSearchITEng commented Jun 12, 2020

I can confirm it solves #1017 -> the OpenShift PV permissions issue (it overcomes the need of access PVs).
It also seems aligned with @frittentheke and @Jan-M comments, to have not Boolean, but options: pvc,ebs,off.

OpenShift can use either "off" or "pvc" (where pvc resize is supported). None of these 2 options cause perms/sync problems.

Only one note: I would default to pvc (or even off) instead of ebs - like @frittentheke also suggested: #958 (comment)

go.mod Outdated
@@ -17,7 +17,6 @@ require (
k8s.io/api v0.18.2
k8s.io/apiextensions-apiserver v0.18.2
k8s.io/apimachinery v0.18.2
k8s.io/client-go v11.0.0+incompatible
k8s.io/client-go v0.18.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change required? Can you maybe revert changes from the go.mod file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't build for me locally without this. but it works for you I can remove it.

Copy link
Contributor Author

@yanchenko-igor yanchenko-igor Jun 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the error I receive when I revert the change and try to build.

vendor/k8s.io/client-go/rest/request.go:598:31: not enough arguments in call to watch.NewStreamWatcher
        have (*versioned.Decoder)
        want (watch.Decoder, watch.Reporter)
make: *** [Makefile:54: linux] Error 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be clear, I receive the same error using master branch without any of my changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that running make deps also creates changes. Let me fix this in a separate PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally offtopic, but maybe you can help me with tests, I receive this error when I run make test:

...
GO111MODULE=on go test ./... 
# github.com/zalando/postgres-operator/github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:35:11: undefined: AWSGCPConfiguration
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:41:11: undefined: AWSGCPConfiguration
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:51:11: undefined: AdditionalVolume
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:63:11: undefined: AdditionalVolume
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:73:11: undefined: CloneDescription
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:84:11: undefined: CloneDescription
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:94:11: undefined: ConnectionPooler
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:111:11: undefined: ConnectionPooler
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:121:11: undefined: ConnectionPoolerConfiguration
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:137:11: undefined: ConnectionPoolerConfiguration
github.com/zalando/postgres-operator/pkg/apis/acid.zalan.do/v1/zz_generated.deepcopy.go:137:11: too many errors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run ./hack/update-codegen.sh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried that, but make test runs it as well under the hood, so it doesn't help.

@FxKu
Copy link
Member

FxKu commented Jun 25, 2020

This PR looks complete to me. Thanks @yanchenko-igor for your continuous work on it. I'm just fighting with the name of the option. Usually, we use enable_... for bool flags, but this has now become an enum field. Maybe something like storage_resize_mode? I think, I would also prefer the word volume over storage. What do you think @Jan-M @sdudoladov @erthalion ?

Plus, maybe off is the default to use. It's a breaking change, but I guess would not cause damage.

@FxKu
Copy link
Member

FxKu commented Jun 25, 2020

👍

return fmt.Errorf("could not compare size of the volume claims: %v", err)
}
if !act {
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any good reason not to log something here?

pkg/util/config/config.go Outdated Show resolved Hide resolved
@FxKu
Copy link
Member

FxKu commented Jul 1, 2020

@yanchenko-igor note, I've changed the default to ebs again. Better not surprise existing users with a resizing feature that stops working. Added a comment about OpenShift. For the helm charts, I think we will need some more conditionals like in #1037, but that's content for another PR.

@yanchenko-igor yanchenko-igor requested a review from Jan-M July 1, 2020 09:00
@FxKu
Copy link
Member

FxKu commented Jul 1, 2020

👍

1 similar comment
@Jan-M
Copy link
Member

Jan-M commented Jul 3, 2020

👍

@FxKu FxKu merged commit 88735a7 into zalando:master Jul 3, 2020
@senthilcodr
Copy link

@yanchenko-igor, thanks for this PR. Is it a good idea to update this documentation as well? https://github.com/zalando/postgres-operator/blob/master/docs/user.md#increase-volume-size

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants