Object Storage (LEP 20230430) #2136
Conversation
This pull request is now in conflicts. Could you fix it @m-ildefons? 🙏
This pull request is now in conflicts. Could you fix it @m-ildefons? 🙏
This pull request is now in conflicts. Could you fix it @m-ildefons? 🙏
Fix review remarks Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
Remove the ingress for the object store UI; this is no longer needed when using the Longhorn UI's nginx as a reverse proxy. Fix deleting an object store by removing the finalizer without waiting for resources to be deleted: since OwnerReferences are used for automatic cleanup, the finalizer can be removed immediately. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
Fix websocket controller: relay secret, object store, and settings information as expected by the UI frontend and as used by the object store tab in the UI. Error handling: remove the custom error variables from the object store controller, since errors created ad hoc with the errors.New function cannot be matched with the errors.Is function as expected; this mechanism is replaced by wrapping normal errors with errors.Wrapf. Error handling: fix some conditions in checkDeployment. When a deployment is scaled up, there is a brief period where it has 0 replicas and 0 unavailable replicas, so checking only the count of unavailable replicas is insufficient to ensure the deployment is ready; the total replica count has to be checked too. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
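A minimal sketch of the readiness condition described above, assuming the standard apps/v1 types; the helper name is illustrative, not the code in this PR:

```go
import appsv1 "k8s.io/api/apps/v1"

// deploymentReady illustrates the fixed check: a freshly scaled deployment
// briefly reports 0 replicas and 0 unavailable replicas, so testing
// UnavailableReplicas alone would wrongly report it as ready.
func deploymentReady(d *appsv1.Deployment) bool {
	// .spec.replicas defaults to 1 when unset.
	desired := int32(1)
	if d.Spec.Replicas != nil {
		desired = *d.Spec.Replicas
	}
	// Require the observed replica count to match the desired count in
	// addition to having no unavailable replicas.
	return d.Status.Replicas == desired && d.Status.UnavailableReplicas == 0
}
```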
Expose size and usage information of the longhorn volume associated with an object store via the internal API to the UI. This allows the Longhorn UI to receive information about the actual size and free space in an object store. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
Fix the event trigger for PVC events. This had mistakenly been wired up to call the event handler for Service events; it now calls the right event handler. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
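As a hedged illustration of the fix, using client-go's informer API; the informer variable and handler name are assumptions:

```go
import "k8s.io/client-go/tools/cache"

// Wire PVC events to their own handler instead of the Service handler.
pvcInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc:    osc.enqueueObjectStoreForPVC,
	UpdateFunc: func(oldObj, newObj interface{}) { osc.enqueueObjectStoreForPVC(newObj) },
	DeleteFunc: osc.enqueueObjectStoreForPVC,
})
```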
Fix creating a secret when given credentials from the UI. Instead of overwriting an existing secret or similar antics, generate a conflict-free name for the new secret. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
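A sketch of conflict-free naming, assuming existence can be probed via the datastore; the helper is hypothetical:

```go
import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/rand"
)

// generateSecretName keeps the base name if it is free and otherwise appends
// random suffixes until no existing secret conflicts with the new one.
func generateSecretName(base string, exists func(name string) bool) string {
	name := base
	for exists(name) {
		name = fmt.Sprintf("%s-%s", base, rand.String(6))
	}
	return name
}
```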
Fix volume expansion: ensure a volume expansion is only attempted when the new size is larger than the old size, and propagate the new size to the Longhorn volume, thereby expanding it, when an object store is updated. Fix update: propagate the container image from the UI to the deployment, allowing users to update the s3gw container images from the Longhorn UI even for existing object stores and even to newer versions than those originally shipped. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
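The expansion guard might look like the following sketch; the function is illustrative, not the PR's code:

```go
import "fmt"

// needsExpansion rejects shrinking, treats equal sizes as a no-op, and only
// signals an expansion when the requested size is larger than the current one.
func needsExpansion(currentSize, requestedSize int64) (bool, error) {
	if requestedSize < currentSize {
		return false, fmt.Errorf("cannot shrink volume from %d to %d bytes", currentSize, requestedSize)
	}
	return requestedSize > currentSize, nil
}
```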
- Fix the s3gw status page container port - Force setting the `.spec.image` and `.spec.uiImage` properties of the object store via webhook on creation - Don't transition the object store to the error state when creating resources fails; retry instead - Check PV and Longhorn volume health - Don't check service health, as a K8s Service resource has no status indicating health; it either exists or it doesn't, and readiness is dealt with on the deployment/pod level. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
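A hedged sketch of the create-time defaulting, modeled as JSON-patch mutation; the Go field names and the helper are assumptions, not the actual webhook code:

```go
import "fmt"

// mutateObjectStoreCreate emits JSON patches forcing default images when the
// corresponding spec fields are empty. Field names follow the description
// above; the real webhook may differ.
func mutateObjectStoreCreate(store *longhorn.ObjectStore, defaultImage, defaultUIImage string) []string {
	var patchOps []string
	if store.Spec.Image == "" {
		patchOps = append(patchOps, fmt.Sprintf(`{"op": "replace", "path": "/spec/image", "value": %q}`, defaultImage))
	}
	if store.Spec.UiImage == "" {
		patchOps = append(patchOps, fmt.Sprintf(`{"op": "replace", "path": "/spec/uiImage", "value": %q}`, defaultUIImage))
	}
	return patchOps
}
```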
- Update CRDs: move the size property under .spec and rename .spec.storage to .spec.volumeParameters - Improve the webhook to make storage attributes immutable when changing them would not also change the backing volume - Improve error handling in the controller. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
Update CRDs. Safely handle image modifications when updating images. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
Fix secret creation with logic that supports reclaiming managed secrets and errors out when user-created secrets conflict with to-be-created ones. Adapt the Kubernetes config map controller so it can create and manage multiple storage classes. Add default settings to allow the Kubernetes config map controller to manage the stub storage class for the volumes of object stores. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
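For illustration, a stub StorageClass of the kind described; the name and parameters are assumptions, not the shipped defaults:

```go
import (
	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// A minimal storage class for object store volumes, of the kind the
// Kubernetes config map controller would create and manage.
var objectStoreStorageClass = &storagev1.StorageClass{
	ObjectMeta:  metav1.ObjectMeta{Name: "longhorn-object-store"},
	Provisioner: "driver.longhorn.io",
	Parameters: map[string]string{
		"numberOfReplicas": "3",
	},
}
```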
Stop propagating image settings to the object store controller. Applying default image settings to object stores is handled in the webhooks, so the object store controller no longer needs access to this information; besides, it would have to be fetched from the appropriate setting anyway. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
Fix the race condition on start of a new object store. As long as there is just the object store and no Longhorn volume yet, some controller needs to be responsible for the object store. Usually that would be the controller responsible for the volume, but the volume doesn't exist yet and there are no provisions to track the owner another way. To fix this, let the controller on the first node handle it. Fix error handling by setting the object store state to error only once, and otherwise propagating errors up to the reconcile function. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
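A sketch of that bootstrap ownership rule, assuming a controllerID field on the controller; ListReadyNodesSortedByName is a hypothetical datastore helper:

```go
// isResponsibleFor follows the volume's owner once it exists; before that,
// the controller on the first node takes over so the object store is not
// orphaned while the volume is still being created.
func (osc *ObjectStoreController) isResponsibleFor(store *longhorn.ObjectStore) bool {
	volume, err := osc.ds.GetVolume(store.Name)
	if err == nil && volume.Status.OwnerID != "" {
		return volume.Status.OwnerID == osc.controllerID
	}
	nodes, err := osc.ds.ListReadyNodesSortedByName()
	return err == nil && len(nodes) > 0 && nodes[0].Name == osc.controllerID
}
```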
Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
- Adjust container launch options for the s3gw to match the new launch options for s3gw v0.23.0 and up - Clean up access credential secrets that were created through the longhorn manager when the object store is deleted. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
- Fix a potential nil pointer dereference when receiving API errors other than ErrNotFound - Export object store size specs in the Prometheus exporter - Send the number of deployed object stores to the telemetry endpoint (if sending telemetry is enabled) longhorn/longhorn#6720 Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
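The size export could look like this sketch using the Prometheus Go client; the metric name is an assumption, not necessarily the one added by this PR:

```go
import "github.com/prometheus/client_golang/prometheus"

// objectStoreSizeBytes reports the configured size of each object store's
// backing volume; the label identifies the store.
var objectStoreSizeBytes = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "longhorn_object_store_size_bytes",
		Help: "Configured size of the object store's backing volume in bytes.",
	},
	[]string{"object_store"},
)

func init() {
	prometheus.MustRegister(objectStoreSizeBytes)
}
```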
- Add an admission webhook for validating object store resources - Force foreground cascading deletion of object stores. This ensures that the volume and PV are cleaned up by the object store controller and resources aren't leaked before removing the finalizer on the object store. - Consolidate the starting state: the object store's state is only set to starting in the `initializeObjectStore` function, not whenever resources are created - Error conditions: all errors except server-side API errors cause the object store state to become `error`. There is only one place in the code where the state can be set to `error`, which is at the end of the reconcile function. - Telemetry setting: depending on the Longhorn setting allowing telemetry, telemetry for the s3gw is switched on or off. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
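Foreground cascading deletion in client-go style, as a sketch; the clientset accessor below is illustrative:

```go
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// deleteObjectStoreForeground asks the API server to delete dependents first
// (foreground propagation) before removing the object store itself.
func deleteObjectStoreForeground(ctx context.Context, namespace, name string) error {
	propagation := metav1.DeletePropagationForeground
	return lhClient.LonghornV1beta2().ObjectStores(namespace).Delete(ctx, name, metav1.DeleteOptions{
		PropagationPolicy: &propagation,
	})
}
```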
- Reorder the health checks during the creation of an object store. When creating an object store, the responsibility for managing the resources making up that object store rests with the controller that is also responsible for the Longhorn volume. There is a brief moment while the object store is starting when the information about which controller is responsible for the Longhorn volume has not yet been propagated to all controllers. Therefore the controller must first wait until the Longhorn volume is healthy and has an owner before it can determine for sure whether it is responsible. Only then can the other resources be created without running into race conditions with other controllers that think they are responsible. - Clean up utility functions and constants. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
Rebase on master and update vendored modules. Fix logic in the test script: to ensure only the necessary unit tests are executed (or, if nothing has been changed, all unit tests are executed), the logic must not interpret vendored modules as changed source, since it would fail to find unit tests for them. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
There is currently nothing planned for that case, since it's unclear to me what situations exactly can occur and I don't expect the manifests to change a lot. Can you describe what situation exactly you're thinking about here?
```go
	// remove the object store resource as well.
	_, err = osc.ds.GetService(osc.namespace, store.Name)
	if err == nil {
		return osc.ds.DeleteService(osc.namespace, store.Name)
```
This means that if osc.ds.DeleteService() returns nil, we will return nil and never requeue the ObjectStore object again until the 1-hour resync period hits. Could you take a look at the suggested modification below?
```go
func (osc *ObjectStoreController) handleTerminating(store *longhorn.ObjectStore) (err error) {
	// The resources are created in the order:
	// Volume -> PV -> PVC -> Deployment -> Service
	// so we tear them down in reverse:
	// Service -> Deployment -> PVC -> PV -> Volume
	// Once that is done we can remove the finalizer, which allows the K8s API to
	// remove the object store resource as well.
	_, err = osc.ds.GetService(osc.namespace, store.Name)
	if err == nil {
		return osc.ds.DeleteService(osc.namespace, store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}

	_, err = osc.ds.GetDeployment(store.Name)
	if err == nil {
		return osc.ds.DeleteDeployment(store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}

	_, err = osc.ds.GetPersistentVolumeClaim(osc.namespace, store.Name)
	if err == nil {
		return osc.ds.DeletePersistentVolumeClaim(osc.namespace, store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}

	_, err = osc.ds.GetPersistentVolume(store.Name)
	if err == nil {
		osc.ds.DeletePersistentVolume(store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}

	_, err = osc.ds.GetVolume(store.Name)
	if err == nil {
		osc.ds.DeleteVolume(store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}

	// cleanup all secrets with matching labels.
	labels := types.GetBaseLabelsForSystemManagedComponent()
	labels[types.GetLonghornLabelComponentKey()] = types.LonghornLabelObjectStore
	labels[types.GetLonghornLabelKey(types.LonghornLabelObjectStore)] = store.Name

	secrets, err := osc.ds.ListSecretsByLabels(osc.namespace, labels)
	if err != nil {
		return err
	}

	for _, secret := range secrets {
		osc.ds.DeleteSecret(osc.namespace, secret.Name)
	}

	if len(store.ObjectMeta.Finalizers) != 0 {
		return osc.ds.RemoveFinalizerForObjectStore(store)
	}

	return nil
}
```
Suggesting to directly delete the object and check the returned error instead of getting the object and then deleting it:
The current implementation (get, then delete):

```go
func (osc *ObjectStoreController) handleTerminating(store *longhorn.ObjectStore) (err error) {
	// The resources are created in the order:
	// Volume -> PV -> PVC -> Deployment -> Service
	// so we tear them down in reverse:
	// Service -> Deployment -> PVC -> PV -> Volume
	// Once that is done we can remove the finalizer, which allows the K8s API to
	// remove the object store resource as well.
	_, err = osc.ds.GetService(osc.namespace, store.Name)
	if err == nil {
		return osc.ds.DeleteService(osc.namespace, store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}
	_, err = osc.ds.GetDeployment(store.Name)
	if err == nil {
		return osc.ds.DeleteDeployment(store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}
	_, err = osc.ds.GetPersistentVolumeClaim(osc.namespace, store.Name)
	if err == nil {
		return osc.ds.DeletePersistentVolumeClaim(osc.namespace, store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}
	_, err = osc.ds.GetPersistentVolume(store.Name)
	if err == nil {
		osc.ds.DeletePersistentVolume(store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}
	_, err = osc.ds.GetVolume(store.Name)
	if err == nil {
		osc.ds.DeleteVolume(store.Name)
	} else if !datastore.ErrorIsNotFound(err) {
		return err
	}
	// cleanup all secrets with matching labels.
	labels := types.GetBaseLabelsForSystemManagedComponent()
	labels[types.GetLonghornLabelComponentKey()] = types.LonghornLabelObjectStore
	labels[types.GetLonghornLabelKey(types.LonghornLabelObjectStore)] = store.Name
	secrets, err := osc.ds.ListSecretsByLabels(osc.namespace, labels)
	if err != nil {
		return err
	}
	for _, secret := range secrets {
		osc.ds.DeleteSecret(osc.namespace, secret.Name)
	}
	if len(store.ObjectMeta.Finalizers) != 0 {
		return osc.ds.RemoveFinalizerForObjectStore(store)
	}
	return nil
}
```

The suggested replacement (delete directly and tolerate not-found errors):

```go
func (osc *ObjectStoreController) handleTerminating(store *longhorn.ObjectStore) (err error) {
	// The resources are created in the order:
	// Volume -> PV -> PVC -> Deployment -> Service
	// so we tear them down in reverse:
	// Service -> Deployment -> PVC -> PV -> Volume
	// Once that is done we can remove the finalizer, which allows the K8s API to
	// remove the object store resource as well.
	if err := osc.ds.DeleteService(osc.namespace, store.Name); err != nil && !datastore.ErrorIsNotFound(err) {
		return err
	}
	if err := osc.ds.DeleteDeployment(store.Name); err != nil && !datastore.ErrorIsNotFound(err) {
		return err
	}
	if err := osc.ds.DeletePersistentVolumeClaim(osc.namespace, store.Name); err != nil && !datastore.ErrorIsNotFound(err) {
		return err
	}
	if err := osc.ds.DeletePersistentVolume(store.Name); err != nil && !datastore.ErrorIsNotFound(err) {
		return err
	}
	if err := osc.ds.DeleteVolume(store.Name); err != nil && !datastore.ErrorIsNotFound(err) {
		return err
	}
	// cleanup all secrets with matching labels.
	labels := types.GetBaseLabelsForSystemManagedComponent()
	labels[types.GetLonghornLabelComponentKey()] = types.LonghornLabelObjectStore
	labels[types.GetLonghornLabelKey(types.LonghornLabelObjectStore)] = store.Name
	secrets, err := osc.ds.ListSecretsByLabels(osc.namespace, labels)
	if err != nil {
		return err
	}
	for _, secret := range secrets {
		if err := osc.ds.DeleteSecret(osc.namespace, secret.Name); err != nil && !datastore.ErrorIsNotFound(err) {
			return err
		}
	}
	return osc.ds.RemoveFinalizerForObjectStore(store)
}
```
In general LGTM
Thanks @shuo-wu for the good catch. I think we should create a new ticket for handling this upgrade design. One of the most common use cases is:
Another use case is:
I don't think the resource upgrade should rely on the backup & restore feature. I am wondering if we can re-deploy everything when users try to upgrade ObjectStore.spec.image. Since the deployment does not support rolling upgrades, re-deploying those components should be fine.
Simplify deletes by removing preceding gets and dealing with errors. Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
This pull request is now in conflicts. Could you fix it @m-ildefons? 🙏
Implementation of an ObjectStore CRD and an object storage controller. This controller watches for ObjectStore resources on the K8s API and orchestrates instances of the s3gw and its UI on top of a Longhorn volume. This allows an operator to configure object storage on a Longhorn installation.

Related: longhorn/longhorn#6640
Related: longhorn/longhorn-ui#649
LEP: longhorn/longhorn#5832
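For orientation, a hedged sketch of the spec shape implied by the commits above (size under .spec, .spec.volumeParameters, forced .spec.image and .spec.uiImage); the Go field names and types are assumptions, not the final CRD:

```go
import "k8s.io/apimachinery/pkg/api/resource"

// ObjectStoreSpec is reconstructed from the discussion; see the CRD in the
// PR for the authoritative definition.
type ObjectStoreSpec struct {
	Image            string            `json:"image"`
	UiImage          string            `json:"uiImage"`
	Size             resource.Quantity `json:"size"`
	VolumeParameters VolumeParameters  `json:"volumeParameters"`
}

// VolumeParameters collects attributes of the backing Longhorn volume;
// the single field here is illustrative.
type VolumeParameters struct {
	NumberOfReplicas int `json:"numberOfReplicas"`
}
```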