From 41e946451e6e0600fa524bff9058076f4115d521 Mon Sep 17 00:00:00 2001
From: Ankit Gohil
Date: Tue, 9 May 2023 14:32:58 -0700
Subject: [PATCH] Add orphan volume documentation

---
 README.md                            | 11 ++++--
 docs/book/deployment/upgrade.md      | 43 ++++++++++++++++++++++
 docs/book/features/orphan_volumes.md | 53 ++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+), 2 deletions(-)
 create mode 100644 docs/book/deployment/upgrade.md
 create mode 100644 docs/book/features/orphan_volumes.md

diff --git a/README.md b/README.md
index 2db29ac..4cc1345 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ CNS Manager exposes APIs that can be invoked by authorized users to detect issue
 This repository provides artifacts for deploying CNS manager in vanilla Kubernetes cluster, as well as the client sdk to invoke its endpoints.
-## Deploy cns-manager
+## Deploying cns-manager
 CNS manager needs to be deployed in one of the Kubernetes clusters in the vCenter. If there are multiple Kubernetes clusters in a vCenter, it's recommended that it be deployed in a dedicated admin-managed cluster, but it's not a must. However, the admin should be responsible to secure the Kubernetes cluster where CNS manager is deployed since it will have credentials to vCenter and the Kubernetes cluster. Also if you want CNS manager to be highly available, deploy it on a Kubernetes cluster that's highly available itself.
@@ -55,7 +55,14 @@ curl -X 'POST' "http://CNS-MANAGER-ENDPOINT/1.0.0/registercluster?csiDriverSecre
 ```
 * Once the cluster is registered, you may delete this file from the machine.
+**Note**: If a registered cluster later gets decommissioned or deleted from the vCenter, don't forget to deregister it from CNS manager as well. This will ensure a smooth execution of functionalities offered through CNS manager.
+
+## Upgrading cns-manager
+See the [upgrade instructions](docs/book/deployment/upgrade.md) if you're upgrading a previously deployed cns-manager instance to a newer release.
 ## Functionalities currently offered through cns-manager
 * **Storage vMotion for CNS volumes**
-This feature allows migrating volumes from one datastore to another. Read [here](docs/book/features/storage_vmotion.md) for more details about the feature.
+This feature allows migrating volumes from one datastore to another. Read [here](docs/book/features/storage_vmotion.md) for more details about this feature.
+
+* **Orphan volumes detection & deletion**
+This feature allows detecting/deleting orphan volumes that are not being used in any of the registered Kubernetes clusters on the vCenter. Read [here](docs/book/features/orphan_volumes.md) for more details about this feature.
diff --git a/docs/book/deployment/upgrade.md b/docs/book/deployment/upgrade.md
new file mode 100644
index 0000000..7c1cb73
--- /dev/null
+++ b/docs/book/deployment/upgrade.md
@@ -0,0 +1,43 @@
+## Upgrading cns-manager
+
+The easiest way to upgrade cns-manager is to completely undeploy the current version, check out the targeted release, and use the deployment artifacts from that release. The steps to deploy are already available [here](../../../README.md#deploying-cns-manager).
+
+But if you want to preserve some of the earlier configuration, such as the OAuth2 configuration or the clusters that were already registered, you can perform the steps below:
+
+**1.** Update the Swagger config to reflect the newly added API endpoints (if any) in Swagger UI.
+This can be done using the following commands.
+
+```
+> git checkout
+> kubectl -n delete configmap swagger-api
+> export CNS_MANAGER_ENDPOINT=
+> sed "s/%CNS_MANAGER_ENDPOINT%/$CNS_MANAGER_ENDPOINT/g" deploy/swagger-template.yaml > swagger.yaml
+> kubectl -n create configmap swagger-api --from-file=swagger.yaml
+> rm swagger.yaml
+```
+
+Here `CNS_MANAGER_ENDPOINT` is the endpoint over which the CNS manager service is accessible (<> or FQDN set during deployment).
+
+**2.** Update the nginx config to reflect any changes made to the nginx proxy.
+```
+> git checkout
+> kubectl -n delete configmap nginx-conf
+> kubectl -n create configmap nginx-conf --from-file=/nginx.conf
+```
+
+Here `auth-folder` is the folder corresponding to your deployment type - `deploy/basic-auth` or `deploy/oauth2`.
+
+**3.** Check that orphan volume auto-deletion is disabled. It's recommended to keep it disabled until you fully understand its usage (read the [orphan volume feature documentation](../features/orphan_volumes.md#setting-up-auto-monitoring-for-orphan-volumes-deletion) for details).
+
+The desired value can be set in the `auto-delete-ov` field of the `cnsmanager-config` configmap.
+
+```
+kubectl edit configmap cnsmanager-config -n
+```
+
+**4.** Update the cns-manager deployment to use the new release image. For instance, to upgrade to release 0.2.0:
+```
+kubectl set image deployment/cns-manager cns-manager=projects.registry.vmware.com/cns_manager/cns-manager:r0.2.0 -n
+```
+
+This will restart the deployment, applying the updated nginx config as well as the new API endpoints in Swagger UI.
\ No newline at end of file
diff --git a/docs/book/features/orphan_volumes.md b/docs/book/features/orphan_volumes.md
new file mode 100644
index 0000000..22bb7c9
--- /dev/null
+++ b/docs/book/features/orphan_volumes.md
@@ -0,0 +1,53 @@
+## Orphan volumes detection and clean-up
+Orphan volumes are vSphere volumes that are present on a vSphere datastore but have no corresponding PersistentVolume in any of the Kubernetes clusters on the vCenter.
+
+Orphan volumes are often created when the CNS solution creates more than one vSphere volume for a PersistentVolume in the Kubernetes cluster. This can occur when vCenter components are slow, storage is slow, the vCenter service restarts, or there are connectivity issues between vCenter and ESXi hosts. Since these orphan volumes occupy space in the datastore and are not really used in Kubernetes, it's useful to identify and clean up orphan volumes periodically.
+
+This functionality provides a set of APIs to detect and delete orphan volumes on demand, and also provides an option to turn on periodic automatic deletion.
+
+### What qualifies as an orphan volume?
+A volume qualifies as an orphan volume if it meets all of the conditions below:
+
+1) No PersistentVolume in any of the registered Kubernetes clusters is using the volume.
+2) The volume was dynamically provisioned using the vSphere CSI driver.
+3) The volume is no longer classified as a container volume in vCenter.
+4) The volume has existed for more than 50 minutes (or the `orphan-volume-detection-interval-mins` value configured in the `cnsmanager-config` configmap) at the time of orphan volume detection.
+[**Note:** The worst-case time for a volume to be detected as an orphan after its creation is twice `orphan-volume-detection-interval-mins` (i.e. 100 minutes by default).]
+
+### Which orphan volumes are skipped from detection/deletion?
+Some volumes, even if no Kubernetes PersistentVolume maps to them, will not be considered during orphan volume detection/deletion. These include:
+* Volumes created out of band (not using the vSphere CSI driver).
+* Statically provisioned CNS volumes whose names don't start with `pvc-`.
+* File volumes.
+* Orphan volumes that have snapshots will be detected as orphans, but they cannot be deleted using the orphan volume delete API.
+
+### A reminder to register all Kubernetes clusters!
+Before you start using the orphan volume functionality, it's imperative that you [register all the Kubernetes clusters](../../../README.md#register-kubernetes-clusters-before-you-start) in vCenter with CNS Manager, so that orphan volumes are detected correctly. Any newly added Kubernetes cluster should also be registered immediately.
+
+### APIs provided
+1. *GET /orphanvolumes*
+
+This API takes two optional parameters, a datacenter and a list of datastores, and returns the orphan volumes for them.
+- If datacenter is not specified, then it returns all orphan volumes in the vCenter (all datastores on all datacenters).
+- If only datacenter is specified, then it returns orphan volumes in all datastores in the datacenter.
+- If both datacenter & list of datastores are specified, it returns orphan volumes in the specified datastores on the datacenter.
+
+Detection of orphan volumes can be a time-consuming operation if there is a large number of orphans. Hence it is performed asynchronously at regular intervals and the response is cached. This API returns the list of orphan volumes computed in the last run, along with the interval until the next run (`RetryAfterMinutes`).
+**Note:** For a newly deployed CNS manager application, when orphan volumes are being computed in the background for the first time, the API may return no orphan volumes. It should then be retried after `RetryAfterMinutes` to get the orphan volumes computed in the latest run.
+
+
+2. *DELETE /orphanvolumes*
+
+This API is used to delete orphan volumes. It also takes optional parameters, datacenter & list of datastores, and deletes orphan volumes from them.
+You can also specify whether you want to delete orphan volumes attached to a virtual machine or not. If set to `true`, the API will detach the orphan volume from the VM before deleting it.
+
+Note that if there is a large number of orphan volumes in the system, or if vCenter networking/storage is slow, orphan deletion may take longer. If it takes longer than 30 minutes, the API client will time out. The orphan volumes nevertheless continue to be deleted in the background, which can be verified by listing the orphans again using the `GET /orphanvolumes` API.
+
+
+### Setting up auto-monitoring for orphan volumes deletion
+There's also an option to automatically monitor and delete orphan volumes periodically. It's controlled using the `auto-delete-ov` setting in the `cnsmanager-config` configmap. It can take one of three values:
+  a. `disable`: Orphan volumes will not be deleted automatically.
+  b. `enable-only-detached-ov`: Delete only detached orphan volumes.
+  c. `enable`: Delete all the detected orphan volumes (both attached & detached).
+
+By default, it is disabled.
\ No newline at end of file
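
The three `auto-delete-ov` values described in `orphan_volumes.md` can be read as a filter over the detected orphan volumes. The following is a minimal Python sketch of that selection logic under stated assumptions; it is not CNS manager's actual implementation. The `OrphanVolume` type, its `attached` field, and the `select_for_auto_delete` function are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrphanVolume:
    """Hypothetical record for a detected orphan volume (illustration only)."""
    volume_id: str
    attached: bool  # True if the volume is attached to a virtual machine


def select_for_auto_delete(mode: str, orphans: List[OrphanVolume]) -> List[OrphanVolume]:
    """Return the orphans that the given `auto-delete-ov` value would delete."""
    if mode == "disable":
        # Nothing is deleted automatically.
        return []
    if mode == "enable-only-detached-ov":
        # Only orphans that are not attached to a VM are deleted.
        return [v for v in orphans if not v.attached]
    if mode == "enable":
        # Both attached and detached orphans are deleted.
        return list(orphans)
    # Assumption: this sketch rejects values the documentation does not list.
    raise ValueError(f"unknown auto-delete-ov value: {mode!r}")


orphans = [
    OrphanVolume("vol-a", attached=False),
    OrphanVolume("vol-b", attached=True),
]
for mode in ("disable", "enable-only-detached-ov", "enable"):
    print(mode, [v.volume_id for v in select_for_auto_delete(mode, orphans)])
```

Rejecting unknown values is a choice made for this sketch only; the documentation does not say how CNS manager treats an unrecognized `auto-delete-ov` value.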