
Conversation

@JohnStrunk
Member

Describe what this PR does
This PR adds a design for StorageClass parameter fields.

Is there anything that requires special attention?
My goal for this PR is to agree on where we want to end up with the fields in the StorageClass. Since the StorageClass object is the main interface for cluster admins to define the types of storage that should be provided to users, the parameters that we choose to expose and how we do it will significantly affect the admin's UX.

Related issues:
none.

Signed-off-by: John Strunk <jstrunk@redhat.com>
@JohnStrunk JohnStrunk requested a review from humblec September 11, 2018 23:10
@centos-ci
Collaborator

Can one of the admins verify this patch?

@humblec
Contributor

humblec commented Sep 12, 2018

Thanks @JohnStrunk for this. At the very minimum I would like to pass
glusterURL, glusterUser, and glusterSecret via the SC; we can definitely add many more, and that is the plan. I can capture the others later.

@humblec
Contributor

humblec commented Sep 12, 2018

I can work on this code implementation. Assigning to myself.
/assign @humblec

@JohnStrunk
Member Author

@humblec I'm removing you from the PR so you can pick up the related issues for implementation (which is what I think you meant)... Probably #46

@JohnStrunk
Member Author

@centos-ci ok to test

Member

@amarts amarts left a comment

LGTM

From glusterfs, we mainly need 'volume type' and 'volume options', and since the example covers both of these, it should be good to get in for now!

@JohnStrunk JohnStrunk removed this from the GCS-alpha1 milestone Oct 3, 2018
@JohnStrunk
Member Author

Based on @humblec's feedback, I have rechecked the allowed syntax, and my proposed parameters format isn't going to work, as the field must be a plain map<string,string>. I'll revise this to flatten the fields.
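
For illustration, a flattened version of parameters like maxBrickSize and capacityReservation would look roughly like this (same values as in the doc, just plain string keys and values as required by map<string,string>):

parameters:
  # All values must be plain strings.
  maxBrickSize: "100Gi"
  capacityReservation: "1.0"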

Contributor

@jarrpa jarrpa left a comment

Left a few questions and comments, with more waiting until the layout gets flattened.

A general comment: I see that there are a number of parameters from the current (in-tree) provisioner that are not represented in this proposal. I haven't compared them with scrutiny, but have they been consciously omitted, possibly due to no longer being relevant?

parameters:
  # Maximum size of an individual brick. Volumes that exceed this will be
  # created as distributed-* volumes.
  maxBrickSize: 100Gi
Contributor

Is this actually a feature of Gluster I'm not aware of? Or is this logic yet-to-be-implemented in the CSI driver?

Member Author

This will interact w/ GD2 IVP to limit brick size so that recovery and capacity balancing are easier. I thought Heketi had something similar, though not used quite as intended here... Perhaps I'm just thinking of vol resize.

Contributor

Ah, okay. Might be something that could be enforced within the driver, for Heketi anyway.

  # PV capacity * capacityReservation will be reserved for use by this
  # volume. The reservation should account for snaps, clones, compression,
  # and deduplication. Float >= 0
  capacityReservation: "1.0"
Contributor

This name does not really reflect its function. Since we're explicitly using LVM, what about something like thinpoolFactor?

Member Author

I'm trying to make the function reflect the name. 😁
The idea really is to "reserve" this brick capacity and not allocate it for other volumes. This is more than just setting the thinp size. It needs to account for VDO as well.
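
To make the arithmetic concrete, a purely illustrative example (the "2.0" value is made up):

# 10Gi PV requested with capacityReservation "2.0":
#   reserved brick capacity = 10Gi * 2.0 = 20Gi
# The extra 10Gi is held back for snaps, clones, and VDO overhead rather
# than being handed out to other volumes.
capacityReservation: "2.0"
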
My big struggle here is that it's pretty straightforward for snapshots, but I'm not sure what to do about the "factor" when cloning. If I have an initial volume w/ reservation=10x and I clone it, presumably, the 10x accounted for the clone... What should the reservation parameter of the clone be? And does the answer change if I'm cloning a single image multiple times vs. cloning a clone of a clone of a...?

Due to the uncertainty here, I'm tempted to drop it for now, but w/o some sort of reservation, clone and snap are pretty useless.

Side note: We're also evaluating removal of LVM due to performance & scale issues.

@JohnStrunk
Member Author

A general comment: I see that there are a number of parameters from the current (in-tree) provisioner that are not represented in this proposal. I haven't compared them with scrutiny, but have they been consciously omitted, possibly due to no longer being relevant?

I was working from: https://kubernetes.io/docs/concepts/storage/storage-classes/#glusterfs

  • No longer needed: resturl, restauthenabled, restuser, restuserkey, secretName, secretNamespace, clusterid. These all move to the CSI driver configuration.
  • volumetype is one I'm trying to split out to make it more extensible and less cryptic, and to allow us to bring in arbiter, thin arbiter, and halo (see the sketch after this list)
  • That leaves gidMin & gidMax which I haven't had time to track down. I find it odd that we're the only driver with those. If anyone has links that discuss these, their importance, and the alternatives, I'd appreciate it.
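
As a rough sketch of the split (the key names below are placeholders for discussion, not final):

# in-tree today:
#   volumetype: "replicate:3"
# split into explicit, flat keys, e.g.:
replication: "3"
arbiterType: "thin"   # placeholder for selecting arbiter / thin arbiter
haloEnabled: "true"   # placeholder for halo replication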

@jarrpa
Contributor

jarrpa commented Oct 25, 2018

A general comment: I see that there are a number of parameters from the current (in-tree) provisioner that are not represented in this proposal. I haven't compared them with scrutiny, but have they been consciously omitted, possibly due to no longer being relevant?

I was working from: https://kubernetes.io/docs/concepts/storage/storage-classes/#glusterfs
* No longer needed: resturl, restauthenabled, restuser, restuserkey, secretName, secretNamespace, clusterid. These all move to the CSI driver configuration.

This will not be true if we plan on ever supporting more than one Gluster cluster per CSI driver. As we currently promote deploying at least two Gluster clusters in a "full" OpenShift installation, I would think it'd be ridiculous to have two copies of the same driver running.

* volumetype is one I'm trying to split out to make more extensible, less cryptic, and allow us to bring in arbiter, thin arbiter, halo

That would definitely be appreciated!

* That leaves gidMin & gidMax which I haven't had time to track down. I find it odd that we're the only driver with those. If anyone has links that discuss these, their importance, and the alternatives, I'd appreciate it.

I've always been confused by this, as well. I'd also like to know more about why this is.

@JohnStrunk
Member Author

This will not be true if we plan on ever supporting more than one Gluster cluster per CSI driver. As we currently promote deploying at least two Gluster clusters in a "full" OpenShift installation, I would think it'd be ridiculous to have two copies of the same driver running.

I don't have any plans to support a single driver accessing multiple gluster clusters. The reasons are:

  • The version of the CSI driver must be coordinated w/ the server versions, and adding multiple clusters to the mix complicates upgrade sequencing
  • The operator will be deploying the CSI driver (or triggering it if we use the CSI operator). While we could put in checks to avoid multiple deployments and skip removal if there are remaining clusters, this seems like added complexity (for dev and test) over having a single set of procedures that work the same way every time.
  • By moving this contact & authentication config to the CSI config (set up via the operator), we avoid burdening the admin with it and avoid a potential source of misconfiguration that would have to be tracked down. It just works out of the box, properly authenticated and secured.
  • The resources consumed by the CSI driver are small. It's a rather minimal golang executable w/ low CPU and memory requirements. The gains in idiot-proofing seem worth a couple of extra pods.

@Madhu-1
Member

Madhu-1 commented Oct 26, 2018

as this needs some more discussion, moving out of GCS/0.2

@Madhu-1 Madhu-1 removed the GCS/0.2 label Oct 26, 2018
@jarrpa
Contributor

jarrpa commented Oct 26, 2018

I don't have any plans to support a single driver accessing multiple gluster clusters. The reasons are:

* The version of the CSI driver must be coordinated w/ the server versions, and adding multiple clusters to the mix complicates upgrade sequencing

The only coordination is FUSE client version >= Server version. The Operator already controls both, such that all Gluster clusters should have the same version / container images. On Operator update, if it comes with a new default image for the server containers, we already have plans to disallow further Operator updates until the full Gluster system reaches quiescence.

* The operator will be deploying the CSI driver (or triggering it if we use the CSI operator). While we could put in checks to avoid multiple deployments and skip removal if there are remaining clusters, this seems like added complexity (for dev and test) over having a single set of procedures that work the same way every time.

Complexity worth the feature, in my opinion. :)

* By moving this contact & authentication config to the CSI config (set up via the operator), we avoid burdening the admin with it and avoid a potential source of misconfiguration that would have to be tracked down. It just works out of the box, properly authenticated and secured.

We could still move that to the Operator by having it take care of configuring the StorageClasses with the appropriate credentials automatically.

* The resources consumed by the CSI driver are small. It's a rather minimal golang executable w/ low CPU and memory requirements. The gains in idiot-poofing seem worth a couple extra pods.

While the CPU and memory requirements are small, we do still have to consider that the pods themselves count toward the maximum number of pods per node. This means (with the current example deployments) that every Gluster cluster adds at least two controller pods to the Kube/OCP cluster and at least one node pod on (typically) EVERY schedulable node in the Kube/OCP cluster.

Signed-off-by: John Strunk <jstrunk@redhat.com>
@jarrpa
Contributor

jarrpa commented Oct 29, 2018

The current PR LGTM. I still have my concerns about continuing to rely on a single cluster per driver model, but that can be addressed in another PR.

@JohnStrunk
Member Author

Any other reviews?

@humblec
Contributor

humblec commented Nov 2, 2018

@JohnStrunk, as in comment #30 (comment), I agree with @jarrpa. IMO, we should support multiple clusters with a single CSI driver. To that end, I would like to have a glusterCluster param in the SC. I still don't see what's wrong with having this param as an optional one. With it, the admin also has the possibility of bringing up different SCs for different clusters, say one for test and another for dev, using the same CSI driver. We can't assume the operator is the ONLY, or a mandatory, way to run the CSI driver; the gluster CSI driver should be capable of running on its own and be flexible in what it supports. A driver init() is always required if we allow this setting only via a deployment ENV var.
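
Something like this, for illustration (the provisioner name and endpoint below are made up; only the glusterCluster key is the actual proposal):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-dev
provisioner: org.gluster.glusterfs          # illustrative driver name
parameters:
  glusterCluster: "https://gd2-dev.example.com:24007"   # optional; falls back to the driver's own config when omitted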

@obnoxxx thoughts?

@humblec
Contributor

humblec commented Nov 2, 2018

@JohnStrunk one other high-level question: how are you planning to match these allowed SC params to GD2 volumecreate fields?

@JohnStrunk
Member Author

We can have optional fields for specifying cluster details, but that's not how GCS is going to be deployed. If someone wants to do it all manually (e.g., deploy CSI w/o the rest of the stack), then sure, specifying via the SC is fine. Upgrade, versioning, and authentication are on them.

@JohnStrunk one other high level question, how are you planning to match these allowed SC params to GD2 volumecreate fields ?

Likely, some of these options cannot currently be expressed to GD2 IVP. Implementation here will need to be coordinated w/ the GD2 team. I assume you're referring to things such as maxBrickSize, capacityReservation, and the zone fields. IVP will need to be enhanced to implement these options.

@humblec
Contributor

humblec commented Nov 7, 2018

Likely, some of these options cannot currently be expressed to GD2 IVP. Implementation here will need to be coordinated w/ the GD2 team. I assume you're referring to things such as maxBrickSize, capacityReservation, and the zone fields. IVP will need to be enhanced to implement these options.

@JohnStrunk Most of the params have issues coordinating with the GD2 API. For example, if we parse arbitertype as in this design and in this code:
humblec@aa1b771

How do we pass arbitertype to GD2?

@JohnStrunk
Member Author

It appears thin arbiter is designated via a flag. See: gluster/glusterd2#702
If IVP doesn't expose a sufficient interface, please open an issue against gd2.

@JohnStrunk
Member Author

In keeping w/ #154:

  • the language around arbiters needs to be updated such that arbiterZones only applies to normal arbiter volumes, not thin arbiter
  • the arbiterPath option needs to be documented (rough sketch below)
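
A rough sketch of what the documented form might look like (values and the arbiterPath semantics below are illustrative only, pending the doc update):

arbiterType: "arbiter"        # or "thin"
arbiterZones: "zone2,zone3"   # only honored for normal arbiter volumes, not thin arbiter
arbiterPath: "/mnt/arbiter"   # illustrative; exact meaning to be documented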

@joejulian joejulian closed this Mar 30, 2023