Skip to content

Add draft ADR on S3 object definition #177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 2, 2022
Merged

Add draft ADR on S3 object definition #177

merged 7 commits into from
May 2, 2022

Conversation

soenkeliebau
Copy link
Member

Adds ADR on how we want to represent S3 buckets in our CRD structure.

@soenkeliebau soenkeliebau requested a review from a team April 13, 2022 21:05
Copy link
Member

@sbernauer sbernauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Sönke, thanks for writing this down, looks very good!

One important question: Are S3Bucket and S3Connection objects namespaced?

The other thing remaining for me is to talk about the way to handle the credentials.
I think it is sufficient for the first revision to only support anonymus and access + secretkey.
I don't know if the secretClass is an overkill for this. On the other hand it would give us the following benefits

  1. Support stuff like automagically creating STS session tokens or some possible AWS concepts i don't know of
  2. [Only if the S3Connection has cluster scope]: Resolve the missmatch S3Connection cluster scope <-> namespaced Secret. The SecretClass has cluster scope.

Copy link
Member

@lfrancke lfrancke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe add (for option 5) how generated names come into play and/or how we are compatible with ObjectBucketClaim (OBC)?

As far as I understand it we do would now need to have a controller that converts from the OBC ConfigMap & Secret to our own S3Bucket and/or S3Connection, is that correct?

@soenkeliebau
Copy link
Member Author

One important question: Are S3Bucket and S3Connection objects namespaced?

I think they should be namespaced, there is a likelihood that for example the S3Connection could refer to a minio that is running inside of a namespace for example.

@razvan
Copy link
Member

razvan commented Apr 14, 2022

IMO there are some open questions on the operator side of things.

How are multiple S3 connections used in product CRDs ? In addition to the connection details, operators need to know the purpose of each connection (what to do with them). Is there an implicit purpose based on the role/group that contains the connection definition, or are connection labeled/named with a "purpose" label.

Are S3 connections defined "globally" in the CRD and referenced from respective role/groups or are they inlined in the role/group configuration?

How do product CRDs refer to multiple S3 connections for the same "purpose" (ex. reading data) ?

Co-authored-by: Sebastian Bernauer <sebastian.bernauer@stackable.de>
Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
Co-authored-by: Lars Francke <github@lars-francke.de>
@soenkeliebau
Copy link
Member Author

soenkeliebau commented Apr 14, 2022

All good questions, but I'd argue that some of these will need to be evaluated per operator. What this ADR is trying to solve is mostly the structure of shared S3 objects.

I'll try answering inline.

How are multiple S3 connections used in product CRDs ? In addition to the connection details, operators need to know the purpose of each connection (what to do with them). Is there an implicit purpose based on the role/group that contains the connection definition, or are connection labeled/named with a "purpose" label.

Operator CRDs should choose appropriate naming for field referencing S3Buckets so that the user knows what is meant.

For example a fictional product that simply offers backup of data from one S3 bucket into a second S3 bucket the CRD could look like this:

---
apiVersion: s3backupinator.stackable.tech/v1alpha1
kind: S3BackupinatorCluster
metadata:
  name: simple-s3backupinator
spec:
  version: "2.57.14"
  sourceBucket: simple-bucket
  targetBucket: simple-bucket-backup
---
apiVersion: v1
kind: S3Bucket
metadata:
  name: simple-bucket
data:
  bucketName: myveryimportantdata
  s3Connection:
    inline:
      host: test-minio
      port: 9000
      secretClass: minio-credentials
      tls:
        verification:
          server:
            caCert:
              secretClass: my-s3-ca
---
apiVersion: v1
kind: S3Bucket
metadata:
  name: simple-bucket-backup
data:
  bucketName: myveryimportantdata
  s3Connection:
    inline:
      host: backup-minio
      port: 9023
      secretClass: minio-credentials
      tls:
        verification:
          server:
            caCert:
              secretClass: my-s3-ca

Are S3 connections defined "globally" in the CRD and referenced from respective role/groups or are they inlined in the role/group configuration?

I would expect this to normally be a global setting. At least I cannot currently come up with a scenario in which it would make sense to refer to different buckets from different roles/groups.

But this is something that we'll need to decide per operator.

How do product CRDs refer to multiple S3 connections for the same "purpose" (ex. reading data) ?

Sorry to keep giving the same answer, but this will again be operator specific.
But in principle I'd say that should just be a list of S4Bucket objects:

---
apiVersion: s3querynator.stackable.tech/v1alpha1
kind: S3QuerynatorCluster
metadata:
  name: simple-s3querynator
spec:
  version: "1.13.11"
  queryableBuckets:
    - simple-bucket
    - simple-bucket2
---
apiVersion: v1
kind: S3Bucket
metadata:
  name: simple-bucket
data:
  bucketName: myveryimportantdata
  s3Connection:
    inline:
      host: test-minio
      port: 9000
      secretClass: minio-credentials
      tls:
        verification:
          server:
            caCert:
              secretClass: my-s3-ca
---
apiVersion: v1
kind: S3Bucket
metadata:
  name: simple-bucket2
data:
  bucketName: mynotsoimportantdata
  s3Connection:
    inline:
      host: aws-s3-endpoint
      port: 9000
      secretClass: aws-credentials
      tls:
        verification:
          server:
            caCert:
              secretClass: my-s3-ca

@soenkeliebau
Copy link
Member Author

Can you maybe add (for option 5) how generated names come into play and/or how we are compatible with ObjectBucketClaim (OBC)?

As far as I understand it we do would now need to have a controller that converts from the OBC ConfigMap & Secret to our own S3Bucket and/or S3Connection, is that correct?

That is correct. I intentionally left that out of this ADR, as it is not really related to the options outlined here, but instead an unrelated decision to take. Whatever we decide here will have to be implemented by that operator.

I have added some more detail around this to the ADR.

lfrancke
lfrancke previously approved these changes Apr 22, 2022
sbernauer
sbernauer previously approved these changes Apr 25, 2022
Copy link
Contributor

@fhennig fhennig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, thanks for writing it all down!

@soenkeliebau soenkeliebau dismissed stale reviews from sbernauer and lfrancke via 68782a0 April 29, 2022 11:40
@soenkeliebau soenkeliebau requested a review from fhennig April 29, 2022 11:42
sbernauer
sbernauer previously approved these changes Apr 29, 2022
Copy link
Member

@sbernauer sbernauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@soenkeliebau
Copy link
Member Author

Thanks!

Sorry, addressed one more comment that I had missed.

@soenkeliebau soenkeliebau requested a review from sbernauer May 2, 2022 06:58
@soenkeliebau soenkeliebau merged commit 3dac1b0 into main May 2, 2022
@soenkeliebau soenkeliebau deleted the adr/s3 branch May 2, 2022 08:41
@lfrancke lfrancke added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label May 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants