Adds functionality to specify custom catalog definitions for Trino. #161

Closed
wants to merge 2 commits into from

Conversation

soenkeliebau
Member

Description

Trino requires catalog definitions to be defined in files; they cannot be added dynamically via the API.
We currently do not support adding extra catalogs that are not provided by the Trino operator (which is only Hive at the moment).

This PR adds a 'customCatalogs' property which allows specifying arbitrary ConfigMaps to pull in for catalog definitions.
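
As a rough illustration of what such a ConfigMap might contain: Trino reads each catalog from a properties file that sets at least `connector.name`. The ConfigMap name, the key name, and the assumption that each key is mounted as one catalog file are hypothetical here, and the exact shape of the 'customCatalogs' property is not shown.

```yaml
# Hypothetical sketch: the name and the key layout are assumptions,
# not the format the operator necessarily expects.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-tpch-catalog
data:
  # Assumed to end up as a catalog properties file in Trino's catalog directory.
  tpch.properties: |
    connector.name=tpch
```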

Review Checklist

  • Code contains useful comments
  • (Integration-)Test cases added (or not applicable)
  • Documentation added (or not applicable)
  • Changelog updated (or not applicable)
  • Cargo.toml only contains references to git tags (not specific commits or branches)
  • Helm chart can be installed and deployed operator works (or not applicable)

Once the review is done, comment bors r+ (or bors merge) to merge.

@soenkeliebau soenkeliebau marked this pull request as draft March 8, 2022 08:06
@soenkeliebau
Member Author

This PR was mostly meant to give Rob something to work with for his first iteration of the demo, and to make it possible to add catalogs to Trino at all.

Before merging this we should discuss how we want this to look in the CRD and how we might make it variable across environments.
For example, deploying the same Trino object unchanged to two clusters, where it then connects to different databases (dev / test).

Proposal (doesn't meet what I wrote above yet):

---
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  ...
  catalogDefinitions:
    - hive: 
        cluster-name: simple-hive
    - s3:
        endPoint: changeme
        accessKey: changeme
        secretKey: changeme
        sslEnabled: false
        pathStyleAccess: true
    - custom:
        configMapName: custom-1
    - custom:
        configMapName: custom-2
  ...

@sbernauer
Member

I'm not sure what the current state is here, but I'm just throwing in the possibility of making a Catalog CRD. The catalogs are quite complicated: for example, a Hive or Iceberg catalog needs an S3Connection (see yesterday's discussion in stackabletech/documentation#177) or a Kerberos auth method (we may need the secret operator), and also an hdfs-site.xml. I think it's the same as with S3 and we need a CRD; ConfigMaps are not flexible enough.

Anyhow, I would try to prevent the same communication problem we had with the S3 definition and propose an official ADR for this :)

@soenkeliebau
Member Author

We have agreed that this shouldn't be part of the TrinoCluster CRD itself. I believe there is currently no active design effort around this, though.

@sbernauer
Member

@soenkeliebau can we close this in favor of WIP #209?

@soenkeliebau
Member Author

Absolutely!

@fhennig fhennig deleted the feat/extra_connectors branch January 24, 2023 12:06
@fhennig fhennig restored the feat/extra_connectors branch January 24, 2023 12:07
@razvan razvan deleted the feat/extra_connectors branch November 4, 2024 10:39