Description
As a user, I want to be able to make external resources and dependencies available to pods.
Previously this was implemented using PVCs, which made it dependent on cluster capabilities (e.g. the access modes supported for PVCs). An alternative solution is to dispense with PVCs and instead implement a `dependencySources` field that uses a complex enum to expose different sources. In the example below, the source is an `S3Bucket`:
```yaml
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-bucket-resource1
spec:
  bucketName: my-example-bucket1
  connection:
    reference: my-connection-resource
---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
...
spec:
  ...
  driver:
    ...
    dependencySources:
      # Use "#[serde(flatten)]" in a complex enum called e.g. "Source",
      # allowing s3Bucket (and maybe hdfsFolder or url later on)
      - s3Bucket: # Option<S3BucketRef>
          inline: [...]
          # or
          reference: my-bucket-resource1
```
When a `dependencySources` entry is declared for a role, the contents of that source are copied to an internally defined path (not visible to the user) on the respective pods, from where they can be accessed as required. In other words: where previously a job was started to copy resources to a (usually RWX) PVC, which then backed a volume mounted by the roles, the same mechanism will now be used without shared storage. This adds some redundancy, but a) is independent of PV storage classes and b) is more transparent for the user.