ADR: Listener Operator #256

Merged Sep 12, 2022 (35 commits)

File: modules/contributor/pages/adr/ADR000-WIP.adoc

= How to provide stable out-of-cluster access to products
Felix Hennig <felix.hennig@stackable.tech>
v0.1, YYYY-MM-DD
:status: draft

* Status: {status}
* Deciders: [list everyone involved in the decision] <!-- optional -->
* Date: [YYYY-MM-DD when the decision was last updated] <!-- optional -->

Technical Story: [description | ticket/issue URL] <!-- optional -->

== Context and Problem Statement
// Describe the context and problem statement, e.g., in free form using two to three sentences. You may want to articulate the problem in form of a question.

Eventually, the products we host in Kubernetes will need to be accessed from outside the cluster. Our current solution for this is NodePort Services. However, the IP and port a client connects to can change if a Pod is rescheduled to a different node or if a ProductCluster is restarted.

Furthermore, some products like HDFS and Kafka don't route client access through a single router or portal node; instead, the client connects to multiple nodes. For example, HDFS name nodes tell the client where data can be found (hostname/IP and port), and the client is then expected to connect directly to a specific data node. Kafka is similar: clients connect directly to the broker that hosts a given topic partition (shard).

Problems:

* **Unstable addresses** - Clients need stable addresses to connect to, but Kubernetes can move pods around. Although the discovery ConfigMap is updated, it is not feasible to ask clients to pull the new information from it every time; clients will want to use static config files with static addresses.
* **Replicas not addressable** - In our current setup, there's no way to connect to a specific replica in a StatefulSet or Deployment, which is necessary for cases like the data nodes of HDFS.
* **Pods don't know their outside address** - The hostname and IP that the pods know about themselves are only valid _inside_ the cluster; the IP only works inside the overlay network. This means ProductCluster processes cannot link to other nodes of the cluster in a way that outside clients can follow (for example, an HDFS name node referring a client to a data node).
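
The last two problems can be made concrete with a toy model, sketched in Python rather than taken from HDFS: the name node hands out whatever address each data node reported about itself, and a client outside the cluster applies a simple reachability rule. The pod CIDR `10.244.0.0/16`, the `svc.cluster.local` DNS suffix, the hostnames, and the function names are assumptions for illustration; real clusters differ.

```python
import ipaddress
from dataclasses import dataclass

# Assumption: the pod overlay network uses this CIDR (it varies per cluster).
POD_CIDR = ipaddress.ip_network("10.244.0.0/16")
# Assumption: the default in-cluster DNS suffix, not resolvable from outside.
CLUSTER_DNS_SUFFIX = ".svc.cluster.local"


@dataclass(frozen=True)
class Address:
    host: str
    port: int


def reachable_from_outside(addr: Address) -> bool:
    """Toy reachability rule for a client outside the cluster."""
    if addr.host.endswith(CLUSTER_DNS_SUFFIX):
        return False  # in-cluster DNS names do not resolve outside
    try:
        return ipaddress.ip_address(addr.host) not in POD_CIDR
    except ValueError:
        return True  # some other hostname: assume public DNS resolves it


def block_locations(datanodes: dict[str, Address], replicas: list[str]) -> list[Address]:
    """The name node returns the address each data node reported about itself."""
    return [datanodes[name] for name in replicas]


# Hypothetical data nodes that only know their in-cluster identity
# (9866 is the default data node transfer port in Hadoop 3).
self_reported = {
    "datanode-0": Address("10.244.1.17", 9866),
    "datanode-1": Address("hdfs-datanode-1.hdfs-datanode.default.svc.cluster.local", 9866),
}
locations = block_locations(self_reported, ["datanode-0", "datanode-1"])
print([reachable_from_outside(a) for a in locations])  # [False, False]
```

If the data nodes instead reported a node IP from the node network together with a NodePort, the same check would return `True`, which is the situation the Listener approach aims for.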

== Decision Drivers
// Which criteria are useful to evaluate solutions?

* At least for HDFS, connections to individual pods will be used to transmit data, so performance is relevant.
* On-prem customers will often not have any kind of network-level load balancing (at least not one that is configurable by K8s).
* Cloud customers will often have relatively short-lived K8s nodes.
* The solution should be minimally invasive - no large setups required outside of the cluster.

== Implemented Solution

A new resource is proposed: Listener. It is handled similarly to storage: there are ListenerClasses for different types of Listeners, analogous to StorageClasses; there are Listener objects, similar to PersistentVolumes; and claims to Listeners are made in ProductCluster objects.

Under the hood, the listener-operator runs as a CSI driver with a new `listener.stackable.tech` type. The product operator converts the Listener claims in the ProductCluster resource into PersistentVolumeClaims (PVCs) for this storage type and passes the Listener settings along as annotations on the PVC. Initially there will be two Listener types: `private`, implemented with NodePorts, and `public`, implemented with LoadBalancers. The listener-operator creates the Listeners according to the PVC settings and provides the Listener information, via the PV, to the pods that mount the PVCs.
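
As a rough illustration of that conversion, the Python sketch below builds the PVC a product operator might emit for a listener claim, as a plain dict in Kubernetes manifest shape. The StorageClass name is assumed to match the `listener.stackable.tech` driver type from the text; the annotation key `listener.stackable.tech/class`, the claim name, and the nominal storage request are hypothetical placeholders, not the operator's actual API.

```python
# The two initial Listener types from the text and the Service types backing them.
LISTENER_SERVICE_TYPES = {"private": "NodePort", "public": "LoadBalancer"}

# Assumed names for illustration only.
STORAGE_CLASS = "listener.stackable.tech"
LISTENER_CLASS_ANNOTATION = "listener.stackable.tech/class"  # hypothetical key


def listener_pvc(claim_name: str, namespace: str, listener_type: str) -> dict:
    """Build a PersistentVolumeClaim manifest for a listener claim."""
    if listener_type not in LISTENER_SERVICE_TYPES:
        raise ValueError(f"unknown listener type: {listener_type!r}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": claim_name,
            "namespace": namespace,
            # Listener settings travel to the CSI driver as annotations.
            "annotations": {LISTENER_CLASS_ANNOTATION: listener_type},
        },
        "spec": {
            "storageClassName": STORAGE_CLASS,
            "accessModes": ["ReadWriteOnce"],
            # The API expects a storage request; this value is a nominal placeholder.
            "resources": {"requests": {"storage": "1"}},
        },
    }


def service_type_for(pvc: dict) -> str:
    """What the listener-operator would derive from the PVC's annotation."""
    listener_type = pvc["metadata"]["annotations"][LISTENER_CLASS_ANNOTATION]
    return LISTENER_SERVICE_TYPES[listener_type]


pvc = listener_pvc("listener-hdfs-datanode", "default", "private")
print(service_type_for(pvc))  # NodePort
```

Keeping the mapping from Listener type to Service type in one table is what would let the operator support more ListenerClasses later without changing how product operators build their PVCs.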

Communication flow example using the HDFS Operator:

* An HDFS cluster resource is created by the user, with a `private` listener setting.
* The HDFS Operator requests a PVC of the `listener.stackable.tech` type, with an annotation to create a `private` listener.
* The listener-operator provisions a NodePort Service for the volume request, which means one Service per Pod. It then reads the node IP and the allocated NodePort.
* The listener-operator provisions the volumes with files containing the pod's outside address and port: the IP and port of the NodePort Service. Because of the PVC, it knows which pod the volume will be mounted into and can find the NodePort that belongs to that pod.
* The HDFS operator has already provisioned the pod with a script that reads the files from the mounted volume into environment variables, which are then read by HDFS. This part is product-specific.
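
The per-pod steps above can be sketched in Python. The `statefulset.kubernetes.io/pod-name` label is set by the Kubernetes StatefulSet controller on each Pod, which makes it a convenient selector for a one-Pod Service. Naming the Service after the Pod, the port name `data`, the file names `address` and `port`, the environment variable names, and all function names are hypothetical; the real file layout is up to the listener-operator.

```python
import os
import tempfile
from pathlib import Path


def per_pod_nodeport_service(pod_name: str, namespace: str, container_port: int) -> dict:
    """A NodePort Service that selects exactly one StatefulSet Pod."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": pod_name, "namespace": namespace},
        "spec": {
            "type": "NodePort",
            # The StatefulSet controller labels every Pod with its own name.
            "selector": {"statefulset.kubernetes.io/pod-name": pod_name},
            # nodePort is omitted, so Kubernetes allocates one on creation.
            "ports": [{"name": "data", "port": container_port, "targetPort": container_port}],
        },
    }


def allocated_node_port(created_service: dict) -> int:
    """After creation, Kubernetes fills in spec.ports[].nodePort."""
    return created_service["spec"]["ports"][0]["nodePort"]


def write_listener_volume(volume_dir: Path, node_ip: str, node_port: int) -> None:
    """listener-operator side: publish the Pod's outside address into its volume."""
    (volume_dir / "address").write_text(node_ip)      # hypothetical file name
    (volume_dir / "port").write_text(str(node_port))  # hypothetical file name


def load_listener_env(volume_dir: Path) -> dict[str, str]:
    """Pod side: what the product-specific startup script does, in Python."""
    return {
        "EXTERNAL_ADDRESS": (volume_dir / "address").read_text().strip(),
        "EXTERNAL_PORT": (volume_dir / "port").read_text().strip(),
    }


with tempfile.TemporaryDirectory() as tmp:
    volume = Path(tmp)
    write_listener_volume(volume, "192.168.1.21", 31234)
    env = load_listener_env(volume)
    os.environ.update(env)  # make the values visible to the product process
    print(env)  # {'EXTERNAL_ADDRESS': '192.168.1.21', 'EXTERNAL_PORT': '31234'}
```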

The way the product operator requests the volume is identical for all pods of a StatefulSet/Deployment: it always requests a volume with the type (e.g. `private`) that was configured in the ProductCluster.
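
Because the request is identical for every pod, the per-pod distinction comes from Kubernetes itself: a StatefulSet instantiates each `volumeClaimTemplates` entry as a PVC named `<template>-<statefulset>-<ordinal>` for the Pod named `<statefulset>-<ordinal>`. The Python sketch below shows one way a driver could map a PVC name back to its Pod under that convention. It is an illustration, not necessarily how the listener-operator does it, and the template and StatefulSet names are hypothetical.

```python
import re
from typing import Optional


def pvc_name(template: str, statefulset: str, ordinal: int) -> str:
    """Name the StatefulSet controller gives the PVC for one Pod."""
    return f"{template}-{statefulset}-{ordinal}"


def pod_for_pvc(pvc: str, template: str, statefulset: str) -> Optional[str]:
    """Recover the Pod name from a PVC name, or None if it doesn't match."""
    pattern = re.escape(f"{template}-{statefulset}-") + r"(\d+)"
    match = re.fullmatch(pattern, pvc)
    if match is None:
        return None
    return f"{statefulset}-{match.group(1)}"


# One template, three replicas: three PVCs, each traceable to one Pod.
claims = [pvc_name("listener", "hdfs-datanode", i) for i in range(3)]
print(claims)
# ['listener-hdfs-datanode-0', 'listener-hdfs-datanode-1', 'listener-hdfs-datanode-2']
print([pod_for_pvc(c, "listener", "hdfs-datanode") for c in claims])
# ['hdfs-datanode-0', 'hdfs-datanode-1', 'hdfs-datanode-2']
```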

== Decision Outcome

There is only one design, which is already being implemented.

Pros:

* There is little routing overhead (compared to proxying or similar).
* The listener-operator can be extended to support more ListenerClasses.
* It is a very low-friction solution that doesn't require a lot of permissions to set up.

Cons:

* Products like HDFS and Kafka only support having a single address. This means that if outside access through the listener-operator is configured, all traffic will be routed that way, including traffic from clients inside the cluster.
* It is another DaemonSet Operator, which means more components running in the cluster. It is also not clear how we will get it certified for OpenShift.

== Other notes

=== Spiked Alternatives: MetalLB, Calico
See: https://metallb.universe.tf/, https://www.tigera.io/project-calico/

MetalLB is a bare-metal load balancer that was spiked briefly. However, it requires BGP or ARP integration, which is not feasible as a requirement for customer installations. Calico requires BGP.

With ARP, the LoadBalancers appear as "real" IP addresses in the same subnet as the nodes (with no need to configure custom routing rules). However, this scales poorly (it assumes that all nodes are in the same L2 broadcast domain) and is relatively likely to be blocked by firewalls or network policy.
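
The L2 assumption behind ARP mode can be illustrated with Python's standard `ipaddress` module. In the simplified model below, a LoadBalancer IP can only be announced (and fail over) between nodes if every node's subnet contains it. The addresses are made-up examples and the check is a deliberate simplification, not MetalLB's actual logic.

```python
import ipaddress


def arp_announceable(lb_ip: str, node_subnets: list[str]) -> bool:
    """True if every node's subnet contains the LoadBalancer IP.

    ARP only reaches hosts in the same L2 broadcast domain, so in this
    simplified model an IP can move between nodes only if all of them
    share the subnet it belongs to.
    """
    ip = ipaddress.ip_address(lb_ip)
    return all(ip in ipaddress.ip_network(subnet) for subnet in node_subnets)


# All nodes in one /24: the IP can be announced from any of them.
print(arp_announceable("192.168.10.240", ["192.168.10.0/24", "192.168.10.0/24"]))  # True
# Nodes spread across subnets: there is no single broadcast domain.
print(arp_announceable("192.168.10.240", ["192.168.10.0/24", "192.168.20.0/24"]))  # False
```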