-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-15816][k8s] Prevent labels using kubernetes.cluster-id to exceed the limit of 63 characters #17819
base: master
Are you sure you want to change the base?
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 7f0824c (Wed Nov 17 10:33:28 UTC 2021) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, @alpreu. 👍 There's only once concern, I want to share (related to the comment @tillrohrmann shared in the corresponding ticket): Shouldn't we make this threshold dynamic? It's hard to grasp why we set it to 40.
Instead, we could replace AbstractKubernetesParameters.getClusterId()
by something like AbstractKubernetesParameters.generateBasedOnClusterId(String)
and do the check internally. That way, we could fail if the generated String is longer than 63 characters. The clusterId
can be hidden in AbstractKubernetesParameters
as a private field (of course, we're not preventing cases where somebody access the configuration directly. But that's not covered in the current code, either.
The flaw of that approach would be that we wouldn't be able to set a specific value in the cluster-id
parameter description. A way to fix that could be to move the entire label generation into AbstractKubernetesParameters
. That way, the class would have knowledge about all available suffixes and could derive the maximum length of cluster ID from them. This would lower the risk of becoming inconsistent again when introducing additional k8s suffixes. WDYT?
@wangyang0918 is the case of adding new k8s prefixes unlikely?
@@ -220,7 +220,7 @@ | |||
.withDescription( | |||
Description.builder() | |||
.text( | |||
"The cluster-id, which should be no more than 45 characters, is used for identifying a unique Flink cluster. " | |||
"The cluster-id, which should be no more than 40 characters, is used for identifying a unique Flink cluster. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything preventing us from using Constants.MAXIMUM_CHARACTERS_OF_CLUSTER_ID
here? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alpreu I'm just puzzled. Why did we change the limit to 40
initially? The current version of the PR sets the limit to 45
again? I double-checked the code that we didn't miss any other label and couldn't find anything else. But it looks like I'm missing something. I remember that the initial change was valid in terms of the number but we just wanted to make it dynamic. Do you see what I'm missing here, @alpreu ? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it changed back to the original value. The reason is that the 40 was derived from the "-resourcemanager-leader"
in KubernetesHaServices
. In that class only configmap names are generated, for which the 63 character limit does not apply. I don't know if it makes sense to move this generation also into KubernetesLabel
. Especially because the JobManager one does contain the JobID
which would restrict the clusterId length extremely.
IIRC, not only the labels, but also pod names, services names, have the most 63 characters limitation. So we might need to have
Yes, we might introduce new or change some labels, names in the future, which will make the length pre-check for cluster-id not work again. So I agree with you that moving all the generation of labels, pod names, service |
Why is it not that easy? As far as I can see, |
@XComp Hmm. You are right. It is not too complicated and we just need to take some time to find out the K8s resource names generation(e.g. |
@alpreu would that be something you could do in this PR? |
Sure, I can do this, just have to deprioritize it for some SDK stuff for now |
I was looking into this again and came up with the idea of using an Enum to store the prefixes and suffixes. The |
What's the benefit of having an enum? Initially, I thought of just implementing factory methods in |
But then once again you have to touch the maximum length every time a parameter is added in the future, that was what I wanted to avoid |
not if you iterate over the factory methods to determine the maximum length. But yeah, you are right. You have to have some collection (in your case an enum) to be able to iterate over the generated labels to determine the maximum length. In this sense, an enum would be a viable solution. 👍 |
It is a good idea to introduce such a |
I just pushed a work in progress commit that illustrates the idea.
I tried to change that but almost all of the classes extending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alpreu . The changes look good. I have some minor suggestions that might improve the code a bit more.
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesLabelAffix.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/flink/kubernetes/kubeclient/parameters/AbstractKubernetesParameters.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/flink/kubernetes/kubeclient/parameters/AbstractKubernetesParameters.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/flink/kubernetes/kubeclient/parameters/AbstractKubernetesParameters.java
Outdated
Show resolved
Hide resolved
317d5e9
to
fb128de
Compare
I updated the PR, PTAL :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alpreu. I guess we're almost there. Good work. There are still a few minor things that need to be fixed. Additionally, I added some nitpicking comments which might be worth considering as well.
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesLabel.java
Outdated
Show resolved
Hide resolved
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/Constants.java
Outdated
Show resolved
Hide resolved
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesLabel.java
Outdated
Show resolved
Hide resolved
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesLabel.java
Outdated
Show resolved
Hide resolved
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/utils/KubernetesLabel.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/flink/kubernetes/kubeclient/decorators/KerberosMountDecorator.java
Outdated
Show resolved
Hide resolved
...ernetes/src/main/java/org/apache/flink/kubernetes/configuration/KubernetesConfigOptions.java
Outdated
Show resolved
Hide resolved
…ed the limit of 63 characters
@flinkbot run azure |
1 similar comment
@flinkbot run azure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, looks good from my end now. Thanks for the changes.
@wangyang0918 can you have a second pass over it to verify that we're incorporating the correct Strings as KubernetesLabel
which fall under the restriction of having a maximum length of 63 characters?
I'm gonna merge the PR after the 👍 from @wangyang0918 and AzureCI
Really thanks @alpreu for working on this PR and @XComp helping with review. I believe it is the right direction to use the Another thing is about the pod name. The names of JobManager pod(e.g. |
You are right about the configmap-affixes, we could remove them from the Enum. I think the question is if we want to have the generation in one place or not.
As far as I could see both of those only relate to configmaps or podnames, which should support a maximum of 253 characters. So if we discount the -taskmanger- (13 chars) and the current maximum for the clusterId (40chars) we still have 200 characters left so it should not be an issue. What do you think? |
Having the generation in one place make more senses to me if we could get this done.
Yes, my bad. The pod names also do not have such limitation of 63 characters, but 253 characters. We do not need to add the check for them. |
Hi @wangyang0918 could you clarify once more please what code changes would like to see for the PR to be merged? I think it would be great if we can still get this into the 1.15 release |
@alpreu In current implementation, the ConfigMap name could not be logger than 63 characters. This is my only concern about this PR. Before this change, I think we do not have such assumption. This also make the max length of cluster-id smaller. I think we could have the names/labels generation in one place but without always check the 63 characters limit. Right? |
Another thing is that maybe we could also add the pre-condition check for |
I checked again and I am not sure we are on the right track to the root cause. The enum has these values currently:
If we remove all the ones with CONFIG_MAP in their name we are left with:
The |
@alpreu Currently, the 45 characters pre-check could only work for native K8s HA/non-HA mode. But for standalone cluster with K8s HA enabled, we do not have the pre-check and will cause the exception in FLINK-15816 if the cluster-id is longer than 63 characters. I have verified that cluster id with 45 characters works well.
|
Have we come up with a conclusion on this PR? I don't have the overview over what names are actually used as labels which fall under the 63 characters limit? @wangyang0918 do I understand you correctly that we have to integrate the |
Exactly. Maybe I need to make myself clearer.
WDYT? |
Ok, then I misinterpreted the ticket. Essentially, the issue is not that we have to guess the right value for this configuration parameter but that the validation never happens for standalone clusters with HA enabled. But I don't understand you claim that it make sense to have the REST suffix included in Why do we have such an odd value (i.e. 45) for Are there references to where the labels (i.e. the resources limited to 63 characters) are set? |
Just like the rest service name( The reason why set the
https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set |
Ok, my understanding was that we're using service names in labels which forces us to set some limitation on the cluster ID. Reiterating over the comments of FLINK-15816, I conclude that it's really only about adding the check for the I'm just wondering why we should limit ourselves with 45 characters. Don't we have to restart the Flink cluster anyway in case of a version update? That would give the user the possibility to update the cluster ID if it becomes necessary. In any case, we should add some in-code documentation about this to the MAXIMUM_CHARACTERS_OF_CLUSTER_ID declaration in Constants:86 to explain the reason for this constant being set in that way (i.e. that it's left at Sorry for guiding you into the wrong direction, @alpreu . |
What is the purpose of the change
Kubernetes labels have a maximum length of 63 characters. We generate a variety of labels by applying suffixes and prefixes to represent certain components. Previously they have often exceeded the maximum length because the label generation happened by just concatenating strings together. This PR aims to provide a safe mechanism for the label creation that also keeps the documentation up to date.
Brief change log
Verifying this change
This change is already covered by existing tests, such as (please describe tests).
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation