Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spawning fails for user names starting with non-alphanumeric chars #498

Closed
yuvipanda opened this issue Mar 19, 2021 · 11 comments · Fixed by #744
Closed

Spawning fails for user names starting with non-alphanumeric chars #498

yuvipanda opened this issue Mar 19, 2021 · 11 comments · Fixed by #744
Labels

Comments

@yuvipanda
Copy link
Collaborator

yuvipanda commented Mar 19, 2021

Bug description

When a user with a username that starts with a non alphanumeric character (like '-' or '_') tries to spawn, it fails with:

{
    "kind":"Status",
    "apiVersion":"v1",
    "metadata":{
        
    },
    "status":"Failure",
    "message":"Pod \"jupyter--5fusername\" is invalid: metadata.labels: Invalid value: \"-5fusername\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')",
    "reason":"Invalid",
    "details":{
        "name":"jupyter--5fusername",
        "kind":"Pod",
        "causes":[
            {
                "reason":"FieldValueInvalid",
                "message":"Invalid value: \"-5fusername\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')",
                "field":"metadata.labels"
            }
        ]
    },
    "code":422
}

Expected behaviour

A pod starts for the user

Actual behaviour

A user server spawns a pod with a label for the user's name. Label values can't start with non-alphanumeric chars, so this fails.

How to reproduce

Setup simple kubespawner instance and try logging in with a username starting with '_'

@yuvipanda
Copy link
Collaborator Author

I've dealt with this temporarily by removing the username label. I see the three options as:

  1. Remove label completely
  2. Add a prefix (like we do for pod name)
  3. Make the label opt-in

We add a username annotation, which doesn't have this restriction.

Most users / admins won't have a strong need to target by username label, so my preference is to do (3)

@meeseeksmachine
Copy link

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/advanced-z2jh-deeply-customizing-the-spawner/8432/1

@consideRatio
Copy link
Member

I lean in favor of option 3 as well.

@djangoliv
Copy link

The same problem happen with @ in username.
example with a username like: user@factory.org

@yuvipanda
Copy link
Collaborator Author

@djangoliv this shouldn't be happening for labels starting with @ as they should be escaped by escapism. Are you still seeing this issue?

@minrk
Copy link
Member

minrk commented Jun 6, 2023

I agree that annotations make more sense. What's the case for usernames in labels? Metrics, mostly (e.g. aggregating cpu usage by user)?

@yuvipanda
Copy link
Collaborator Author

@minrk yep, metrics.

@minrk
Copy link
Member

minrk commented Jun 6, 2023

FWIW, I was just today working on solving the ~same problem for user subdomains in JupyterHub in jupyterhub/jupyterhub#4471 and have a scheme that I think always produces a unique safe string (DNS label in my case, similar rules here) from any arbitrary string:

  1. implement trim-and-hash scheme that always produces a unique, safe string, according to rules (I chose u-{filter(lowercase_chars+digits)}--{hash})
  2. check for escape signifier (-- in my case), fallback on trim-and-hash if found to avoid collisions with escaped results
  3. if string is valid, use it as-is
  4. (optional) try a less-disruptive escape (IDNA in my case); if valid, use it
  5. fallback on trim-and-hash

This satisfies my goals:

  1. always produces a valid result
  2. uses the 'nice' value where it works
  3. guarantees no collisions between 'already-escaped' nice values and escape results (bonus: without introducing the -2d problem, necessary for escapism itself to avoid collisions)

Of course, you can just always use the trim-and-hash scheme, if you don't mind the hashes on the end of the labels (k8s users are used to meaningless junk on the ends of most strings). Then you don't need the multi-stage escape. But if you like having the simple strings in the 90% of cases where they are valid, then this approach ought to work, and be more robust than using escapism which doesn't address things like start/end character or length requirements.

@minrk
Copy link
Member

minrk commented Jun 6, 2023

btw, to be explicit - I support option 3 as well. The only issue is that users are still affected by this bug, it's just a subset of users who opt-in to the known-to-work-only-sometimes feature. So the bug doesn't go away, it just comes up less often.

@consideRatio
Copy link
Member

Note that this is the challenge, our escaping isn't enough because it doesn't give special treatment to the criteria of not starting/ending with a alphanum.

apiVersion: v1
kind: ConfigMap
metadata:
  name: label-demo
  labels:
    environment: -production
data:
  test: hi
kubectl apply -f test.yaml
The ConfigMap "label-demo" is invalid: metadata.labels: Invalid value: "-production": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')

@minrk
Copy link
Member

minrk commented Jun 6, 2023

Right. All the rules must be taken into account, including start, end, and length, not just body characters. The trouble with escaping alone is that it doesn't handle length or boundary conditions. And the challenge with multiple schemes is avoiding collisions.

We can do roughly:

def best_safe_slug(name):
    if could_collide(name):
        # if the input matches a pattern that could be produced by either scheme, make sure it gets escaped again
        return always_safe(name)
    nice_name = nice_scheme(name) # guaranteed unique, not guaranteed to be valid, e.g. the name itself
    if is_valid(nice_name):
        return nice_name
    return always_safe(name)

This scheme can be used to produce all of our restricted strings (resource names, labels, etc.) with different functions for is_valid, even after templating, and guarantee we always produce a valid, unique value. It's mainly a question of when things like truncate & hash are used (i.e. always or only when needed), because we can't avoid it if we want to accept every valid input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants