add 'safe' slug scheme #744
Merged
7bac57c add 'safe' slug scheme (minrk)
d6be9f5 Merge branch 'main' into wip-slug (minrk)
bdf0f37 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
a1b524d toward compatible transition for safe slugs (minrk)
c0b79af Merge from main (minrk)
058afd5 revert use_legacy_labels config (minrk)
6968b68 let safe slug scheme live side-by-side with escape scheme (minrk)
45299ce add multi_slug mechanism for multi-word slugs (username--servername--… (minrk)
e24b1f6 sub '-' for any sequence of unsafe characters (minrk)
cbd983b restore trailing hyphen logic (minrk)
15ead3c track kubespawner version in state, annotations (minrk)
fc4f628 allow opting out of persisted pvc name with remember_pvc_name = False (minrk)
fd4135f document new template scheme and upgrade notes (minrk)
3ece569 update some test expectations (minrk)
f96ab38 exercise pvc_name upgrade cases (minrk)
09ade1c Fix markdown table formatting (consideRatio)
10ba1fb Document escaped_username and escaped_servername to be added in v7 (consideRatio)
d4c2308 clearer comment about values being loaded by get_state (minrk)
f44b178 Merge branch 'main' into wip-slug (minrk)
97399f0 remove hardcoded safe slug scheme from namespace (minrk)
6232c4b only handle legacy pvc name when remember_pvc_name is true (minrk)
845f3d8 Sync with main (minrk)
@@ -0,0 +1,157 @@
(templates)=

# Templated fields

Several fields in KubeSpawner can be resolved as string templates,
so each user server can get distinct values from the same configuration.

String templates use Python's `str.format` convention of `"{fieldname}"`,
so for example the default `pod_name_template` of `"jupyter-{user_server}"` will produce:

| username         | server name | pod name                                       |
| ---------------- | ----------- | ---------------------------------------------- |
| `user`           | `''`        | `jupyter-user`                                 |
| `user`           | `server`    | `jupyter-user--server`                         |
| `user@email.com` | `Some Name` | `jupyter-user-email-com--some-name---0c1fe94b` |
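
As a rough illustration of how such a template resolves, the sketch below assumes templates are expanded with `str.format`-style substitution and that the field values have already been passed through the slug scheme. The `fields` dict is a hypothetical stand-in for the values KubeSpawner computes; it is not part of KubeSpawner's API.

```python
# Hypothetical, already-slugified field values for user "user", server "server".
fields = {
    "username": "user",
    "servername": "server",
    "user_server": "user--server",
}

pod_name_template = "jupyter-{user_server}"
# str.format ignores keyword arguments that the template does not reference,
# so every field can be offered to every template.
pod_name = pod_name_template.format(**fields)
print(pod_name)
```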

## Templated properties

Some common templated fields:

- [pod_name_template](#KubeSpawner.pod_name_template)
- [pvc_name_template](#KubeSpawner.pvc_name_template)
- [working_dir](#KubeSpawner.working_dir)

## Fields

The following fields are available in templates:

| field | description |
| --- | --- |
| `{username}` | the username passed through the configured slug scheme |
| `{servername}` | the name of the server passed through the configured slug scheme (`''` for the user's default server) |
| `{user_server}` | the username and servername together as a single slug. This should be used most places for a unique string for a given user's server (new in kubespawner 7). |
| `{unescaped_username}` | the actual username without escaping (no guarantees about value, except as enforced by your Authenticator) |
| `{unescaped_servername}` | the actual server name without escaping (no guarantees about value) |
| `{pod_name}` | the resolved pod name, often a good choice if you need a starting point for other resources (new in kubespawner 7) |
| `{pvc_name}` | the resolved PVC name (new in kubespawner 7) |
| `{namespace}` | the kubernetes namespace of the server (new in kubespawner 7) |
| `{hubnamespace}` | the kubernetes namespace of the Hub |

Because there are two escaping schemes for `username`, `servername`, and `user_server`, you can explicitly select one or the other on a per-template-field basis with the prefix `safe_` or `escaped_`:

| field | description |
| --- | --- |
| `{escaped_username}` | the username passed through the old 'escape' slug scheme (new in kubespawner 7) |
| `{escaped_servername}` | the server name passed through the 'escape' slug scheme (new in kubespawner 7) |
| `{escaped_user_server}` | the username and servername together as a single slug, identical to `"{escaped_username}--{escaped_servername}".rstrip("-")` (new in kubespawner 7) |
| `{safe_username}` | the username passed through the 'safe' slug scheme (new in kubespawner 7) |
| `{safe_servername}` | the server name passed through the 'safe' slug scheme (new in kubespawner 7) |
| `{safe_user_server}` | the username and server name together as a 'safe' slug (new in kubespawner 7) |

These may be useful during a transition upgrading a deployment from an earlier version of kubespawner.

The value of the unprefixed `username`, etc. is governed by the [](#KubeSpawner.slug_scheme) configuration, and always matches exactly one of these values.

## Template tips

In general, these guidelines should help you pick fields to use in your template strings:

- use `{user_server}` when a string should be unique _per server_ (e.g. pod name)
- use `{username}` when it should be unique per user, but shared across named servers (sometimes chosen for PVCs)
- use the `escaped_` prefix (e.g. `{escaped_username}`) if you need to keep certain values unchanged in a deployment upgrading from kubespawner \< 7
- `{pod_name}` can be re-used anywhere you want to create more resources associated with a given pod,
  to avoid repeating yourself
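
The difference between the first two tips can be checked with a small sketch. The field values below are hypothetical, already-slugified inputs, and `claim-{username}` is only an example template chosen to show per-user sharing; neither is taken from KubeSpawner's defaults.

```python
pod_name_template = "jupyter-{user_server}"  # should be unique per server
pvc_name_template = "claim-{username}"  # example: one claim shared per user

# Two servers belonging to the same user: the default server and one named "gpu".
servers = [
    {"username": "user", "servername": "", "user_server": "user"},
    {"username": "user", "servername": "gpu", "user_server": "user--gpu"},
]

pod_names = [pod_name_template.format(**fields) for fields in servers]
pvc_names = [pvc_name_template.format(**fields) for fields in servers]
print(pod_names)  # distinct per server
print(pvc_names)  # identical for both servers
```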

## Changing template configuration

Changing configuration should not generally affect _running_ servers.
However, when changing a property that may need to persist across user server restarts, special consideration may be required.
For example, changing `pvc_name` or `working_dir` could result in disconnecting a user's server from data loaded in previous sessions.
This may be your intention or not! KubeSpawner cannot know.

`pvc_name` is handled specially, to avoid losing access to data.
If `KubeSpawner.remember_pvc_name` is True, once a server has started, a server's PVC name cannot be changed by configuration.
Any future launch will use the previous `pvc_name`, regardless of change in configuration.
If you _want_ to change the names of mounted PVCs, you can set

```python
c.KubeSpawner.remember_pvc_name = False
```

This handling isn't general for PVCs, only specifically the default `pvc_name`.
If you have defined your own volumes, you need to handle changes to these yourself.
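
The effect of `remember_pvc_name` described above can be summarised in a few lines. This is a sketch of the described behaviour only: `sketch_resolve_pvc_name` and its arguments are hypothetical names, not KubeSpawner functions.

```python
def sketch_resolve_pvc_name(configured_name, remembered_name=None, remember_pvc_name=True):
    """Pick the PVC name for a launch (hypothetical helper).

    configured_name: what the current pvc_name_template resolves to
    remembered_name: the name persisted from a previous launch, if any
    """
    if remember_pvc_name and remembered_name:
        # a previously used name wins over any configuration change
        return remembered_name
    return configured_name


# After changing the template from "claim-{username}" to "data-{username}":
print(sketch_resolve_pvc_name("data-user", remembered_name="claim-user"))
print(sketch_resolve_pvc_name("data-user", remembered_name="claim-user", remember_pvc_name=False))
```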

## Upgrading from kubespawner \< 7

Prior to kubespawner 7, an escaping scheme was used that ensured values were _unique_,
but did not always ensure fields were _valid_.
In particular:

- start/end rules were not enforced
- length was not enforced

This meant that e.g. usernames that start with a capital letter or are very long could result in servers failing to start because the escaping scheme produced an invalid label.
To solve this, a new 'safe' scheme has been added in kubespawner 7 for computing template strings,
which aims to always produce valid object names and labels.
The new scheme is the default in kubespawner 7.

You can select the scheme with:

```python
c.KubeSpawner.slug_scheme = "escape"  # no changes from kubespawner 6
c.KubeSpawner.slug_scheme = "safe"  # default for kubespawner 7
```

The new scheme has the following rules:

- the length of any _single_ template field is limited to 48 characters (the total length of the string is not enforced)
- the result will only contain lowercase ascii letters, numbers, and `-`
- it will always start and end with a letter or number
- if a name is 'safe', it is used unmodified
- if any escaping is required, a truncated safe subset of characters is used, followed by `---{hash}` where `{hash}` is a checksum of the original input string
- `-` shall not occur in sequences of more than one consecutive `-`, except where inserted by the escaping mechanism
- if no safe characters are present, 'x' is used for the 'safe' subset
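
The rules above can be sketched for a single field as follows. This is a simplified illustration modelled on the `slugs.py` module added in this PR, not a copy of it: `sketch_safe_slug` is a hypothetical name, the validity check is reduced to the character rules listed above, and the 48-character limit and 8-character SHA-256 prefix are assumptions taken from this document.

```python
import hashlib
import re

HASH_LENGTH = 8
_UNSAFE = re.compile(r"[^a-z0-9]+")  # any run of characters that must be replaced
_VALID = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")  # simplified 'safe' check


def sketch_safe_slug(name, max_length=48):
    """Return name unchanged if it is already safe, else '{safe}---{hash}'."""
    already_safe = (
        _VALID.fullmatch(name) is not None
        and "--" not in name
        and len(name) <= max_length
    )
    if already_safe:
        return name
    digest = hashlib.sha256(name.encode("utf8")).hexdigest()[:HASH_LENGTH]
    room = max_length - (HASH_LENGTH + 3)  # leave space for '---' plus the hash
    safe = _UNSAFE.sub("-", name.lower()).lstrip("-")[:room].rstrip("-")
    return f"{safe or 'x'}---{digest}"


for example in ["username", "has-hyphen", "Capital", "user@email.com", "@@@"]:
    print(example, "->", sketch_safe_slug(example))
```

Hashing the original input, rather than the stripped text, is what keeps `Capital` and `capital` from colliding even though they share the same safe subset.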

Since length requirements are applied on a per-field basis, a new `{user_server}` field is added,
which computes a single valid slug following the above rules which is unique for a given user server.
The general form is:

```
{username}--{servername}---{hash}
```

where

- `--{servername}` is only present for non-empty server names
- `---{hash}` is only present if escaping is required for _either_ username or servername, and hashes the combination of user and server.
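
A minimal sketch of this combined form, loosely following the `multi_slug` function in this PR. The name `sketch_user_server`, the simplified safety check, and the exact condition for skipping the hash are assumptions made for illustration.

```python
import hashlib
import re

HASH_LENGTH = 8
_UNSAFE = re.compile(r"[^a-z0-9]+")
_PLAIN = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def _safe_part(name, max_length):
    part = _UNSAFE.sub("-", name.lower()).lstrip("-")[:max_length].rstrip("-")
    return part or "x"


def sketch_user_server(username, servername="", max_length=48):
    names = [username, servername] if servername else [username]
    joined = "--".join(names)
    needs_escape = any(_PLAIN.fullmatch(n) is None or "--" in n for n in names)
    if not needs_escape and len(joined) <= max_length:
        return joined  # no hash when nothing had to be escaped
    # hash all components, separated by a byte that never occurs in UTF-8,
    # so ("ab", "c") and ("a", "bc") produce different hashes
    hasher = hashlib.sha256()
    for i, n in enumerate(names):
        if i:
            hasher.update(b"\xff")
        hasher.update(n.encode("utf8"))
    digest = hasher.hexdigest()[:HASH_LENGTH]
    # equal space per component, reserving 2 characters for each '--'
    per_name = (max_length - (HASH_LENGTH + 1)) // len(names) - 2
    parts = [_safe_part(n, per_name) for n in names]
    return "--".join(parts) + "---" + digest


print(sketch_user_server("user"))
print(sketch_user_server("user", "server"))
print(sketch_user_server("user@email.com", "Some Name"))
```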

This `{user_server}` is the recommended value to use in pod names, etc.
In the escape scheme, `{user_server}` is identical to the previous value used in default templates: `{username}--{servername}`,
so it should be safe to upgrade previous templates using `{username}--{servername}` to `{user_server}` or `{escaped_user_server}`.

In the vast majority of cases (where no escaping is required), the 'safe' scheme produces identical results to the 'escape' scheme.
Probably the most common case where the two differ is in the presence of single `-` characters, which the `escape` scheme escaped to `-2d`, while the 'safe' scheme does not.

Examples:

| name | escape scheme | safe scheme |
| --- | --- | --- |
| `username` | `username` | `username` |
| `has-hyphen` | `has-2dhyphen` | `has-hyphen` |
| `Capital` | `-43apital` (error) | `capital---1a1cf792` |
| `user@email.com` | `user-40email-2ecom` | `user-email-com---0925f997` |
| `a-very-long-name-that-is-too-long-for-sixty-four-character-labels` | `a-2dvery-2dlong-2dname-2dthat-2dis-2dtoo-2dlong-2dfor-2dsixty-2dfour-2dcharacter-2dlabels` (error) | `a-very-long-name-that-is-too-long-for---29ac5fd2` |
| `ALLCAPS` | `-41-4c-4c-43-41-50-53` (error) | `allcaps---27c6794c` |
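
The 'escape' column can be reproduced with a short approximation of the legacy scheme: every character outside lowercase ascii letters and digits becomes `-` followed by the lowercase hex of each of its UTF-8 bytes. `sketch_escape` is a hypothetical stand-in written for this illustration, not the function KubeSpawner calls.

```python
import string

_SAFE_CHARS = set(string.ascii_lowercase + string.digits)


def sketch_escape(name):
    """Approximate the legacy 'escape' scheme (illustration only)."""
    out = []
    for ch in name:
        if ch in _SAFE_CHARS:
            out.append(ch)
        else:
            # one '-xx' group per UTF-8 byte of the unsafe character
            out.extend(f"-{byte:02x}" for byte in ch.encode("utf8"))
    return "".join(out)


for example in ["has-hyphen", "Capital", "user@email.com", "ALLCAPS"]:
    print(example, "->", sketch_escape(example))
```

The results are unique but not always valid: `Capital` becomes `-43apital`, which starts with `-`, and long names only get longer, which is why those rows are marked as errors.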

Most changed names won't have a practical effect.
However, to avoid `pvc_name` changing even though KubeSpawner 6 didn't persist it,
on first launch (for each server) after upgrade KubeSpawner checks if:

1. `pvc_name_template` produces a different result with `scheme='escape'`
1. a pvc with the old 'escaped' name exists

and if such a pvc exists, the older name is used instead of the new one (it is then remembered for subsequent launches, according to `remember_pvc_name`).
This is an attempt to respect the `remember_pvc_name` configuration, even though the old name is not technically recorded.
We can infer the old value, as long as configuration has not changed.
This will only work if upgrading KubeSpawner does not _also_ coincide with a change in the `pvc_name_template` configuration.
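
The first-launch check can be pictured as below. This is a sketch of the described decision only: `sketch_pick_pvc_name` and the `pvc_exists` callable are hypothetical, and a real check would query the Kubernetes API rather than a set.

```python
def sketch_pick_pvc_name(new_name, legacy_name, pvc_exists, remember_pvc_name=True):
    """Choose between the 'safe' and legacy 'escape' PVC names (hypothetical helper).

    new_name: pvc_name_template resolved with the current scheme
    legacy_name: the same template resolved with the 'escape' scheme
    pvc_exists: callable returning True if a PVC with that name exists
    """
    if remember_pvc_name and legacy_name != new_name and pvc_exists(legacy_name):
        return legacy_name  # keep the user attached to their existing data
    return new_name


# A user named "has-hyphen" whose PVC was created by kubespawner 6:
existing_pvcs = {"claim-has-2dhyphen"}
picked = sketch_pick_pvc_name(
    "claim-has-hyphen", "claim-has-2dhyphen", existing_pvcs.__contains__
)
print(picked)
```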
@@ -0,0 +1,192 @@
"""Tools for generating slugs like k8s object names and labels

Requirements:

- always valid for arbitrary strings
- no collisions
"""

import hashlib
import re
import string

_alphanum = tuple(string.ascii_letters + string.digits)
_alphanum_lower = tuple(string.ascii_lowercase + string.digits)
_lower_plus_hyphen = _alphanum_lower + ('-',)

# patterns _do not_ need to cover length or start/end conditions,
# which are handled separately
_object_pattern = re.compile(r'^[a-z0-9\.-]+$')
# '-' must come last in the class: written as `\.-_` it forms a range from '.' to '_',
# which admits characters such as '/' and '@' and excludes '-' itself
_label_pattern = re.compile(r'^[a-z0-9\._-]+$', flags=re.IGNORECASE)

# match anything that's not lowercase alphanumeric (will be stripped, replaced with '-')
_non_alphanum_pattern = re.compile(r'[^a-z0-9]+')

# length of hash suffix
_hash_length = 8


def _is_valid_general(
    s, starts_with=None, ends_with=None, pattern=None, min_length=None, max_length=None
):
    """General is_valid check

    Checks rules:

    - min_length, max_length: bounds on len(s)
    - starts_with, ends_with: allowed first and last characters (str or tuple of str)
    - pattern: compiled regular expression the string must match
    """
    if min_length and len(s) < min_length:
        return False
    if max_length and len(s) > max_length:
        return False
    if starts_with and not s.startswith(starts_with):
        return False
    if ends_with and not s.endswith(ends_with):
        return False
    if pattern and not pattern.match(s):
        return False
    return True


def is_valid_object_name(s):
    """is_valid check for object names"""
    # object rules: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
    return _is_valid_general(
        s,
        starts_with=_alphanum_lower,
        ends_with=_alphanum_lower,
        pattern=_object_pattern,
        max_length=255,
        min_length=1,
    )


def is_valid_label(s):
    """is_valid check for label values"""
    # label rules: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set
    if not s:
        # empty strings are valid labels
        return True
    return _is_valid_general(
        s,
        starts_with=_alphanum,
        ends_with=_alphanum,
        pattern=_label_pattern,
        max_length=63,
    )


def is_valid_default(s):
    """Strict is_valid

    Returns True if it's valid for _all_ our known uses

    So we can more easily have a single is_valid check.

    - object names have stricter character rules, but have longer max length
    - labels have short max length, but allow uppercase
    """
    return _is_valid_general(
        s,
        starts_with=_alphanum_lower,
        ends_with=_alphanum_lower,
        pattern=_object_pattern,
        min_length=1,
        max_length=63,
    )


def _extract_safe_name(name, max_length):
    """Generate safe substring of a name

    Guarantees:

    - always starts and ends with a lowercase letter or number
    - never more than one hyphen in a row (no '--')
    - only contains lowercase letters, numbers, and hyphens
    - length at least 1 ('x' if other rules strips down to empty string)
    - max length not exceeded
    """
    # compute safe slug from name (don't worry about collisions, hash handles that)
    # cast to lowercase
    # replace any sequence of non-alphanumeric characters with a single '-'
    safe_name = _non_alphanum_pattern.sub("-", name.lower())
    # truncate to max_length chars, strip '-' off ends
    safe_name = safe_name.lstrip("-")[:max_length].rstrip("-")
    if not safe_name:
        # make sure it's non-empty
        safe_name = 'x'
    return safe_name


def strip_and_hash(name, max_length=32):
    """Generate an always-safe, unique string for any input

    truncates name to max_length - len(hash_suffix) to fit in max_length
    after adding hash suffix
    """
    name_length = max_length - (_hash_length + 3)
    if name_length < 1:
        raise ValueError(f"Cannot make safe names shorter than {_hash_length + 4}")
    # quick, short hash to avoid name collisions
    name_hash = hashlib.sha256(name.encode("utf8")).hexdigest()[:_hash_length]
    safe_name = _extract_safe_name(name, name_length)
    # due to stripping of '-' in _extract_safe_name,
    # the result will always have _exactly_ '---', never '--' nor '----'
    # use '---' to avoid colliding with `{username}--{servername}` template join
    return f"{safe_name}---{name_hash}"


def safe_slug(name, is_valid=is_valid_default, max_length=None):
    """Always generate a safe slug

    is_valid should be a callable that returns True if a given string follows appropriate rules,
    and False if it does not.

    Given a string, if it's already valid, use it.
    If it's not valid, follow a safe encoding scheme that ensures:

    1. validity, and
    2. no collisions
    """
    if '--' in name:
        # don't accept any names that could collide with the safe slug
        return strip_and_hash(name, max_length=max_length or 32)
    # allow max_length override for truncated sub-strings
    if is_valid(name) and (max_length is None or len(name) <= max_length):
        return name
    else:
        return strip_and_hash(name, max_length=max_length or 32)


def multi_slug(names, max_length=48):
    """multi-component slug with single hash on the end

    same as strip_and_hash, but name components are joined with '--',
    so it looks like:

    {name1}--{name2}---{hash}

    In order to avoid hash collisions on boundaries, use `\\xFF` as delimiter
    """
    hasher = hashlib.sha256()
    hasher.update(names[0].encode("utf8"))
    for name in names[1:]:
        # \xFF can't occur as a start byte in UTF8
        # so use it as a word delimiter to make sure overlapping words don't collide
        hasher.update(b"\xFF")
        hasher.update(name.encode("utf8"))
    hash = hasher.hexdigest()[:_hash_length]

    name_slugs = []
    available_chars = max_length - (_hash_length + 1)
    # allocate equal space per name
    # per_name accounts for '{name}--', so really two less
    per_name = available_chars // len(names)
    name_max_length = per_name - 2
    if name_max_length < 2:
        raise ValueError(f"Not enough characters for {len(names)} names: {max_length}")
    for name in names:
        name_slugs.append(_extract_safe_name(name, name_max_length))

    # by joining names with '--', this cannot collide with single-hashed names,
    # which can only contain '-' and the '---' hash delimiter once
    return f"{'--'.join(name_slugs)}---{hash}"
This function ended up not being used because instead of multiple is_valid args, `is_valid_default` is used, which validates the subset of object names and labels (same as object name with max length of 63 instead of 255).
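
To make the "subset" concrete, here is a standalone sketch of the combined check: the object-name character rules with the label length limit. `sketch_is_valid_default` is a hypothetical re-statement for discussion, not the function in this diff.

```python
import string

_EDGE = set(string.ascii_lowercase + string.digits)  # allowed first/last characters
_BODY = _EDGE | {".", "-"}  # allowed anywhere in the string


def sketch_is_valid_default(s):
    """True if s satisfies object-name character rules and the 63-char label limit."""
    return (
        1 <= len(s) <= 63
        and s[0] in _EDGE
        and s[-1] in _EDGE
        and all(ch in _BODY for ch in s)
    )


for candidate in ["user", "Capital", "-user", "a" * 63, "a" * 64]:
    print(repr(candidate[:12]), len(candidate), sketch_is_valid_default(candidate))
```

Anything accepted here is usable both as an object name and as a label value, which is why a single check is enough for the slug code.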