-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support deprecating checkpoint names #45
Conversation
Codecov Report
@@ Coverage Diff @@
## main #45 +/- ##
==========================================
+ Coverage 96.20% 96.34% +0.13%
==========================================
Files 5 5
Lines 158 164 +6
==========================================
+ Hits 152 158 +6
Misses 6 6
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(i'm not that used to this package so may not be the best reviewer for this but as i know the issue that this relates to i've made a few comments)
Can the deprecations not be encoded into the Handler
? With an appropriate constructor that means it's non-breaking. Right now, iiuc, the deprecations will produce a handler that will always encode the 'new' path. When indexing checkpoints, such as with any package that calls checkpoint_path
will always point to the new path. This means that any downstream packages that chooses to support the old name will have to do some mungeing to recreate them - which may be more appropriate to exist here?
To clarify a bit what I'd like to understand is what behaviour is supported with deprecations. A deprecated checkpoint will throw a warning that it is deprecated (and then go on to continue to use the new name) or a deprecated checkpoint will throw a warning but also support old paths.
Speaking as a user, it'd be good to support old paths, or at least having the option to opt-in to them. That way how data versioning is handled isn't presupposed here. Hope that makes sense.
@@ -29,7 +29,7 @@ __init__() = Memento.register(LOGGER) | |||
|
|||
include("handler.jl") | |||
|
|||
const CHECKPOINTS = Dict{String, Union{Nothing, Handler}}() | |||
const CHECKPOINTS = Dict{String, Union{Nothing, String, Handler}}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this is using types to encode whether the checkpoint is deprecated or not. Thus it uses a String type to denote a deprecated status. Under the hood, it basically encodes a Pair that maps old -> new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. We could encode this information in as Dict{String, AbstractHandler}()
and have dedicated Disabled
and Deprecated
types. In the end, the string is just acting as a reference.
src/Checkpoints.jl
Outdated
# a regex / glob syntax | ||
function _config(handler, name::String) | ||
debug(LOGGER, "Checkpoint $name set to use $(handler)") | ||
CHECKPOINTS[name] = handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will take in the new name for the checkpoint and add in the handler as normal.
IIUC - this will replace the string encoded. I'm slightly confused here - does this mean that it will overright in CHECKPOINTS that it is deprecated, or do both entries remain - i.e does checkpoints look like (some details abstracted away):
:c1 => Handler(c1), :c2 => Handler(c2), "c3" => "c3_new", :c3_new" => Hander(c3_new)
Or:
:c1 => Handler(c1), :c2 => Handler(c2), :c3 => Hander(c3_new)
If the latter, won't deprecated_checkpoints
be unable to identify the deprecations after config
has run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CHECKPOINTS[n] isa String
condition at the call-site is to ensure the former and not the latter occurs.
No, because deprecations should be the responsibility of package authors. As a result, we need to define deprecations when the author registers the checkpoints. At the point where users are enabling checkpoints with a particular handler we no longer know about the deprecations. The only alternative would be to introduce an
I don't see how a different constructor would change that. Regardless of how this information is encoded in the package internals we must choose one of the following:
Hmm, maybe? That definitely increases the scope of this PR by adding extra functionality that doesn't seem well thought out yet in our internal code. I suppose the simplest way to add that functionality would be with:
That doesn't seem like a very clean API and would be more breaking because any code that serializes the
Since Checkpoints.jl isn't generally responsible for loading checkpoints, though
I think with the current codebase and feature set there isn't any easy way to support old paths without adding unnecessary cruft. As you mentioned earlier, we could integrate the deprecations into the |
I've added a commit to include the deprecated names in our index, though it does feel a bit ugly. Loading the deprecation map correctly depends on having the correct package / version that generated the checkpoint loaded. Seems unlikely that we'd want to load checkpoints for packages without having the package loaded, but it's worth noting. |
OK interesting. Responding to this first, as this is actually at the heart of my previous comments, as I was assuming that this was the case - (not necessary loading, but identifying the path). Deprecating a checkpoint and saving to a new name is entirely reasonable (and fits the deprecation pattern you point out). However with the way that the path is encoded my understand is that all previous checkpointed data essentially becomes lost to us ("can't find checkpoint foo but we can find foo_new"). Data that is actually quite costly to create. My view was that it was up to the downstream package to choose whether to support loading the deprecated checkpoints (or not). Without this, there isn't much gain in deprecating as it's no longer backwards compatible anyway, with the purpose of deprecation to act as a bridge between old and new. EDIT: Peering a bit closer, I now see that the handler and pathing look entirely separate. The indexing simply comes from selecting the jlso files within a path. This is a misunderstanding I had, and was why I was referring to updating the handler previously.l |
OK I think without my above misconception (that the package had more responsibility in loading) I see the merit in the original MR. IIUC this last commit correct, the IndexEntry essentially knows how to distinguish between real and deprecated files though as you point out this is a bit of a workaround. There is no issue before in actually indexing the deprecated files - they will still appear, it's just downstream it will look for 'name' but only finds 'old_name'. With that in mind, fwiw my opinion is that the last commit should be removed and keep the indexing as it was before, given that responsibility for loading these checkpoints. One intermediate option, that may be worth incorporating an What do you think? Sorry for the two-and-fro - I was originally surprised we would need to work on our downstream package to handle this but on reflection I think that it makes sense what you had originally (both the MR here and into actions) |
That seems like a bit of an exaggeration. There are still plenty of breadcrumbs to find what you're looking for. Since we only do this kind of searching in 1 downstream package right now we just need to inspect the deprecated checkpoints list first. I can only think of two cases where the checkpoint might be "lost":
Honestly, if this only ran on a local filesystem I'd say we could just use a
I'm not really sure what the |
OK point made.
Any applications I personally use checkpoints with all the data is on S3. I'm not entirely sure how they are handled when they are loaded via awss3. Regardless, creating symlinks is a workaround and not really a solution to graceful loading of deprecated checkpoints. Further, as you pointed out, checkpoints isn't strictly responsible for loading.
Cool - fwiw yeah I agree.
This was really in response to seeing the kind of pattern being added to our downstream package to handle the deprecated name. All this was suggesting was a utility to know if a particular Anyway for the purposes of getting this over the line I don't think that there is anything blocking it? Other than reworking the last commit into a separate MR that we seemed to mutually agree on. I'll see if I can approve but up to you want other (more experienced than my) eyes on it! |
Co-authored-by: Alex Robson <AlexRobson@users.noreply.github.com>
If you want to rename a checkpoint you can now do.
which makes the deprecation searchable with
deprecated_checkpoints
and will handle configuring the correct checkpoint after throwing a warning about the change.This should be a non-breaking change.