Versioning codecs #148
Comments
It seems like just using a different identifier for the codec would be simpler, but an individual codec could also use a version field as part of its JSON representation if that were useful. It might help to consider specific examples.
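To make the "different identifier" option concrete, here is a minimal sketch. The `REGISTRY` dict, the `codec_from_config` helper, and the length-prefix "breaking change" are all hypothetical and invented for illustration; this is not the numcodecs API, only a self-contained stand-in built on the standard-library `bz2` module.

```python
import bz2

# Hypothetical registry mapping codec ids to classes (NOT the numcodecs API).
REGISTRY = {}


class BZ2:
    codec_id = "bz2"

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return bz2.compress(buf, self.level)

    def decode(self, buf: bytes) -> bytes:
        return bz2.decompress(buf)

    def get_config(self) -> dict:
        return {"id": self.codec_id, "level": self.level}


class BZ2v2(BZ2):
    # Assumed breaking change, purely for illustration:
    # the payload is prefixed with the 4-byte length of the raw data.
    codec_id = "bz2_2"

    def encode(self, buf: bytes) -> bytes:
        return len(buf).to_bytes(4, "little") + super().encode(buf)

    def decode(self, buf: bytes) -> bytes:
        size = int.from_bytes(buf[:4], "little")
        out = super().decode(buf[4:])
        if len(out) != size:
            raise ValueError("decoded length does not match header")
        return out


for cls in (BZ2, BZ2v2):
    REGISTRY[cls.codec_id] = cls


def codec_from_config(config: dict):
    """Look up the codec class by its id and build it from the remaining keys."""
    config = dict(config)  # do not mutate the caller's dict
    cls = REGISTRY[config.pop("id")]
    return cls(**config)
```

With this layout, existing data whose metadata says `{"id": "bz2"}` keeps resolving to the old class, while new data is written with `{"id": "bz2_2"}`. The cost is that both classes stay in the library.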
Yeah, that sounds like the naming suggestion above. Agree that's one way to go about it. I think Martin had a more specific example, so hopefully he can chime in 🙂
I managed to make my change without breaking the API on this occasion, but I don't anticipate being able to every time. I would probably use a version argument rather than a new name when I expect all new data to use the new code and the old code to be maintained only for existing data. In either case, this leaves the decision up to the codec authors, but it might be a nice thing to mention in our developer docs.
So, interesting question: how would the old data get loaded if there was a break? Would the library need to keep around 2 decoders? Would users need to go back to an earlier version of the library? Or should something else be done?
You would need to keep the old code, referenced with the same codec name, unless we invent some other mechanism.
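A minimal sketch of the "same codec name, version argument" option follows. Everything here is hypothetical (the `ToyCodec` class, the `REGISTRY` dict, the `get_codec` helper, and the 2-byte header standing in for a breaking change); it is not the numcodecs API. It also assumes that metadata written before versioning existed, with no `"version"` key, should be read as version 1.

```python
import json
import zlib

# Hypothetical registry keyed by codec id (NOT the numcodecs API).
REGISTRY = {}


class ToyCodec:
    """Toy codec whose encoded format changed between version 1 and 2."""

    codec_id = "toy"
    latest = 2

    def __init__(self, version=latest):
        if version not in (1, 2):
            raise ValueError(f"unsupported {self.codec_id} version: {version}")
        self.version = version

    def encode(self, data: bytes) -> bytes:
        if self.version == 1:
            return zlib.compress(data)
        # Version 2 (assumed breaking change): add a 2-byte header.
        return b"v2" + zlib.compress(data)

    def decode(self, data: bytes) -> bytes:
        if self.version == 1:
            return zlib.decompress(data)  # old code path kept for existing data
        return zlib.decompress(data[2:])

    def get_config(self) -> dict:
        return {"id": self.codec_id, "version": self.version}


REGISTRY[ToyCodec.codec_id] = ToyCodec


def get_codec(config: dict):
    config = dict(config)  # do not mutate the caller's dict
    cls = REGISTRY[config.pop("id")]
    # Assumption: configs with no "version" key predate versioning.
    config.setdefault("version", 1)
    return cls(**config)


# New data records the version in its JSON metadata and round-trips through it.
codec = ToyCodec()
stored = json.dumps(codec.get_config())
assert get_codec(json.loads(stored)).decode(codec.encode(b"abc")) == b"abc"
```

Here the old decoder stays in the library behind the same name, and the integer only changes on a breaking change to the encoded format.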
Should version number increments only indicate breaking changes or could they indicate other things? If the latter, when else would we want to use them?
Up to the author, I suppose, but if we have a simple version number like 1, 2, ..., then I suppose breaking changes only, as in semver. It is conceivable to have more prescriptive codec names in the .zarray like {"id": "gzip~=1.2.3"}, but we usually save that kind of stuff for an environment file. I note that intake catalogs, for example, only give functions and arguments and versions thereof.
Limiting to breaking changes makes sense. I was thinking about reproducibility (in other words, someone wanting to read the data with exactly the same libraries that wrote it). Though maybe that can be captured in separate optional metadata (#139).
Yeah, I'm not sure how much information it makes sense to include directly in the dataset, as opposed to a catalog or other metadata location (e.g., unique run ID for pangeo-forge).
As someone who won't regularly have a catalog, I'd vote for adding this to the dataset itself. In my mind, it's the schema of the config that we're versioning, no? If a single json-schema (without |
It seems we agree that having the codec version is useful, though the last few comments start discussing other version info (like info about the writer). Do we want to keep discussing that here or raise a new issue? If the former, maybe we can start enumerating what other versioned info/metadata we would want to include.
Currently codecs don't have a version. Thus, if a codec is changed in a way that breaks its API, loading existing data would break. Would it make sense to include a version? Or would we want to handle this a different way (like naming the codec differently, for example bz2 & bz2_2)?
xref: fsspec/kerchunk#198 (comment) (where this came up originally)
cc @martindurant @joshmoore