Versioning codecs #148

jakirkham · 2022-07-21T17:26:45Z

Currently codecs don't have a version. Thus, if a codec is changed in a way that breaks its API, loading existing data would break. Would it make sense to include one? Or would we want to handle this a different way (like naming the codec differently, for example bz2 & bz2_2)?

xref: fsspec/kerchunk#198 (comment) (where this came up originally)

cc @martindurant @joshmoore


jbms · 2022-07-21T17:54:56Z

It seems like just using a different identifier for the codec would be simpler, but an individual codec could also use a version field as part of its JSON representation if that were useful. It might help to consider specific examples.
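To make the two options concrete, here is a minimal sketch, assuming a hypothetical `bz2` codec with a breaking second revision. The `version` key, the `bz2_2` name (borrowed from the example in the opening comment), and the `resolve` helper are illustrative only and are not defined by any spec or library.

```python
# Two hypothetical ways to record a breaking codec revision in metadata.
# Neither key layout is defined by the spec; both are illustrative.
by_name = {"id": "bz2_2", "level": 9}                # new identifier
by_field = {"id": "bz2", "version": 2, "level": 9}   # explicit version field


def resolve(config):
    """Return a (name, version) pair for either style of config."""
    if "version" in config:
        return config["id"], int(config["version"])
    # Fall back to a trailing "_<n>" suffix; a bare name means version 1.
    name, sep, suffix = config["id"].rpartition("_")
    if sep and suffix.isdigit():
        return name, int(suffix)
    return config["id"], 1
```

Note that the suffix convention in this sketch is ambiguous for codec names that already end in `_<number>`, whereas the explicit field is not.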

jakirkham · 2022-07-22T18:17:29Z

Yeah that sounds like the naming suggestion above. Agree that's one way to go about it.

Think Martin had a more specific example. So hopefully he can chime in 🙂

martindurant · 2022-07-22T18:20:46Z

I managed to make my change not break the API on this occasion, but I don't anticipate being able to do so every time. I would probably use a version argument rather than a new name when I expect all new data to use the new code and only maintain the old code for existing data. In either case, this leaves it up to the codec authors to make the decision, but it might be a nice thing to mention in our developer docs.

jakirkham · 2022-07-22T18:26:18Z

So interesting question, how would the old data get loaded if there was a break? Would the library need to keep around 2 decoders? Would users need to go back to an earlier version of the library? Or should something else be done?

martindurant · 2022-07-22T18:28:01Z

You would need to keep the old code, referenced with the same codec name, unless we invent some other mechanism.
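A minimal sketch of that arrangement, assuming a hypothetical codec named `demo` whose version 2 changed the byte layout. The registry, the `version` key, and both decoder functions are made up for illustration; numcodecs does not provide this mechanism.

```python
import zlib


# Hypothetical: version 1 stored plain zlib streams; version 2 prepends a
# 4-byte big-endian length of the uncompressed payload.
def decode_v1(buf):
    return zlib.decompress(buf)


def decode_v2(buf):
    expected = int.from_bytes(buf[:4], "big")
    out = zlib.decompress(buf[4:])
    if len(out) != expected:
        raise ValueError("length header does not match payload")
    return out


# Both decoders stay registered under the same codec name.
DECODERS = {("demo", 1): decode_v1, ("demo", 2): decode_v2}


def decode(config, buf):
    # Assumption: data written before versioning existed is version 1.
    key = (config["id"], config.get("version", 1))
    decoder = DECODERS.get(key)
    if decoder is None:
        raise ValueError(f"unsupported codec/version: {key}")
    return decoder(buf)
```

With this layout, old arrays whose metadata has no `version` key keep loading through `decode_v1`, while newly written arrays record `"version": 2`.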

jakirkham · 2022-07-22T18:30:02Z

Should version number increments only indicate breaking changes or could they indicate other things? If the latter, when else would we want to use them?

martindurant · 2022-07-22T18:32:35Z

Up to the author, I suppose, but if we have a simple version number like 1, 2, ..., then I suppose an increment would mean a breaking change, as in semver. It is conceivable to have more prescriptive codec names in the .zarray like {"id": "gzip~=1.2.3"}, but we usually save that kind of stuff for an environment file. I note that intake catalogs, for example, only give functions and arguments and versions thereof.
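For illustration, a string such as `gzip~=1.2.3` could be split and checked roughly as below. This is a simplified sketch: it handles only dotted numeric versions and the `~=` (compatible release) operator described in PEP 440, it does not zero-pad versions of different lengths, and the `split_id` and `is_compatible` helpers are hypothetical, not part of zarr or numcodecs.

```python
import re

# Codec name, optionally followed by "~=" and a dotted numeric version
# with at least two components (as "~=" requires).
_SPEC = re.compile(r"^(?P<name>[\w-]+)(?:~=(?P<version>\d+(?:\.\d+)+))?$")


def split_id(codec_id):
    """Split 'gzip~=1.2.3' into ('gzip', (1, 2, 3)); no specifier gives None."""
    match = _SPEC.match(codec_id)
    if match is None:
        raise ValueError(f"cannot parse codec id: {codec_id!r}")
    version = match["version"]
    if version is None:
        return match["name"], None
    return match["name"], tuple(int(part) for part in version.split("."))


def is_compatible(installed, required):
    """True if `installed` satisfies `~=required` (>= required, same prefix)."""
    if required is None:
        return True
    prefix = required[:-1]
    return installed[: len(prefix)] == prefix and installed >= required
```

For example, `split_id("gzip~=1.2.3")` yields the name `gzip` and the tuple `(1, 2, 3)`, which an installed `(1, 2, 5)` satisfies and an installed `(1, 3, 0)` does not.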

jakirkham · 2022-07-22T18:54:41Z

Limiting to breaking changes makes sense.

Was thinking about reproducibility (IOW if someone wants to use the exact same libraries to read as wrote the data). Though maybe that can be captured in separate optional metadata ( #139 ).

martindurant · 2022-07-22T19:07:43Z

Yeah, I'm not sure how much information it makes sense to include directly in the dataset, as opposed to catalog or other metadata location (e.g., unique run ID for pangeo-forge).

joshmoore · 2022-08-03T21:13:14Z

As someone who won't regularly have a catalog, I'd vote for adding this to the dataset itself. In my mind, it's the schema of the config that we're versioning, no? If a single json-schema (without ORs) can't validate the config, then I assume you'd need to point to a different schema (i.e., a different purl).
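A rough sketch of that idea, assuming a hypothetical `demo` codec with two config revisions. The schemas are written as JSON Schema-like dicts, but only `required` and `additionalProperties: false` are interpreted by the toy `matches` checker; a real setup would presumably use a full JSON Schema validator. All names here are illustrative.

```python
# Hypothetical schemas for two revisions of one codec's configuration.
SCHEMAS = {
    ("demo", 1): {
        "required": ["id", "level"],
        "properties": {"id": {}, "level": {}},
        "additionalProperties": False,
    },
    ("demo", 2): {
        "required": ["id", "version", "level", "shuffle"],
        "properties": {"id": {}, "version": {}, "level": {}, "shuffle": {}},
        "additionalProperties": False,
    },
}


def matches(config, schema):
    """Check required keys and reject keys the schema does not declare."""
    keys = set(config)
    if not set(schema["required"]) <= keys:
        return False
    if schema.get("additionalProperties") is False:
        return keys <= set(schema["properties"])
    return True


def schema_for(config):
    """Return the key of the single schema that accepts `config`."""
    hits = [key for key, schema in SCHEMAS.items() if matches(config, schema)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one matching schema, got {hits}")
    return hits[0]
```

Because each revision has its own schema with no alternatives inside it, a config that fits neither schema fails outright instead of being half-accepted.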

jakirkham · 2022-08-03T21:21:40Z

It seems we agree that having the codec version is useful.

Though the last few comments start discussing other version info (like info about the writer). Do we want to keep discussing that here or raise a new issue? If the former, maybe we can start enumerating what other versioned info/metadata we would want to include.

jakirkham mentioned this issue Jul 21, 2022

Rework grib2 fsspec/kerchunk#198

Merged

jakirkham mentioned this issue Sep 23, 2022

Drop LegacyJSON codec zarr-developers/numcodecs#225

Closed

jstriebel added the core-protocol-v3.0 label (Issue relates to the core protocol version 3.0 spec) Nov 15, 2022

jstriebel mentioned this issue Nov 30, 2022

move codecs into separate (versioned documents), update urls #187

Merged

jstriebel closed this as completed in #187 Dec 2, 2022
