Custom codec information is silently dropped when writing kerchunk references #770

@frazane

I encountered this bug when working on a custom GRIB parser in https://github.com/MeteoSwiss/icon-ch-vzarr.

When trying to serialize virtual datasets as kerchunk references, the custom codec information is silently dropped. The array metadata looks like this (note the codecs entry):

ArrayV3Metadata(shape=(1, 1147980),
                data_type=Float64(endianness='little'),
                chunk_grid=RegularChunkGrid(chunk_shape=(1, 1147980)),
                chunk_key_encoding=DefaultChunkKeyEncoding(name='default',
                                                           separator='/'),
                fill_value=np.float64(0.0),
                codecs=(EccodesCodec(),),
                attributes={'_earthkit': {'b64message': 'R1JJQv//AAIAAAAAAAAAqgAAABUBANcA/w8BAQfpAQEUAAABAQAAABwCAP4AB+kBARU0AwAAAAAAAAAAAAAAAAEAAAAjAwAAEYRMAAAAZQYAAAEBF2Q9oldJWbZE0lSjzW4rwAAAACIEAAAAAAAAAgCXAAAAAAAAAABnAAAAAAL///////8AAAAVBQAAAfAAAEOIgACACgAAAAAAAAAGBv8AAAAFBzc3Nzc=',
                                          'bitsPerValue': 16},
                            'long_name': '2m Temperature',
                            'standard_name': 'air_temperature',
                            'units': 'K'},
                dimension_names=('valid_time', 'values'),
                zarr_format=3,
                node_type='array',
                storage_transformers=())

However, the codec information does not make it into the references: both filters and compressor are null in every .zarray entry:

{"version":1,"refs":{".zgroup":"{\"zarr_format\":2}",".zattrs":"{}","T_2M\/0.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010120",3424926372,2296130],"T_2M\/1.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010121",3427222332,2296130],"T_2M\/2.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010122",3424926372,2296130],"T_2M\/3.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010123",3424926372,2296130],"T_2M\/.zarray":"{\"shape\":[4,1147980],\"chunks\":[1,1147980],\"dtype\":\"<f8\",\"fill_value\":0.0,\"order\":\"C\",\"filters\":null,\"dimension_separator\":\".\",\"compressor\":null,\"attributes\":{},\"zarr_format\":2}","T_2M\/.zattrs":"{\"standard_name\":\"air_temperature\",\"long_name\":\"2m Temperature\",\"units\":\"K\",\"_earthkit\":{\"bitsPerValue\":16,\"b64message\":\"R1JJQv\/\/AAIAAAAAAAAAqgAAABUBANcA\/w8BAQfpAQEUAAABAQAAABwCAP4AB+kBARU0AwAAAAAAAAAAAAAAAAEAAAAjAwAAEYRMAAAAZQYAAAEBF2Q9oldJWbZE0lSjzW4rwAAAACIEAAAAAAAAAgCXAAAAAAAAAABnAAAAAAL\/\/\/\/\/\/\/8AAAAVBQAAAfAAAEOIgACACgAAAAAAAAAGBv8AAAAFBzc3Nzc=\"},\"_ARRAY_DIMENSIONS\":[\"valid_time\",\"values\"]}","CLCL\/0.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010120",3721779870,2296130],"CLCL\/1.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010121",3724075830,2296130],"CLCL\/2.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010122",3721779870,2296130],"CLCL\/3.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010123",3721779870,2296130],"CLCL\/.zarray":"{\"shape\":[4,1147980],\"chunks\":[1,1147980],\"dtype\":\"<f8\",\"fill_value\":0.0,\"order\":\"C\",\"filters\":null,\"dimension_separator\":\".\",\"compressor\":null,\"attributes\":{},\"zarr_format\":2}","CLCL\/.zattrs":"{\"standard_name\":\"unknown\",\"long_name\":\"Cloud Cover (800 hPa - 
Soil)\",\"units\":\"%\",\"_earthkit\":{\"bitsPerValue\":16,\"b64message\":\"R1JJQv\/\/AAIAAAAAAAAAqgAAABUBANcA\/w8BAQfpAQEUAAABAQAAABwCAP4AB+kBARU0BAAAAAAAAAAAAAAAAAEAAAAjAwAAEYRMAAAAZQYAAAEBF2Q9oldJWbZE0lSjzW4rwAAAACIEAAAAAAYWAgCXAAAAAAAAAABkAAABOIABAAAAAAAAAAAVBQAAAfAAAEOIgACACgAAAAAAAAAGBv8AAAAFBzc3Nzc=\"},\"_ARRAY_DIMENSIONS\":[\"valid_time\",\"values\"]}","TOT_PREC\/0.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010120",3792671464,194],"TOT_PREC\/1.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010121",3794978942,194],"TOT_PREC\/2.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010122",3792683114,194],"TOT_PREC\/3.0":["\/store_new\/mch\/msopr\/osm\/KENDA-CH1\/ANA25\/det\/iaf2025010123",3792680428,194],"TOT_PREC\/.zarray":"{\"shape\":[4,1147980],\"chunks\":[1,1147980],\"dtype\":\"<f8\",\"fill_value\":0.0,\"order\":\"C\",\"filters\":null,\"dimension_separator\":\".\",\"compressor\":null,\"attributes\":{},\"zarr_format\":2}","TOT_PREC\/.zattrs":"{\"standard_name\":\"unknown\",\"long_name\":\"Total Precipitation (Accumulation)\",\"units\":\"kg m-2\",\"_earthkit\":{\"bitsPerValue\":0,\"b64message\":\"R1JJQv\/\/AAIAAAAAAAAAwgAAABUBANcA\/w8BAQfpAQEUAAABAQAAABwCAP4AB+kBARU0BQAAAAAAAAAAAAAAAAEAAAAjAwAAEYRMAAAAZQYAAAEBF2Q9oldJWbZE0lSjzW4rwAAAADoEAAAACAE0AgCXAAAAAAAAAAABAAAAAAD\/\/\/\/\/\/\/8H6QEBFAAAAQAAAAABAgAAAAAA\/wAAAAAAAAAVBQAAAfAAAEOIgACACgAAAAAAAAAGBv8AAAAFBzc3Nzc=\"},\"_ARRAY_DIMENSIONS\":[\"valid_time\",\"values\"]}","valid_time\/0":"base64:AAAAAAAAAAABAAAAAAAAAAIAAAAAAAAAAwAAAAAAAAA=","valid_time\/.zarray":"{\"shape\":[4],\"chunks\":[4],\"dtype\":\"<i8\",\"fill_value\":null,\"order\":\"C\",\"filters\":null,\"dimension_separator\":\".\",\"compressor\":null,\"attributes\":{},\"zarr_format\":2}","valid_time\/.zattrs":"{\"units\":\"hours since 2025-01-01 20:00:00\",\"calendar\":\"proleptic_gregorian\",\"_ARRAY_DIMENSIONS\":[\"valid_time\"]}"}}
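One way to spot this programmatically is to scan a reference file for .zarray entries whose filters and compressor are both null. The following is a minimal sketch using only the standard library; the helper name `arrays_without_codecs`, the shortened sample references, and the `OTHER` array are made up for illustration and are not part of the real output above.

```python
import json


def arrays_without_codecs(refs_json: str) -> list[str]:
    """Return the names of arrays whose .zarray entry has neither filters nor a compressor."""
    refs = json.loads(refs_json)["refs"]
    missing = []
    for key, value in refs.items():
        if not key.endswith("/.zarray"):
            continue  # skip chunk references, .zattrs and .zgroup entries
        zarray = json.loads(value)  # .zarray values are JSON strings nested in the refs
        if zarray.get("filters") is None and zarray.get("compressor") is None:
            missing.append(key.removesuffix("/.zarray"))
    return sorted(missing)


# Shortened, hypothetical reference set: T_2M mimics the entries in this issue,
# OTHER is an invented array that does carry a compressor.
sample = json.dumps({
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "T_2M/.zarray": json.dumps({
            "shape": [4, 1147980], "chunks": [1, 1147980], "dtype": "<f8",
            "filters": None, "compressor": None, "zarr_format": 2,
        }),
        "T_2M/0.0": ["/path/to/file", 0, 10],
        "OTHER/.zarray": json.dumps({
            "shape": [4], "chunks": [4], "dtype": "<f8",
            "filters": None, "compressor": {"id": "zlib", "level": 1},
            "zarr_format": 2,
        }),
    },
})

print(arrays_without_codecs(sample))
```

Note that arrays which legitimately have no codecs, such as the inlined valid_time coordinate, would be reported as well, so the result is a list of candidates rather than a list of confirmed bugs.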

The issue seems to be here:

v2_codecs = [

If my custom codec is a subclass of ArrayBytesCodec, it is excluded. There is also a TODO left there, which may refer to this.
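To illustrate the suspected mechanism, here is a self-contained sketch that uses stand-in classes rather than the real zarr or VirtualiZarr code. It assumes the v2 conversion keeps only codecs that are not array-to-bytes codecs; the function `to_v2_codecs`, the class bodies, and `SomeCompressor` are hypothetical and only mirror the class names mentioned in this issue.

```python
# Stand-ins for the zarr v3 codec base classes (hypothetical, not imported from zarr).
class Codec:
    pass


class ArrayArrayCodec(Codec):
    pass


class ArrayBytesCodec(Codec):
    pass


class BytesBytesCodec(Codec):
    pass


class EccodesCodec(ArrayBytesCodec):
    """Stand-in for the custom GRIB codec from this issue."""


class SomeCompressor(BytesBytesCodec):
    """Invented bytes-to-bytes codec, used only for contrast."""


def to_v2_codecs(codecs):
    """Hypothetical filter: drop every array-to-bytes codec, keep the rest."""
    return [c for c in codecs if not isinstance(c, ArrayBytesCodec)]


# The custom codec is the only codec, so nothing survives the filter.
print(to_v2_codecs((EccodesCodec(),)))  # []

# A bytes-to-bytes codec in the same pipeline would be kept.
kept = to_v2_codecs((EccodesCodec(), SomeCompressor()))
print([type(c).__name__ for c in kept])  # ['SomeCompressor']
```

If the real conversion behaves like this filter, an empty result would explain why filters and compressor end up as null, with no warning or error to signal that the codec was discarded.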


I added a reproducible example here (run it with uv run virtualize_kenda.py): https://gist.github.com/frazane/d26fd8925aea11cadf5bb012d81c5c2e

Labels: bug (Something isn't working), references generation (Reading byte ranges from archival files)
