Closed
Labels
bug (Something isn't working)
Description
The .zarr output of vcf_to_zarr() is smaller than it used to be: for a 201MB 1000 Genomes chromosome 21 VCF file, the new result is a 50MB zarr store, while it was 321MB before. However, I found a problem with this new, more compact .zarr output.
For example, if I simply load the dataset and write one variable back out:
import sgkit as sg
ds = sg.load_dataset("/mydata/chr21.zarr")
ds[['variant_id']].to_zarr("/mydata/output.zarr", compute=True)
This works for the old 321MB version, but the new 50MB version raises an error:
/users/Liangde/.local/lib/python3.8/site-packages/xarray/conventions.py:201: SerializationWarning: variable None has data in the form of a dask array with dtype=object, which means it is being loaded into memory to determine a data type that can be safely stored on disk. To avoid this, coerce this variable to a fixed-size dtype with astype() before saving it.
warnings.warn(
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/core/dataset.py", line 2031, in to_zarr
return to_zarr(
File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/api.py", line 1414, in to_zarr
dump_to_store(dataset, zstore, writer, encoding=encoding)
File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/api.py", line 1124, in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/zarr.py", line 555, in store
self.set_variables(
File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/zarr.py", line 634, in set_variables
writer.add(v.data, zarr_array, region)
File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/common.py", line 155, in add
target[region] = source
File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1213, in __setitem__
self.set_basic_selection(selection, value, fields=fields)
File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1308, in set_basic_selection
return self._set_basic_selection_nd(selection, value, fields=fields)
File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1599, in _set_basic_selection_nd
self._set_selection(indexer, value, fields=fields)
File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1651, in _set_selection
self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1888, in _chunk_setitem
self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1893, in _chunk_setitem_nosync
cdata = self._process_for_setitem(ckey, chunk_selection, value, fields=fields)
File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1952, in _process_for_setitem
return self._encode_chunk(chunk)
File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 2001, in _encode_chunk
chunk = f.encode(chunk)
File "numcodecs/vlen.pyx", line 106, in numcodecs.vlen.VLenUTF8.encode
TypeError: expected unicode string, found 16
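The SerializationWarning above suggests coercing the object-dtype variable to a fixed-size dtype with astype() before saving. Below is a minimal sketch of that coercion on a plain NumPy array. The helper name `to_fixed_width_str` and the sample IDs are made up for illustration, and I have not confirmed that doing this to `variant_id` makes the TypeError go away.

```python
import numpy as np


def to_fixed_width_str(values):
    """Coerce an object-dtype array of Python strings to a fixed-width unicode dtype.

    Hypothetical helper for illustration only; it is not part of sgkit or xarray.
    """
    arr = np.asarray(values, dtype=object)
    # The longest string decides the width of the fixed-size dtype (minimum 1).
    width = max((len(str(v)) for v in arr.ravel()), default=1) or 1
    return arr.astype(f"<U{width}")


# Made-up variant IDs standing in for the real variant_id column.
ids = np.array(["rs1", ".", "rs12345"], dtype=object)
fixed = to_fixed_width_str(ids)
print(fixed.dtype)
```

The values are unchanged; only the dtype goes from object to a fixed-width unicode type, which is the form the warning asks for.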