Issue with vcf_to_zarr()'s .zarr output #785

@LiangdeLI

Description

The .zarr output of vcf_to_zarr() is much smaller than before: for a 201 MB 1000 Genomes chromosome 21 VCF file, the new result is a 50 MB Zarr store, whereas it used to be 321 MB. However, I found some problems with this new compressed version of the .zarr output.

For example, if I simply run the following to load the dataset and write it back out:

import sgkit as sg

ds = sg.load_dataset("/mydata/chr21.zarr")
ds[["variant_id"]].to_zarr("/mydata/output.zarr", compute=True)

This works with the old 321 MB version, but the new 50 MB version raises an error:

/users/Liangde/.local/lib/python3.8/site-packages/xarray/conventions.py:201: SerializationWarning: variable None has data in the form of a dask array with dtype=object, which means it is being loaded into memory to determine a data type that can be safely stored on disk. To avoid this, coerce this variable to a fixed-size dtype with astype() before saving it.
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/core/dataset.py", line 2031, in to_zarr
    return to_zarr(
  File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/api.py", line 1414, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/api.py", line 1124, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/zarr.py", line 555, in store
    self.set_variables(
  File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/zarr.py", line 634, in set_variables
    writer.add(v.data, zarr_array, region)
  File "/users/Liangde/.local/lib/python3.8/site-packages/xarray/backends/common.py", line 155, in add
    target[region] = source
  File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1213, in __setitem__
    self.set_basic_selection(selection, value, fields=fields)
  File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1308, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1599, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1651, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1888, in _chunk_setitem
    self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
  File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1893, in _chunk_setitem_nosync
    cdata = self._process_for_setitem(ckey, chunk_selection, value, fields=fields)
  File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 1952, in _process_for_setitem
    return self._encode_chunk(chunk)
  File "/users/Liangde/.local/lib/python3.8/site-packages/zarr/core.py", line 2001, in _encode_chunk
    chunk = f.encode(chunk)
  File "numcodecs/vlen.pyx", line 106, in numcodecs.vlen.VLenUTF8.encode
TypeError: expected unicode string, found 16
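The SerializationWarning at the top suggests a possible workaround: coerce the object-dtype variable to a fixed-size dtype before saving, so xarray never routes it through the VLenUTF8 codec. Below is a minimal NumPy-only sketch of that idea; the array is a hypothetical stand-in for ds["variant_id"].values, using an int sentinel (16, mirroring the value the codec reports) mixed in with string IDs:

```python
import numpy as np

# Hypothetical stand-in for ds["variant_id"].values: an object array where
# one entry is a non-string value (the int 16), which is the kind of element
# numcodecs.vlen.VLenUTF8.encode rejects with "expected unicode string".
variant_id = np.array(["rs0", "rs1", 16], dtype=object)

# Coerce every element to str, then to a fixed-size unicode dtype, as the
# SerializationWarning suggests. A fixed-size "U20" array does not need the
# variable-length string codec at all.
fixed = np.array([str(v) for v in variant_id], dtype="U20")

print(fixed.dtype)   # a fixed-size unicode dtype
print(fixed)         # every element is now a str, including "16"
```

In xarray terms, the equivalent would be something like ds["variant_id"] = ds["variant_id"].astype(str) (or .astype("U20")) before calling to_zarr — this is an assumption about how sgkit stores the field, not a confirmed fix for the underlying size/encoding change.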

Metadata

Labels: bug (Something isn't working)