Closed
Description
Here's an example of a large dataset from @alimanfoo:
https://nbviewer.jupyter.org/gist/alimanfoo/b74b08465727894538d5b161b3ced764
<xarray.Dataset>
Dimensions: (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3, __variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2, samples: 1142, variants: 21442865)
Coordinates:
samples/ID (samples) object dask.array<chunksize=(1142,), meta=np.ndarray>
variants/CHROM (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray>
variants/POS (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray>
Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants
Data variables:
variants/ABHet (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
variants/ABHom (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
variants/AC (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
variants/AF (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
...
I know similarly large datasets with lots of dimensions come up in other contexts as well, e.g., with geophysical model output.
That's a very long first line! This would be easier to read as:
<xarray.Dataset>
Dimensions: (__variants/BaseCounts_dim1: 4, __variants/MLEAC_dim1: 3,
__variants/MLEAF_dim1: 3, alt_alleles: 3, ploidy: 2,
samples: 1142, variants: 21442865)
Coordinates:
samples/ID (samples) object dask.array<chunksize=(1142,), meta=np.ndarray>
variants/CHROM (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray>
variants/POS (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray>
Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants
Data variables:
variants/ABHet (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
variants/ABHom (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
variants/AC (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
variants/AF (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
...
or maybe:
<xarray.Dataset>
Dimensions:
__variants/BaseCounts_dim1: 4
__variants/MLEAC_dim1: 3
__variants/MLEAF_dim1: 3
alt_alleles: 3
ploidy: 2
samples: 1142
variants: 21442865
Coordinates:
samples/ID (samples) object dask.array<chunksize=(1142,), meta=np.ndarray>
variants/CHROM (variants) object dask.array<chunksize=(21442865,), meta=np.ndarray>
variants/POS (variants) int32 dask.array<chunksize=(4194304,), meta=np.ndarray>
Dimensions without coordinates: __variants/BaseCounts_dim1, __variants/MLEAC_dim1, __variants/MLEAF_dim1, alt_alleles, ploidy, samples, variants
Data variables:
variants/ABHet (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
variants/ABHom (variants) float32 dask.array<chunksize=(4194304,), meta=np.ndarray>
variants/AC (variants, alt_alleles) int32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
variants/AF (variants, alt_alleles) float32 dask.array<chunksize=(4194304, 3), meta=np.ndarray>
...
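For illustration, here is a minimal sketch of both layouts proposed above. This is not xarray's actual formatting code: the function names `format_dims_wrapped` and `format_dims_stacked`, the 80-column default, and the four-space indent are all assumptions made for the example, and the dimension sizes are copied from the repr shown in this issue.

```python
def format_dims_wrapped(dims, width=80):
    """Greedily wrap 'name: size' pairs, aligning continuation lines.

    Hypothetical helper for illustration only (not part of xarray).
    """
    prefix = "Dimensions: ("
    indent = " " * len(prefix)
    items = [f"{name}: {size}" for name, size in dims.items()]
    if not items:
        return prefix + ")"

    lines = []
    current = prefix
    for i, item in enumerate(items):
        # Every item but the last ends with a comma; the last closes the paren.
        piece = item + ("," if i < len(items) - 1 else ")")
        at_line_start = current == prefix
        candidate = current + ("" if at_line_start else " ") + piece
        if len(candidate) > width and not at_line_start:
            # The item does not fit: finish this line and start a new one.
            lines.append(current)
            candidate = indent + piece
        current = candidate
    lines.append(current)
    return "\n".join(lines)


def format_dims_stacked(dims, indent="    "):
    """Put each dimension on its own line under a 'Dimensions:' header."""
    lines = ["Dimensions:"]
    lines += [f"{indent}{name}: {size}" for name, size in dims.items()]
    return "\n".join(lines)


# Dimension sizes taken from the example repr in this issue.
dims = {
    "__variants/BaseCounts_dim1": 4,
    "__variants/MLEAC_dim1": 3,
    "__variants/MLEAF_dim1": 3,
    "alt_alleles": 3,
    "ploidy": 2,
    "samples": 1142,
    "variants": 21442865,
}

print(format_dims_wrapped(dims))
print(format_dims_stacked(dims))
```

The wrapped form keeps the repr compact when there are only a few dimensions, while the stacked form is easier to scan when dimension names are long. Which one is preferable is a judgment call.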
"Dimensions without coordinates" could probably use some wrapping, too.
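As a rough sketch of what wrapping that line could look like, the standard library's `textwrap.fill` can do it with a hanging indent. The helper name `format_unindexed_dims` and the 80-column width are assumptions for this example, not xarray's implementation.

```python
import textwrap


def format_unindexed_dims(names, width=80):
    """Wrap the dimension names, aligning continuation lines under the first name.

    Hypothetical helper for illustration only (not part of xarray).
    """
    label = "Dimensions without coordinates: "
    return textwrap.fill(
        ", ".join(names),
        width=width,
        initial_indent=label,
        subsequent_indent=" " * len(label),
        # Never split a dimension name in the middle.
        break_long_words=False,
        break_on_hyphens=False,
    )


# Dimension names taken from the example repr in this issue.
names = [
    "__variants/BaseCounts_dim1",
    "__variants/MLEAC_dim1",
    "__variants/MLEAF_dim1",
    "alt_alleles",
    "ploidy",
    "samples",
    "variants",
]

print(format_unindexed_dims(names))
```

Because `break_long_words` is disabled, a single name longer than the available width would still overflow its line rather than being cut.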