Open
Description
>>> import sgkit as sg
>>> ds = sg.simulate_genotype_call_dataset(n_variant=10, n_sample=2)
>>> ds
<xarray.Dataset>
Dimensions: (alleles: 2, ploidy: 2, samples: 2, variants: 10)
Dimensions without coordinates: alleles, ploidy, samples, variants
Data variables:
variant_contig (variants) int64 0 0 0 0 0 0 0 0 0 0
variant_position (variants) int64 0 1 2 3 4 5 6 7 8 9
variant_allele (variants, alleles) |S1 b'C' b'G' b'A' ... b'C' b'A'
sample_id (samples) <U2 'S0' 'S1'
call_genotype (variants, samples, ploidy) int8 0 0 1 0 1 ... 0 1 0 1 1
call_genotype_mask (variants, samples, ploidy) bool False False ... False
Attributes:
contigs: [0]
>>> sg.count_variant_alleles(ds)
<xarray.Dataset>
Dimensions: (alleles: 2, ploidy: 2, samples: 2, variants: 10)
Dimensions without coordinates: alleles, ploidy, samples, variants
Data variables:
variant_allele_count (variants, alleles) uint64 dask.array<chunksize=(10, 2), meta=np.ndarray>
call_allele_count (variants, samples, alleles) uint8 dask.array<chunksize=(10, 2, 2), meta=np.ndarray>
variant_contig (variants) int64 0 0 0 0 0 0 0 0 0 0
variant_position (variants) int64 0 1 2 3 4 5 6 7 8 9
variant_allele (variants, alleles) |S1 b'C' b'G' b'A' ... b'C' b'A'
sample_id (samples) <U2 'S0' 'S1'
call_genotype (variants, samples, ploidy) int8 0 0 1 0 1 ... 1 0 1 1
call_genotype_mask (variants, samples, ploidy) bool False False ... False
Attributes:
contigs: [0]
>>> sg.count_variant_alleles(ds, merge=False)
<xarray.Dataset>
Dimensions: (alleles: 2, variants: 10)
Dimensions without coordinates: alleles, variants
Data variables:
variant_allele_count (variants, alleles) uint64 dask.array<chunksize=(10, 2), meta=np.ndarray>
Notice that for the call with merge=False
the intermediate variable call_allele_count
is not returned (and is effectively lost). I'm not sure if this behaviour is a problem in practice, but it might be something we want to fix.