Track and improve the performance of allele counting method

The solution to https://github.com/pystatgen/sgkit/issues/3 in https://github.com/pystatgen/sgkit/pull/36 is naive and possibly unacceptably slow. That will be the case if Dask does not optimize the loop over allele indexes into a single pass over the genotypes array (which it probably won't).
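To make the concern concrete, here is a minimal sketch of the "loop over allele indexes" pattern, written with plain NumPy so it runs anywhere. The function name `count_alleles_naive`, the `(variants, samples, ploidy)` layout, and the use of negative values for missing calls are assumptions for illustration; the code in the PR may differ. Each loop iteration compares the whole genotypes array against one allele index, so the array is scanned once per allele unless the backend fuses those scans.

```python
import numpy as np


def count_alleles_naive(gt, n_alleles):
    """Count alleles per variant with one full comparison per allele index.

    gt: integer array of shape (variants, samples, ploidy); negative values
        are treated as missing calls (an assumption for this sketch).
    """
    # One pass over the whole array for every allele index.
    counts = [(gt == a).sum(axis=(1, 2)) for a in range(n_alleles)]
    return np.stack(counts, axis=-1)


# Toy data: 2 variants, 3 diploid samples.
gt = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[0, 2], [-1, -1], [2, 2]],
])
ac = count_alleles_naive(gt, n_alleles=3)
print(ac)
```

With `n_alleles` small this is cheap, but the cost grows linearly with the number of alleles.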

The extension to this proposed in https://github.com/pystatgen/sgkit/pull/36#issuecomment-656611356 would solve the problem in a single pass if Dask supported counting rows the way NumPy does, but it currently does not.
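For reference, one way to avoid the per-allele loop in plain NumPy is to offset each variant's allele codes into its own range and call `np.bincount` once. This is only a sketch of the general idea, not necessarily what the linked comment proposes; the function name `count_alleles_bincount` and the negative-means-missing convention are assumptions.

```python
import numpy as np


def count_alleles_bincount(gt, n_alleles):
    """Count alleles per variant with a single np.bincount call.

    gt: integer array of shape (variants, samples, ploidy); calls outside
        [0, n_alleles) are ignored (an assumption for this sketch).
    """
    n_variants = gt.shape[0]
    flat = gt.reshape(n_variants, -1)
    # Shift variant i's allele codes into the range [i * n_alleles, (i + 1) * n_alleles).
    offsets = np.arange(n_variants)[:, None] * n_alleles
    valid = (flat >= 0) & (flat < n_alleles)
    idx = (flat + offsets)[valid]
    counts = np.bincount(idx, minlength=n_variants * n_alleles)
    return counts.reshape(n_variants, n_alleles)


gt = np.array([
    [[0, 0], [0, 1], [1, 1]],
    [[0, 2], [-1, -1], [2, 2]],
])
ac = count_alleles_bincount(gt, n_alleles=3)
print(ac)
```

The number of counting calls no longer depends on `n_alleles`, at the cost of building a temporary index array.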

There may be other efficient ways to do it without dropping down to writing custom kernels. In any case, we should track the performance of this implementation (and others) as part of a benchmark suite, as @alimanfoo mentioned in https://github.com/pystatgen/sgkit/pull/36#issuecomment-658893949, so that we can measure the impact of future iterations more passively and prevent regressions.
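As a rough illustration of what such a benchmark could compare, the sketch below times two hypothetical NumPy implementations on random genotypes using the standard-library `timeit` module. The function names, array sizes, and repeat counts are all assumptions; a real suite would run against the actual sgkit function and Dask-backed arrays.

```python
import timeit

import numpy as np


def count_loop(gt, n_alleles):
    # One comparison over the full array per allele index.
    return np.stack([(gt == a).sum(axis=(1, 2)) for a in range(n_alleles)], axis=-1)


def count_bincount(gt, n_alleles):
    # Single bincount over allele codes offset per variant.
    n_variants = gt.shape[0]
    flat = gt.reshape(n_variants, -1)
    idx = flat + np.arange(n_variants)[:, None] * n_alleles
    valid = (flat >= 0) & (flat < n_alleles)
    counts = np.bincount(idx[valid], minlength=n_variants * n_alleles)
    return counts.reshape(n_variants, n_alleles)


rng = np.random.default_rng(0)
n_alleles = 4
# 1,000 variants x 100 diploid samples; -1 marks a missing call.
gt = rng.integers(-1, n_alleles, size=(1000, 100, 2))

# Check agreement before timing, so the comparison is meaningful.
same = np.array_equal(count_loop(gt, n_alleles), count_bincount(gt, n_alleles))

t_loop = min(timeit.repeat(lambda: count_loop(gt, n_alleles), number=5, repeat=3))
t_bincount = min(timeit.repeat(lambda: count_bincount(gt, n_alleles), number=5, repeat=3))
print(f"loop: {t_loop:.4f}s  bincount: {t_bincount:.4f}s  same result: {same}")
```

Recording numbers like these per commit is what would let regressions show up without anyone re-running timings by hand.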

Track and improve the performance of allele counting method #49

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Track and improve the performance of allele counting method #49

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions