
pairsamtools subsampling [new tool, enhancement] #66

@sergpolly

Description


I think we would benefit from a simple pairsamtools subsample tool (or a subsampling option for pairsamtools select).

The rationale is to enable more "rigorous" statistics (significance estimation, bootstrapping, permutation testing) for some of our analyses. For example, to measure a subtle difference in compartment strength between two experiments with 10 million and 12 million pairs, one could subsample both down to 5 million pairs several times, calculate the compartment strength for each subsample, and compare the resulting distributions. Another example would be subsampling and mixing mitotic and G1 pairs to check whether some experimental effects could be explained by such a simple mixture.
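A minimal sketch of the repeated-subsampling and mixing ideas, in Python (the language pairsamtools is written in). It assumes the pairs already fit in memory as a list of tuples, which a real tool working on .pairsam files would avoid. The names subsample_distribution, mix_subsample and cis_fraction are hypothetical, not part of pairsamtools, and the cis fraction is only a placeholder for a real statistic such as compartment strength.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsample_distribution(
    pairs: Sequence[T],
    n_target: int,
    n_rounds: int,
    statistic: Callable[[List[T]], float],
    seed: int = 0,
) -> List[float]:
    """Draw n_rounds independent subsamples of n_target pairs (without
    replacement) and return the statistic computed on each subsample."""
    if n_target > len(pairs):
        raise ValueError("n_target exceeds the number of available pairs")
    rng = random.Random(seed)
    return [statistic(rng.sample(pairs, n_target)) for _ in range(n_rounds)]


def mix_subsample(
    pairs_a: Sequence[T],
    pairs_b: Sequence[T],
    n_target: int,
    frac_a: float,
    seed: int = 0,
) -> List[T]:
    """Build one mixed sample: round(frac_a * n_target) pairs from A and the
    remaining pairs from B, each drawn without replacement, then shuffled."""
    rng = random.Random(seed)
    n_a = round(frac_a * n_target)
    mixed = rng.sample(pairs_a, n_a) + rng.sample(pairs_b, n_target - n_a)
    rng.shuffle(mixed)
    return mixed


# Hypothetical toy pairs as (chrom1, pos1, chrom2, pos2) tuples.
Pair = Tuple[str, int, str, int]


def cis_fraction(sample: List[Pair]) -> float:
    """Placeholder statistic: fraction of pairs on the same chromosome."""
    return sum(c1 == c2 for c1, _, c2, _ in sample) / len(sample)


def toy_pairs(n: int, seed: int) -> List[Pair]:
    rng = random.Random(seed)
    return [
        ("chr1", rng.randrange(10**6), rng.choice(["chr1", "chr2"]), rng.randrange(10**6))
        for _ in range(n)
    ]


exp1 = toy_pairs(1000, seed=1)  # stands in for the 10 million-pair experiment
exp2 = toy_pairs(1200, seed=2)  # stands in for the 12 million-pair experiment
dist1 = subsample_distribution(exp1, 500, 20, cis_fraction, seed=1)
dist2 = subsample_distribution(exp2, 500, 20, cis_fraction, seed=2)
```

Drawing without replacement keeps every subsample a proper subset of its experiment, and the explicit seed makes each set of rounds reproducible; dist1 and dist2 are the two distributions one would then compare.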

Technical notes/questions:

  • Is the only way to subsample in one pass (streaming-like) to know the total number of pairs (or the number of pairs per chromosome, etc.) a priori?
  • We might need more sophisticated sampling schemes: distance-dependent weights, chromosome-dependent weights, cis/trans, etc. (without duplicating what select already offers).
  • Is there any other way to subsample in a streaming fashion? Do we need to care about streaming at all?
  • Would a pairix index help speed up subsampling? Should we rely on it?
  • Does subsampling fit into select, or does it deserve to be a separate tool?
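For the streaming and weighting questions, two textbook one-pass techniques do not need the total number of pairs in advance: Bernoulli sampling keeps each pair independently with a fixed probability (the output size is random, roughly fraction × N), and reservoir sampling (Algorithm R) returns exactly k pairs using O(k) memory. Weighted schemes such as distance-dependent weights can use the Efraimidis-Spirakis A-Res variant of reservoir sampling. The sketch below is plain Python over any iterable of records; the function names and the cis_weight function are hypothetical and do not correspond to any existing pairsamtools option.

```python
import heapq
import math
import random
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bernoulli_sample(records: Iterable[T], fraction: float, seed: int = 0) -> Iterator[T]:
    """Keep each record independently with probability `fraction`.
    O(1) memory, no total count needed, input order preserved."""
    rng = random.Random(seed)
    for rec in records:
        if rng.random() < fraction:
            yield rec


def reservoir_sample(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Algorithm R: a uniform sample of exactly min(k, N) records in one pass
    without knowing N. Holds k records in memory."""
    rng = random.Random(seed)
    reservoir = []  # (input_index, record)
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append((i, rec))
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = (i, rec)
    # Restore input order so that sorted input stays sorted.
    reservoir.sort(key=lambda t: t[0])
    return [rec for _, rec in reservoir]


def weighted_reservoir_sample(
    records: Iterable[T], k: int, weight: Callable[[T], float], seed: int = 0
) -> List[T]:
    """A-Res (Efraimidis-Spirakis): one-pass weighted sampling without
    replacement. Each record gets key u**(1/w); the k largest keys win."""
    if k <= 0:
        return []
    rng = random.Random(seed)
    heap = []  # min-heap of (key, input_index, record)
    for i, rec in enumerate(records):
        w = weight(rec)
        if w <= 0:
            continue  # zero-weight records are never selected
        # log(u**(1/w)) = log(u)/w, computed in log space to avoid underflow
        # for small weights; 1 - random() lies in (0, 1], so log is defined.
        key = math.log(1.0 - rng.random()) / w
        if len(heap) < k:
            heapq.heappush(heap, (key, i, rec))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, rec))
    return [rec for _, _, rec in sorted(heap, key=lambda t: t[1])]


# Toy usage with hypothetical (chrom1, pos1, chrom2, pos2) records.
pairs = [("chr1", 10 * i, "chr1" if i % 3 else "chr2", 10 * i + (i % 50) * 1000) for i in range(1000)]
kept = list(bernoulli_sample(pairs, 0.1, seed=1))
exact = reservoir_sample(pairs, 100, seed=1)


def cis_weight(p) -> float:
    """Hypothetical weight: favour short-range cis pairs, drop trans pairs."""
    return 1.0 / (1 + abs(p[3] - p[1])) if p[0] == p[2] else 0.0


weighted = weighted_reservoir_sample(pairs, 100, cis_weight, seed=1)
```

Bernoulli sampling is the simplest fit for a streaming command-line tool since it needs no memory and no second pass; the reservoir variants trade O(k) memory for an exact output size.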
