Description
I feel like we would benefit from having a simple `pairsamtools subsample` tool (or a subsampling option for `pairsamtools select`).
The rationale is to enable "rigorous" statistics (significance estimation, bootstrapping, permutation testing) for some of the analyses. For example, if we want to measure a "subtle" compartment strength difference between 2 experiments that have 10 mln and 12 mln pairs, we can subsample both down to 5 mln pairs several times, calculate the compartment strength for each subsample, and compare the resulting distributions. Another example is subsampling and mixing mitotic and G1 pairs to check whether some experimental effects could be explained by such a simple mixture.
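A rough sketch of the repeated-subsampling comparison described above, in plain Python. Everything here is hypothetical and only for illustration: `subsample_statistic` is not an existing pairsamtools function, the pairs are toy `(chrom1, chrom2)` tuples held in memory, and `cis_fraction` is a stand-in for a real statistic such as compartment strength.

```python
import random
from statistics import mean

def subsample_statistic(pairs, n, statistic, n_rep=10, seed=0):
    """Draw n_rep random subsamples of size n (without replacement)
    and return the statistic evaluated on each subsample."""
    rng = random.Random(seed)
    return [statistic(rng.sample(pairs, n)) for _ in range(n_rep)]

def cis_fraction(pairs):
    """Toy stand-in for compartment strength: fraction of pairs with
    both sides on the same chromosome."""
    return sum(c1 == c2 for c1, c2 in pairs) / len(pairs)

# Two toy "experiments" of different depth (assumption: in-memory lists).
rng = random.Random(1)
chroms = ["chr1", "chr2", "chr3"]
exp_a = [(rng.choice(chroms), rng.choice(chroms)) for _ in range(1000)]
exp_b = [(rng.choice(chroms), rng.choice(chroms)) for _ in range(1200)]

# Subsample both experiments down to the same depth several times.
dist_a = subsample_statistic(exp_a, 500, cis_fraction, n_rep=20, seed=2)
dist_b = subsample_statistic(exp_b, 500, cis_fraction, n_rep=20, seed=3)

# The two lists of values can now be compared as distributions.
print(mean(dist_a), mean(dist_b))
```

With real data the subsamples would come from the pairs files rather than in-memory lists, and the two resulting distributions could be compared with whatever test suits the analysis.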
Technical notes/questions:
- the only way to subsample in 1 pass (streaming like) is by knowing the total # of pairs (#pairs per chrom etc) a priori ?!
- there might be a need to implement more sophisticated samplings - distance dependent weights, chrom dependent weights, cis/trans, etc (do not overdo what's already available in `select`) ...
- any other way to do a streaming-like subsample ? Do we need to care about its streaming nature ?
- would a `pairix` index help speed up subsampling ? Should we rely on it ?
- does it seem like `subsample` fits into `select` or does it deserve to be a separate tool ?
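On the streaming questions above, here is a minimal sketch of three standard one-pass sampling schemes, assuming records arrive as an iterable of lines and that any header lines are handled separately. The function names are hypothetical and not part of pairsamtools. Bernoulli sampling needs no total but gives only an approximate output size; selection sampling gives an exact size when the total is known a priori; reservoir sampling gives an exact size without knowing the total, at the cost of holding `k` records in memory.

```python
import random

def bernoulli_subsample(lines, fraction, seed=None):
    """Keep each record independently with probability `fraction`.
    One pass, O(1) memory, input order preserved; output size is random."""
    rng = random.Random(seed)
    for line in lines:
        if rng.random() < fraction:
            yield line

def selection_subsample(lines, k, total, seed=None):
    """Selection sampling: exactly min(k, total) records, assuming the
    stream really contains `total` records. One pass, O(1) memory."""
    rng = random.Random(seed)
    needed = min(k, total)
    remaining = total
    for line in lines:
        if needed == 0:
            break
        # Select with probability needed / remaining.
        if rng.random() * remaining < needed:
            yield line
            needed -= 1
        remaining -= 1

def reservoir_subsample(lines, k, seed=None):
    """Reservoir sampling (Algorithm R): uniform sample of k records
    without knowing the total. One pass, O(k) memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append((i, line))
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = (i, line)
    reservoir.sort()  # restore input order via the stored index
    return [line for _, line in reservoir]

records = [f"record_{i}" for i in range(1000)]
print(len(list(bernoulli_subsample(records, 0.1, seed=0))))
print(len(list(selection_subsample(records, 100, len(records), seed=0))))
print(len(reservoir_subsample(records, 100, seed=0)))
```

The first two functions are true streaming filters (records are emitted as they are read, in input order), while the reservoir version can only emit output once the whole input has been seen. Weighted variants (by distance, chromosome, cis/trans) would need a per-record keep probability instead of a single `fraction`.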