Fst function #225

Closed
@jeromekelleher


Fst is one of the fundamental building blocks of population structure analysis. We will want to compute Fst between pairs of populations, with the "population" labels given by some property of the input dataset (see #224).
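As a rough illustration of how population labels could feed a pairwise computation, the sketch below groups samples by a per-sample label array, tallies allele counts for each population, and enumerates the unordered population pairs. It assumes genotype calls are stored as an integer array of shape (variants, samples, ploidy) with -1 for missing calls. That layout and the names `allele_counts_by_population` and `population_pairs` are hypothetical and are not the dataset model discussed in #224.

```python
import itertools

import numpy as np


def allele_counts_by_population(genotypes, sample_population, n_alleles=2):
    """Count alleles per variant within each population (hypothetical helper).

    genotypes: int array (n_variants, n_samples, ploidy); -1 marks a missing call,
        which matches no allele index and is therefore not counted.
    sample_population: array (n_samples,) of population labels.
    Returns {label: int array (n_variants, n_alleles)}.
    """
    counts = {}
    for pop in np.unique(sample_population):
        # Select the samples carrying this label.
        g = genotypes[:, sample_population == pop, :]
        # For each allele index, count matching calls across samples and ploidy.
        counts[pop] = np.stack(
            [(g == a).sum(axis=(1, 2)) for a in range(n_alleles)], axis=1
        )
    return counts


def population_pairs(sample_population):
    """All unordered pairs of distinct population labels, in sorted order."""
    return list(itertools.combinations(np.unique(sample_population), 2))
```

Per-population allele counts like these are the usual input to two-population Fst estimators, so each pair could then be handed to whichever estimator is selected.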

There are a number of different estimators for Fst (scikit-allel implements three), so the function should take a parameter specifying which estimator to use. I suggest something like the following:

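```python
def Fst(ds, *, estimator=None, **kwargs):
    # None means "use the default", so the default can change in the future
    if estimator is None:
        estimator = "hudson"
    # Proposed per-estimator implementations (not yet written)
    estimator_map = {
        "hudson": hudson_Fst,
        "weir_cockerham": wc_Fst,
        "patterson": patterson_Fst,
    }
    if estimator not in estimator_map:
        raise ValueError(f"Unknown Fst estimator: {estimator!r}")
    return estimator_map[estimator](ds, **kwargs)
```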

These correspond to the three definitions in scikit-allel. We may not want all three initially; implementing just the Hudson estimator may be sufficient. We can test our implementations by comparing them with scikit-allel and tskit.
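For reference, here is a minimal NumPy sketch of a Hudson-style estimator computed from per-population allele counts. It uses the mean-pairwise-difference formulation: the numerator is between-population diversity minus the mean within-population diversity, the denominator is between-population diversity, and the genome-wide value is a ratio of sums over variants. The function names are hypothetical, and the code assumes at least two called alleles per population at every variant. It is a starting point for comparisons with scikit-allel, not a vetted implementation.

```python
import numpy as np


def mean_pairwise_difference(ac):
    """Per-variant probability that two distinct alleles drawn from one
    population differ. ac: int array (n_variants, n_alleles)."""
    n = ac.sum(axis=1)
    n_pairs = n * (n - 1) / 2               # assumes n >= 2 at every variant
    n_same = (ac * (ac - 1) / 2).sum(axis=1)
    return 1 - n_same / n_pairs


def mean_pairwise_difference_between(ac1, ac2):
    """Per-variant probability that one allele from each population differs."""
    n1 = ac1.sum(axis=1)
    n2 = ac2.sum(axis=1)
    n_same = (ac1 * ac2).sum(axis=1)
    return 1 - n_same / (n1 * n2)


def hudson_fst(ac1, ac2):
    """Hudson-style Fst as a ratio of sums across variants (hypothetical helper)."""
    within = (mean_pairwise_difference(ac1) + mean_pairwise_difference(ac2)) / 2
    between = mean_pairwise_difference_between(ac1, ac2)
    num = between - within
    den = between
    # Summing before dividing keeps monomorphic variants (den == 0) harmless.
    return num.sum() / den.sum()
```

With a fixed difference at every variant (for example counts `[[4, 0]]` against `[[0, 4]]`) this returns 1.0.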

(P.S. I prefer to use None as the default value for estimator, as there may be situations in the future where we would prefer a different default depending on properties of the dataset. If we put estimator="hudson" in the signature, there's no way to tell whether the user just wants the default or has specifically asked for "hudson". In general, unless we're certain that the default will never change, I think it's better to use None as the default value in the signature.)
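A small sketch of the `None`-as-default pattern described above. The rule used to pick a default here (Hudson for a pair of populations, Weir & Cockerham otherwise) is purely hypothetical; it only shows that an explicit `"hudson"` is still honoured when a data-dependent default would choose something else. `resolve_estimator` is likewise a made-up helper name.

```python
from typing import Optional


def resolve_estimator(estimator: Optional[str] = None, *, n_populations: int = 2) -> str:
    """Return the Fst estimator name to use (hypothetical helper)."""
    if estimator is None:
        # Caller did not choose: apply a (hypothetical) data-dependent default.
        return "hudson" if n_populations == 2 else "weir_cockerham"
    # Caller chose explicitly: respect the choice even if the default differs.
    return estimator
```

Had the signature been `estimator="hudson"`, the last case below would be indistinguishable from the caller accepting the default.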

Metadata

Assignees: No one assigned
Labels: core operations (Issues related to domain-specific functionality such as LD pruning, PCA, association testing, etc.)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests