[FEATURE REQUEST] expanded/updated documentation of anvi-compute-functional-enrichment (& friends) + blog post #2380

Open
@adw96

Description

The need

Functional enrichment is a widely used piece of the pangenomics workflow, but as with any automated statistical procedure, it's important to give clear guidance on its use case and monitor its use in the wild. A quick survey of papers citing Shaiber et al. (by @ivagljiva and @adw96) suggested that most users are doing a great job of using the method appropriately. That said, there are a few things that I (as the original author of the underpinning script) could do to clarify its use case, point out some potential pitfalls, and generally guide people in the right direction.

The solution

I aspire to do the following:

  • documentation
    • clarify that the use case is for pre-determined groups. Groups should not be determined using the pangenome and then tested for differentially enriched functions.
    • clarify that the use case is a two-group comparison
    • point people to blog post for more complex designs
  • blog post
    • how to pull out the relevant data and import into R
    • showcase flexibility of general procedure
      • provide clear interpretation of estimated parameters
      • incorporating additional covariates
      • how you could look at time series data
      • how you could do a global test, e.g. if you have >2 groups
    • how to fit a different model or run a different test. Showcase happi as example 🥕
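
To make the two-group use case concrete for the docs, something like the following could work as an illustration. This is a minimal sketch, not the anvi'o implementation: it runs a Pearson chi-squared test on one function's presence/absence across two pre-determined groups. For a single binary grouping variable, the Pearson statistic coincides with the score statistic from a logistic regression on group. The names `two_group_enrichment`, `presence`, and `groups` are hypothetical, and the toy data are made up for the example.

```python
import math


def two_group_enrichment(presence, groups):
    """Pearson chi-squared test (1 df) for one function across two groups.

    presence: list of 0/1 flags, one per genome (1 = function detected).
    groups:   list of group labels, one per genome, fixed BEFORE the analysis
              (not derived from the pangenome itself).
    Returns (proportion of genomes with the function per group,
             chi-squared statistic, p-value).
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")

    # Count genomes carrying the function (hits) and total genomes per group.
    n = {g: 0 for g in labels}
    hits = {g: 0 for g in labels}
    for flag, g in zip(presence, groups):
        n[g] += 1
        hits[g] += flag

    proportions = {g: hits[g] / n[g] for g in labels}
    pooled = sum(hits.values()) / sum(n.values())  # shared proportion under the null
    if pooled in (0.0, 1.0):
        # Function is in every genome or in none: nothing to test.
        return proportions, 0.0, 1.0

    # Sum (observed - expected)^2 / expected over the four cells of the 2x2 table.
    stat = 0.0
    for g in labels:
        cells = ((hits[g], pooled), (n[g] - hits[g], 1 - pooled))
        for observed, p in cells:
            expected = n[g] * p
            stat += (observed - expected) ** 2 / expected

    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return proportions, stat, p_value


# Toy data: 5 genomes per group, labels chosen in advance.
presence = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5
proportions, stat, p_value = two_group_enrichment(presence, groups)
print(proportions, round(stat, 3), round(p_value, 3))
```

The per-group proportions are the interpretable quantities here. The chi-squared approximation is rough with counts this small, and p-values from testing many functions would still need a multiple-testing adjustment.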

A challenge will be that, unlike the two-group comparison case, users now need to choose what model is reasonable. While many users have fantastic intuition for this, writing out the "rules" is very difficult, and many people aren't going to get good statistical instruction (especially not from ChatGPT/the internet). So, how do we guide people without writing a textbook? (Could point them to the In Press NM paper?)

I aspire to have a draft on a branch by the beginning of February. I will ask @ivagljiva and @tucker4 for feedback.

Beneficiaries

Folx using the pangenomics workflow.
