This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Analyses including multiple samples from the same individual #155

Closed
@jashapiro

Description

Question/issue

Many of the analyses that have been proposed are sensitive to the fact that we have multiple tumor samples from the same individual, taken either at different time points or from different tumors. For example, biospecimens BS_K07KNTFY and BS_AQMKA8NC are both tumor WGS data (initial and recurrence, respectively) from participant PT_00G007DM. While this is extremely useful data, it raises questions for many particular analyses, which I would like to discuss in this issue.

In particular, analyses of mutation prevalence, variant allele frequency distributions, classification accuracy, etc. are likely to be affected by these non-independent samples. In some cases, a simple awareness of the issue will be sufficient, and analyses can be written to account for or take advantage of the redundancy in the data. However, for many analyses, decisions about which samples to include or exclude will need to be made, and it would be good to have an agreed-upon set of standards and procedures.

For a specific example, in the analysis of mutation co-occurrence (#13), including all samples would result in many spurious reports of co-occurrence, as it is quite common for two samples from the same individual to have the same sets of mutations. Similarly, analyses of recurrent fusions (#10), distribution of tumor mutation burden (#3), etc. will likely be affected.
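To make the inflation concrete, here is a minimal sketch in Python that counts gene-pair co-occurrence once per sample and once per participant. The field names (`participant_id`, `biospecimen_id`, `mutated_genes`), the placeholder IDs, and the gene names are all made up for illustration and are not the project's schema or data. Collapsing each participant to the union of their samples' mutations is just one possible choice, not a recommendation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records, unit_key):
    """Count how many units (samples or participants) carry each gene pair.

    `unit_key` names the field that defines an independent unit.
    Mutations from all records sharing a unit are merged (set union),
    which is an assumption made for this sketch.
    """
    genes_by_unit = {}
    for record in records:
        genes_by_unit.setdefault(record[unit_key], set()).update(record["mutated_genes"])

    counts = Counter()
    for genes in genes_by_unit.values():
        for pair in combinations(sorted(genes), 2):
            counts[pair] += 1
    return counts


# Placeholder records: two samples from one participant share the same mutations.
records = [
    {"participant_id": "PT_1", "biospecimen_id": "BS_1a", "mutated_genes": ["GENE_A", "GENE_B"]},
    {"participant_id": "PT_1", "biospecimen_id": "BS_1b", "mutated_genes": ["GENE_A", "GENE_B"]},
    {"participant_id": "PT_2", "biospecimen_id": "BS_2a", "mutated_genes": ["GENE_A"]},
]

per_sample = cooccurrence_counts(records, "biospecimen_id")
per_participant = cooccurrence_counts(records, "participant_id")
```

Counting by `biospecimen_id` reports the GENE_A/GENE_B pair twice, while counting by `participant_id` reports it once, which is the only independent observation in this toy input.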

One potential solution is to use only primary tumors and/or the earliest sampled tumor from each individual in analyses such as these. However, this would miss some potential co-occurrence patterns that may be important in progression and recurrence, which might suggest that the latest tumor from each individual would be better. Doing both is of course an option as well, but I am curious to hear what others think is most appropriate. Ultimately, we may want to add a recommendation to the documentation for future analyses.
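As a starting point for discussion, the earliest-versus-latest choice could be sketched roughly as follows in Python. This assumes each sample record carries a sortable `collection_order` field; that field, the other field names, and the `PT_EXAMPLE`/`BS_EXAMPLE` record are hypothetical and do not come from the project metadata. Only the initial/recurrence ordering of BS_K07KNTFY and BS_AQMKA8NC is taken from the example above.

```python
def select_one_per_participant(samples, strategy="earliest"):
    """Keep a single sample per participant.

    strategy="earliest" keeps the sample with the smallest collection_order,
    strategy="latest" keeps the one with the largest. Ties keep the sample
    seen first. Field names are assumptions made for this sketch.
    """
    if strategy not in ("earliest", "latest"):
        raise ValueError("strategy must be 'earliest' or 'latest'")

    chosen = {}
    for sample in samples:
        pid = sample["participant_id"]
        current = chosen.get(pid)
        if current is None:
            chosen[pid] = sample
        elif strategy == "earliest" and sample["collection_order"] < current["collection_order"]:
            chosen[pid] = sample
        elif strategy == "latest" and sample["collection_order"] > current["collection_order"]:
            chosen[pid] = sample
    return list(chosen.values())


# collection_order values are illustrative ranks, not real metadata.
samples = [
    {"participant_id": "PT_00G007DM", "biospecimen_id": "BS_K07KNTFY", "collection_order": 0},  # initial
    {"participant_id": "PT_00G007DM", "biospecimen_id": "BS_AQMKA8NC", "collection_order": 1},  # recurrence
    {"participant_id": "PT_EXAMPLE", "biospecimen_id": "BS_EXAMPLE", "collection_order": 0},    # hypothetical
]

earliest = select_one_per_participant(samples, "earliest")
latest = select_one_per_participant(samples, "latest")
```

Running an analysis on both `earliest` and `latest` would be one way to do "both" and compare the results, at the cost of two result sets to interpret.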
