Questions Regarding S-LDSC Analysis with --ref-ld-chr-cts #447
Thank you for your response! I would like to seek further clarification on the interpretation of regression coefficients when control annotations are included in the S-LDSC model. Here is a snippet from the .ldcts file in Multi_tissue_gene_expr.ldcts:
I’m curious about how to interpret the regression coefficients for both the target and control annotations when the control annotation is used as a covariate in the regression model, especially when this control annotation is shared across multiple tissues. For instance, in the Multi_tissue_gene_expr.ldcts file, multiple tissue annotations share the same control: Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control.
Does this mean that the regression coefficient for the target annotation (where annotation = 1 vs. 0) is independent of whether the control annotation is equal to 1 or 0? If so, what exactly does the control annotation represent in this context, and how should we interpret its coefficient when included in the model?
Thank you very much for your time and assistance.
Best regards,
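To make the shared-control layout concrete, here is a minimal Python sketch that parses .ldcts-style lines and groups tissues by their control prefix. It assumes each line holds a label followed by whitespace and a comma-separated list of LD score prefixes, with the target first and the control(s) after it. The tissue labels, the `GTEx.1.` / `GTEx.2.` target prefixes, and the helper name `parse_ldcts` are hypothetical placeholders, not lines copied from the real file.

```python
from collections import defaultdict

# Illustrative lines only: labels and target prefixes are hypothetical.
# The control prefix is the one named in the question above.
sample_ldcts = (
    "Tissue_A\tMulti_tissue_gene_expr_1000Gv3_ldscores/GTEx.1.,"
    "Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control.\n"
    "Tissue_B\tMulti_tissue_gene_expr_1000Gv3_ldscores/GTEx.2.,"
    "Multi_tissue_gene_expr_1000Gv3_ldscores/GTEx.control.\n"
)


def parse_ldcts(text):
    """Return (label, target_prefix, control_prefixes) for each non-empty line.

    Assumes the first comma-separated prefix is the target annotation and
    any remaining prefixes are controls.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        label, prefixes = line.split()
        target, *controls = prefixes.split(",")
        entries.append((label, target, controls))
    return entries


entries = parse_ldcts(sample_ldcts)

# Group tissue labels by the control prefix they are fitted alongside.
tissues_by_control = defaultdict(list)
for label, target, controls in entries:
    for control in controls:
        tissues_by_control[control].append(label)

for control, labels in tissues_by_control.items():
    print(control, "is shared by", labels)
```

Under these assumptions, every tissue is fitted in its own regression, and each of those regressions includes the same control LD scores next to that tissue's target LD scores.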
Recall that ldsc fits a multiple linear regression of chi^2 statistics onto the LD scores partitioned by each annotation. This means that the regression coefficient for the target annotation is not independent of the control annotations.
In the context of looking for cell-type specific annotations, the control annotations (baseline model) are meant to represent non-specific sources of heritability enrichment. The reason to include them is to provide evidence that an enrichment for a cell-type specific annotation is indeed driven by something cell-type specific, and not something non-specific (for example, promoter-associated histone modifications). This does seem to require using the "standard" baseline model referred to in Finucane et al. 2015, etc. in addition to any additional controls.
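The dependence between the target and control coefficients can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: it uses plain ordinary least squares via NumPy rather than ldsc's weighted regression with block-jackknife standard errors, Gaussian noise on the chi^2 values, and made-up LD scores, sample size, and coefficients. In the simulation only the control annotation carries signal, and the target LD score is correlated with the control LD score.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 20_000
N = 50_000  # hypothetical GWAS sample size

# Made-up LD scores. The target LD score overlaps with the control one,
# as it would if the target gene set were a subset of "all genes".
l_control = rng.gamma(shape=2.0, scale=5.0, size=n_snps)
l_target = 0.3 * l_control + rng.gamma(shape=2.0, scale=1.0, size=n_snps)

# Simulated truth: only the control annotation contributes heritability.
tau_control_true = 2e-6
tau_target_true = 0.0
chi2 = (
    1.0
    + N * (tau_control_true * l_control + tau_target_true * l_target)
    + rng.normal(0.0, 0.5, size=n_snps)  # simplified noise model
)


def fit_taus(ld_columns):
    """OLS of chi2 on an intercept plus the given LD score columns.

    Returns the per-annotation coefficients on the tau scale (slope / N).
    """
    X = np.column_stack([np.ones(n_snps)] + ld_columns)
    coef, *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return coef[1:] / N


(tau_target_alone,) = fit_taus([l_target])
tau_target_joint, tau_control_joint = fit_taus([l_target, l_control])

print(f"target only:       tau_target  = {tau_target_alone:.2e}")
print(f"target + control:  tau_target  = {tau_target_joint:.2e}")
print(f"target + control:  tau_control = {tau_control_joint:.2e}")
```

With the control omitted, the target coefficient absorbs the non-specific signal and looks clearly positive. With the control included, the target coefficient falls to roughly zero and the control coefficient recovers its simulated value. That is the sense in which the target coefficient is conditional on the controls in the model.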
Thank you so much for your detailed response to my previous inquiry! I realize that my previous question might not have been as clear as it could have been, so here is a more detailed description. In the context of cell-type-specific analyses, as demonstrated in the wiki’s demo code, the --ref-ld-chr-cts flag is used to specify a .ldcts file that includes both target and control annotations for each cell or tissue type. Below is an example of the demo code provided:
In this code, the --ref-ld-chr flag specifies the use of the baseline model (1000G_EUR_Phase3_baseline/baseline.), which, as you mentioned, captures broad non-specific sources of heritability enrichment. However, the --ref-ld-chr-cts flag simultaneously specifies a .ldcts file, which includes both target and control annotations. Here is an example of what such a .ldcts file might look like:
When I read the corresponding files in R, here is an example of what I found:
For the control annotation:
Given this setup, my question is:
I hope this explanation clarifies my questions. I would be very grateful for any further insights you could provide. Thank you once again for your time and response!
As stated in the wiki
So, the answers to the questions are still as I gave above. The interpretation of the coefficient for the control annotation is the same as for any other annotation.
Thanks! After reviewing the wiki carefully, I think I have figured it out. The control annotates all gene-related SNPs, so it can be considered an extra adjustment on top of the baseline model that makes the results comparable across different cells or tissues. It functions in the same way as the baseline model. Is that right?
Hi, I have a related question. I understand that the interpretation of the coefficient for the control annotation is the same as for any other annotation in a statistical sense. However, I’m still struggling to understand the biological reasoning behind using all genes as a control annotation (Finucane et al. Nat Genet. 2018). Since we are interested in specifically expressed genes in different tissues, I understand the need to condition on LD, allele frequency, etc., but why do we condition on all genes? Thanks!
Hi!
I am currently using the --ref-ld-chr-cts option to perform S-LDSC analysis, and I have a few questions:
Thank you very much for your time and assistance. I appreciate the incredible tool you’ve developed, and any guidance on these questions would be greatly appreciated.
Best regards,
Sun Yingkai