Prop._h2 is negative #438

Open
dqq0404 opened this issue Jun 21, 2024 · 15 comments

Comments

dqq0404 commented Jun 21, 2024

Hi,
When I did partitioned heritability with 1000G_Phase3_baselineLD_v2.2_ldscores.tgz, the Prop._h2 was negative and very significant. Is this a bug?

@aksarkar

@dqq0404 It would be helpful to have the complete output of ldsc. Without it, one cannot say much.

Did you run with --overlap-annot? This is required for the baseline LD model, since the baseline annotations overlap, even though it is not stated explicitly anywhere in the documentation.

dqq0404 commented Jun 23, 2024

Hi,
This is my input:
./ldsc.py \
  --h2 ...sumstats.gz \
  --ref-ld-chr baseline_v2.2/baselineLD. \
  --out ...baseline \
  --overlap-annot \
  --frqfile-chr .../1000G_EUR_Phase3_plink/1000G.EUR.QC. \
  --w-ld-chr 1000G_Phase3_weights_hm3_no_MHC/1000G_Phase3_weights_hm3_no_MHC/weights.hm3_noMHC.

and this is the output, which seems wrong:
Category Prop._SNPs Prop._h2 Prop._h2_std_error Enrichment Enrichment_std_error Enrichment_p
MAF_Adj_Predicted_Allele_AgeL2_0 3.39243741662e-06 -0.387852505281 0.0452457387571 -114328.566057 13337.2360933 8.9865840571286534e-14
MAF_Adj_LLD_AFRL2_0 0.00279647423847 -0.291467372182 0.033175498992 -104.226732423 11.8633308098 8.7228613575755434e-14
MAF_Adj_ASMCL2_0 -2.34852700521e-14 -0.470331648975 0.03990867818 2.00266655623e+13 -1.69930676085e+12 4.5256915227493249e-24

@aksarkar

@dqq0404 The "proportion of h^2" explained by a continuous annotation is not a sensible quantity by definition.

For continuous annotations, you can only draw conclusions from the estimated coefficient.

dqq0404 commented Jun 24, 2024

Thanks for your reply. Can I ask how to calculate the estimated coefficient?

dqq0404 commented Jun 24, 2024

Hi,
I found there was a --print-coefficients parameter to get the coefficient.
This question may be simple, but I still want to know what it means when the coefficient is positive or negative.
Could you please explain it?
Thanks!!

@aksarkar

The coefficient is interpreted as the amount by which the per-SNP heritability increases when the annotation increases by one standard deviation.

Refer to Gazal et al. 2017 for more details.

dqq0404 commented Jun 25, 2024

Hi,
After reading the paper, I found that it uses Tau_star.coefficient, rather than Tau.coefficient, to compare across annotations and across traits. I also found a formula for calculating Tau_star.coefficient:
#270 (comment)
But I do not understand the meaning of the symbols in this formula. How do I calculate Tau_star.coefficient using the following information?
Prop._SNPs Prop._h2 Prop._h2_std_error Enrichment Enrichment_std_error Enrichment_p Coefficient Coefficient_std_error Coefficient_z-score

@aksarkar

As stated in the methods section of Gazal et al. 2017:

M_{h_g^2} is the number of SNPs that were analyzed. You can get this from the printed output of ldsc.

h_g^2 is the estimated heritability. You can also get this from the printed output of ldsc.

sd_c is the standard deviation of the annotation. You need to read the .annot file and compute the standard deviation of the relevant column.

\hat{\tau} is the column Coefficient in the output.
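
Putting those definitions together, the formula from Gazal et al. 2017 is tau*_c = (M_{h_g^2} * sd_c / h_g^2) * \hat{\tau}_c. A minimal sketch of that arithmetic in Python (the values below are placeholders, not numbers from this thread):

M = 5_000_000        # M_{h_g^2}: number of SNPs analyzed, from the ldsc log
h2_g = 0.25          # h_g^2: estimated SNP heritability, from the ldsc log
sd_c = 1.3           # sd_c: standard deviation of annotation c, from the .annot files
tau_hat_c = 2.1e-08  # \hat{\tau}_c: "Coefficient" column of the .results file

tau_star_c = M * sd_c * tau_hat_c / h2_g
print(tau_star_c)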

dqq0404 commented Jun 26, 2024

If I understand correctly, either

  1. I should combine the v2.2 .annot files for the 22 chromosomes and calculate the SD for each category using all SNPs (about 10 million) in the 22 .annot files,
    or
  2. I should combine the v2.2 .annot files and the weights.hm3_noMHC files for the 22 chromosomes, merge them with my summary statistics (leaving about 1 million SNPs), and use those SNPs to calculate the SD for each category.

Which one should I choose?

@aksarkar

You should choose (2). The effect size to be standardized only describes the SNPs that were used in the regression, that is, those SNPs present in both --w-ld and --ref-ld.
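
As an illustration of option (2), here is a rough pandas sketch; the file names follow the prefixes from the command earlier in the thread, but the exact paths and column layout are assumptions to adapt to your own setup:

import pandas as pd

# SNPs used in the regression: present in the sumstats, the reference
# annotations, and the regression weights.
sumstats = pd.read_csv("my_trait.sumstats.gz", sep="\t")  # hypothetical path
weights = pd.concat(
    pd.read_csv(f"weights.hm3_noMHC.{c}.l2.ldscore.gz", sep="\t") for c in range(1, 23)
)
annot = pd.concat(
    pd.read_csv(f"baselineLD.{c}.annot.gz", sep="\t") for c in range(1, 23)
)

keep = annot["SNP"].isin(sumstats["SNP"]) & annot["SNP"].isin(weights["SNP"])
annot_used = annot.loc[keep]

# Standard deviation of each annotation column over the regression SNPs.
sd_c = annot_used.drop(columns=["CHR", "BP", "SNP", "CM"]).std()
print(sd_c)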

dqq0404 commented Jun 27, 2024

Thank you for your patient answer; it resolved my confusion.
I have another two questions:

1. Should I use the Coefficient_z-score to test the significance of Tau_star.coefficient instead of Enrichment_p? When I use the former, the result differs from the latter; how should I interpret this?

2. How is the chi^2 of a SNP calculated when doing partitioned heritability? I ask because I see SNPs are removed when chi^2 > 80.

@aksarkar

  1. Correct, you need to use the z-score of the coefficient to draw statistical conclusions. The reason they are different is that heritability enrichment does not account for the contribution of other annotations, whereas the coefficient does.

  2. The chi^2 statistic is the square of the z-score. https://en.wikipedia.org/wiki/Chi-squared_distribution#Definitions
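
For example, starting from the munged sumstats file (a sketch; the path is illustrative):

import pandas as pd

sumstats = pd.read_csv("my_trait.sumstats.gz", sep="\t")  # hypothetical file
chisq = sumstats["Z"] ** 2         # chi^2 is the squared Z-score
kept = sumstats[chisq <= 80]       # the threshold mentioned above
print(len(sumstats) - len(kept), "SNPs removed for chi^2 > 80")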

dqq0404 commented Jun 28, 2024

If I use the Coefficient_z-score to test the significance of Tau_star.coefficient, should I also apply a Bonferroni threshold to control false positives (such as 0.05/96)?

@1667857557

> As stated in the methods section of Gazal et al. 2017:
>
> M_{h_g^2} is the number of SNPs that were analyzed. You can get this from the printed output of ldsc.
>
> h_g^2 is the estimated heritability. You can also get this from the printed output of ldsc.
>
> sd_c is the standard deviation of the annotation. You need to read the .annot file and compute the standard deviation of the relevant column.
>
> \hat{\tau} is the column Coefficient in the output.

Thank you for the information. I have some questions about generating the sd_c. Could you please provide details on how to "compute the standard deviation of the relevant column" using the baseline model or the relevant code? Thanks in advance!

@aksarkar

@dqq0404 Yes, you still need to correct for multiple testing.
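
As a sketch of that correction (the z-score and the 96-annotation count below are placeholders taken from the question above):

from scipy.stats import norm

z = 4.2                  # Coefficient_z-score from the .results file (placeholder)
p = 2 * norm.sf(abs(z))  # two-sided p-value from the z-score
n_annotations = 96       # as in the 0.05/96 threshold mentioned above
print(p, p < 0.05 / n_annotations)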
