Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LinDA for longitudinal microbiome analysis #7

Open
EOMAK91 opened this issue Feb 2, 2023 · 15 comments
Open

LinDA for longitudinal microbiome analysis #7

EOMAK91 opened this issue Feb 2, 2023 · 15 comments

Comments

@EOMAK91
Copy link

EOMAK91 commented Feb 2, 2023

Hi,

I am currently using MassLin2 to analyze my longitudinal microbiome dataset; however, I saw your recent paper describing LinDA and its strengths in comparison to MassLin2. Therefore, I would like to compare my current results to those I would get from LinDA. However, after browsing through your page, I do not see an example of how to use LinDA to analyze longitudinal microbiome data. Could you point me in the direction of a tutorial or webpage that demonstrates how to adapt LinDA for longitudinal analysis?

Thanks in advance.

@zhouhj1994
Copy link
Owner

For longitudinal data, one may use the mixed-effect model to incorporate subject as a random effect, since one subject always has multiple observations causing correlations among data. The use of mixed-effect models in LinDA is the same as in linear regression (lm), such as 'formula = '~Smoke+Sex+(1|SubjectID)' in the example in the readme page.

@PeterCx
Copy link

PeterCx commented Mar 15, 2023

Dear @zhouhj1994

I am using LinDA for longitudinal microbiome analysis. I have questions as a few things are not clear to me. This is also related to another issue #5

I have 3 groups (Control, Placebo, Treatment) and 3 timepoints (0, 22, 52). Thus in total there are 18 pairwise comparisons for which I want to know what taxa are differentially abundant.

I have used the following code:

linda.obj <- linda(Species_Filt, Metadata, formula = '~Treatment*Time+(1|Sample)')

alp1 <- linda.obj$output$TreatmentControl$log2FoldChange
se1 <- linda.obj$output$TreatmentControl$lfcSE
df1 <- linda.obj$output$TreatmentControl$df

alp2 <- linda.obj$output$Time22$log2FoldChange
se2 <- linda.obj$output$Time22$lfcSE
df2 <- linda.obj$output$Time22$df

alp3 <- linda.obj$output$`TreatmentControl:Time22`$log2FoldChange
se3 <- linda.obj$output$`TreatmentControl:Time22`$lfcSE
df3 <- linda.obj$output$`TreatmentControl:Time22`$df

cov12 <- linda.obj$covariance$TreatmentControl$Time22
cov13 <- linda.obj$covariance$TreatmentControl$`TreatmentControl:Time22`
cov23 <- linda.obj$covariance$Time22$`TreatmentControl:Time22`

stat <- (alp1+alp3) / (sqrt(se1^2+se3^2+2*cov13))
pvalue <- 2 * pt(-abs(stat), (df1+df3)/2)
padj <- p.adjust(pvalue, method = 'BH')
rownames(linda.obj$output[[1]])[padj <= 0.1]

This code should allow me to see if there is a difference between Control and Treatment at Day 22. linda.object$covariance is NULL so I cant complete the calculation.

However, even with this method its not possible to get all pairwise comparisons. For example - Treatment at Day 0 vs Treatment at Day 22 or to check if there is a difference between Control and Placebo at any time.

How should I proceed to get all pairwise comparisons?

Your help is greatly appreciated. Many thanks

Kind regards,

P

@zhouhj1994
Copy link
Owner

Thanks for the use of our package.

I am not sure whether your codes will work. If the "Treatment" variable has three levels: Control, Placebo, Treatment, then "Control" will be treated as the reference level (alphabetizing rule), so there shouldn't be such results as "linda.obj$output$TreatmentControl" but "linda.obj$output$TreatmentPlacebo" and "linda.obj$output$TreatmentTreatment" instead.

To get all pairwise comparisons, for example,

  1. Treatment at Day 0 vs Treatment at Day 22: First please make sure that your "Time" variable is categorical (i.e., factor type in R syntax), and note that "0" will be treated as the reference level. Then the following codes would do the task.
alp1 <- linda.obj$output$Time22$log2FoldChange
se1 <- linda.obj$output$Time22$lfcSE
df1 <- linda.obj$output$Time22$df

alp2 <- linda.obj$output$TreatmentTreatment:Time22$log2FoldChange
se2 <- linda.obj$output$TreatmentTreatment:Time22$lfcSE
df2 <- linda.obj$output$TreatmentTreatment:Time22$df

cov12 <- linda.obj$covariance$Time22$TreatmentTreatment:Time22

stat <- (alp1+alp2) / (sqrt(se1^2+se2^2+2*cov12))
pvalue <- 2 * pt(-abs(stat), (df1+df2)/2)
padj <- p.adjust(pvalue, method = 'BH')
rownames(linda.obj$output[[1]])[padj <= 0.1]
  1. Control and Placebo at any time: If you mean Control and Placebo at the same day, then the differential analysis would be
# Time 0
alp1 <- linda.obj$output$TreatmentPlacebo$log2FoldChange
se1 <- linda.obj$output$TreatmentPlacebo$lfcSE
df1 <- linda.obj$output$TreatmentPlacebo$df

stat <- alp1/se1
pvalue <- 2 * pt(-abs(stat), df1)
padj <- p.adjust(pvalue, method = 'BH')
rownames(linda.obj$output[[1]])[padj <= 0.1]

# Time 22
alp2 <- linda.obj$output$TreatmentPlacebo:Time22$log2FoldChange
se2 <- linda.obj$output$TreatmentPlacebo:Time22$lfcSE
df2 <- linda.obj$output$TreatmentPlacebo:Time22$df

cov12 <- linda.obj$covariance$TreatmentPlacebo$TreatmentPlacebo:Time22

stat <- (alp1+alp2) / (sqrt(se1^2+se2^2+2*cov12))
pvalue <- 2 * pt(-abs(stat), (df1+df2)/2)
padj <- p.adjust(pvalue, method = 'BH')
rownames(linda.obj$output[[1]])[padj <= 0.1]

# Time 52: similar to Time 22
  1. For other comparisons, the analyses would be similar, just be clear that there would be the following 8 outputs

    TreatmentPlacebo, TreatmentTreatment, Time22, Time52, TreatmentPlacebo:Time22, TreatmentPlacebo:Time52,
    TreatmentTreatment:Time22, TreatmentTreatment:Time52

    and

    TreatmentPlacebo (can be seen as the baseline Placebo effect, i.e., at time 0), TreatmentTreatment (can be seen as the baseline Treatment effect, i.e., at time 0), Time22 (can be seen as the baseline time 22 effect, i.e., in control group) , Time52 (can be seen as the baseline time 52 effect, i.e., in control group)

@PeterCx
Copy link

PeterCx commented Mar 16, 2023

Thanks for the response. This is very very helpful. However, I cannot find the covariances in the output.

cov12 <- linda.obj$covariance returns NULL

There is no output called covariance in linda.obj. How are you calculating this or what am I missing?

Thanks very much. P

@zhouhj1994
Copy link
Owner

Sorry for the late reply.

Are you using the latest version of the package? The old version didn't include the covariance in the output.

Have you tried this example? https://github.com/zhouhj1994/LinDA Can you see the covariances in the output?

Thanks.

@PeterCx
Copy link

PeterCx commented Mar 27, 2023

No problem.

Yes I was using wrong version. I was using the linda function implemented in the microbiomeStat package. I have now installed the LinDA package correctly via devtools and its al working perfectly. Thanks again for all your help.

P

@zhouhj1994
Copy link
Owner

Great.

@Ivy-ops
Copy link

Ivy-ops commented Oct 9, 2023

Thank you for the insightful discussion. I have a question related to this. It would be highly appreciated if you could help me with the explanation of the LinDA results. @zhouhj1994
For example, in @EOMAK91 example, Time has 0,22 and 52. Should we consider the "Time" variable as a continuous variable or as a categorical factor? How does the interpretation differ when "Time" is viewed as continuous versus when it's treated as a factor?

@zhouhj1994
Copy link
Owner

If you want to understanding the effect of time, in other words, how is the abundance changed by one unit change of time, you may treat it as continuous.
If you are interest in the difference between time 0 and time22/time 52,you may treat it as categorical.

@Ivy-ops
Copy link

Ivy-ops commented Oct 16, 2023

Hi @zhouhj1994, thank you for the answer.

I've executed the following command: fit <- linda(feature.dat=feature.dat, meta.dat=meta.dat, formula = "~ treatment * time"), where "time" includes the values 0, 22, and 52, and "treatment" includes two levels, "yes" and "no." Could you help me check whether my use of linda output is correct? Much appreciated!

If I consider "time" as a continuous variable:
Does "fit$output$time$log2FoldChange" imply that a one-unit change in time results in an x log2FoldChange in each feature?
Does "fit$output$treatmentyes$log2FoldChange" mean that, in comparison to the "no" treatment, the "yes" treatment yields an x log2FoldChange in feature.dat?

If I consider "time" as a categorical variable:
Does "fit$output$time$time22$log2FoldChange" signify that time22 exhibits an x log2FoldChange in each feature compared to time 0?
Does "fit$output$treatmentyes$log2FoldChange" mean that, relative to the "no" treatment, the "yes" treatment results in an x log2FoldChange in feature.dat?

@zhouhj1994
Copy link
Owner

Your interpretation is not bad. But you may want to say "...while keeping other variables fixed" or something like that and be careful of the interaction term.

Your model is "treatment + time + treatment * time"

  • Consider "time" as a continuous variable:

The effect of time: for treatment "no", the effect is "fit$output$time$log2FoldChange", while for treatment "yes", the effect is "fit$output$time$log2FoldChange" + "fit$output$treatmentyes:time$log2FoldChange"

The effect of treatment: while fixing time, the treatment "yes" yields "fit$output$treatmentyes$log2FoldChange" + "fit$output$treatmentyes:time$log2FoldChange" change in comparison to the treatment "no".

  • Consider "time" as a categorical variable:

The effect of time22: for treatment "no", the effect is "fit$output$time22$log2FoldChange", while for treatment "yes", the effect is "fit$output$time22$log2FoldChange" + "fit$output$treatmentyes:time22$log2FoldChange"

The effect of treatment: for fixed time22, the treatment "yes" yields "fit$output$treatmentyes$log2FoldChange" + "fit$output$treatmentyes:time22$log2FoldChange" change in comparison to the treatment "no".

@Ivy-ops
Copy link

Ivy-ops commented Oct 23, 2023

Thank you for the correction and the details! @zhouhj1994

@PengfanZhang
Copy link

Hi, I have a naive question. What does fit$output$time$log2FoldChange mean when time is a continuous variable? Which two groups are used to calculate the fold change? Thanks!

@zhouhj1994
Copy link
Owner

When time is a continuous variable, fit$output$time$log2FoldChange means the effect of time on the response variable (if there are no interactions considered). We might say, a one unit increase in time corresponds to change of fit$output$time$log2FoldChange in response.

@zhouhj1994
Copy link
Owner

Here, the response variable is the log2-transformed microbiome abundance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants