Copula model underfitting #6

RoseYuan · 2023-03-24T18:47:35Z

Hello there,

I'm trying to use scDesign3 to simulate single-cell ATAC-seq data, and I like the results shown in Figures S6 and S7 of the paper. So far, however, I cannot reproduce them on another dataset, because the copula model always underfits. I have attached my code and some results below. Do you have any idea why the model is not working?

My code:
simu <- scdesign3(
  sce = sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = NULL,
  spatial = NULL,
  other_covariates = NULL,
  mu_formula = "cell_type",
  sigma_formula = "1",
  family_use = "zip",
  n_cores = 2,
  usebam = FALSE,
  corr_formula = "cell_type",
  copula = "gaussian",
  DT = TRUE,
  pseudo_obs = FALSE,
  return_model = TRUE,
  nonzerovar = FALSE
)

My results:

1. The BIC and AIC for the copula model are Inf.
2. The marginal BIC and AIC are also very large (aic.marginal = 4431524, bic.marginal = 4819040).
3. The similarity between the peak-peak correlation matrices of the real (training) data and the synthetic data is low (see the attachment below).

selected.pdf

My training data:
They are two cell groups from a public sci-ATAC-seq atlas, from which I selected about 7000 peaks and 1000 cells.

For simulating ATAC-seq data, how many peaks do you usually use, or how many would you recommend?

JSB-UCLA · 2023-03-24T20:01:12Z

Hi Siyuan,
Thank you for your interest in our work! The correlation does look unsatisfactory. In our study, we used 1133 peaks and 3836 peaks for ATAC and SCIATAC, respectively (see Table S2). Ideally, the number of features should be smaller than the number of cells (because of the curse of dimensionality in correlation estimation), but we cannot always guarantee this.
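To illustrate the point about the number of features versus the number of cells, here is a minimal base-R sketch on toy Gaussian data (not the sci-ATAC-seq data from this thread, and not scDesign3's internal code). It shows that when there are more peaks than cells, the sample correlation matrix is rank-deficient. A Gaussian copula likelihood involves the inverse and log-determinant of that matrix, so a singular estimate is at least consistent with the Inf AIC/BIC reported above. The sizes `n_cells` and `n_peaks` are arbitrary values chosen for illustration.

```r
set.seed(1)

n_cells <- 50   # toy number of cells (rows)
n_peaks <- 80   # toy number of peaks (columns), deliberately larger than n_cells

# Toy Gaussian data standing in for copula-transformed peak values.
z <- matrix(rnorm(n_cells * n_peaks), nrow = n_cells, ncol = n_peaks)

# Peak-by-peak sample correlation matrix (n_peaks x n_peaks).
r_hat <- cor(z)

# Eigenvalues of the correlation matrix; zeros indicate singularity.
eig <- eigen(r_hat, symmetric = TRUE, only.values = TRUE)$values

# Numerical rank: centering removes one degree of freedom,
# so the rank cannot exceed n_cells - 1.
rank_r <- sum(eig > 1e-8)
n_zero <- sum(eig <= 1e-8)

cat("dimension:", n_peaks,
    " numerical rank:", rank_r,
    " near-zero eigenvalues:", n_zero, "\n")
```

With fewer peaks than cells the same check would typically report a full-rank matrix, which is the regime recommended above.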

For your questions:

1. Since you use a Gaussian copula and your feature number is larger than the cell number, AIC/BIC will always be Inf. Using a vine copula can give you AIC/BIC, but it will also be very slow for > 1000 features.
2. The marginal AIC/BIC seems OK to me.
3. Yes, it looks weird. Some things you may want to try: (a) start from a smaller set of peaks (e.g., < 1000 peaks) and check the result; (b) try NB instead of ZIP.
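The two suggestions above could be tried roughly as follows. This is only a sketch under stated assumptions: `select_top_peaks` is a hypothetical helper (not part of scDesign3), ranking peaks by the fraction of cells in which they are accessible is just one possible criterion, and the toy matrix stands in for the real count assay. The commented lines at the end assume that `"nb"` is the value `family_use` accepts for the negative binomial.

```r
set.seed(1)

# Toy peak-by-cell count matrix standing in for the counts assay of sce:
# 200 peaks (rows) x 30 cells (columns).
counts <- matrix(rpois(200 * 30, lambda = 0.3), nrow = 200, ncol = 30)
rownames(counts) <- paste0("peak", seq_len(nrow(counts)))

# Hypothetical helper: keep the n_keep peaks that are accessible
# (non-zero) in the largest fraction of cells.
select_top_peaks <- function(counts, n_keep) {
  frac_open <- rowMeans(counts > 0)
  n_keep <- min(n_keep, nrow(counts))
  keep <- order(frac_open, decreasing = TRUE)[seq_len(n_keep)]
  rownames(counts)[keep]
}

# Keep fewer peaks than there are cells (20 < 30 in this toy example).
top_peaks <- select_top_peaks(counts, n_keep = 20)
print(length(top_peaks))

# With the real data this might become (not run here; it needs scDesign3
# and the sce object from the original post):
#   sce_small <- sce[top_peaks, ]
#   then call scdesign3() with the same arguments as before, but with
#   sce = sce_small and family_use = "nb" (assumed value for NB).
```

Comparing the peak-peak correlation matrices again on the reduced set would show whether the fit improves before scaling back up to more peaks.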

Would you mind sharing your sce data with me (I guess it is OK since it comes from public data)? I can do a very quick check. My email is dongyuansong@ucla.edu

Best,
Dongyuan
