Hi Siyuan,
Thank you for your interest in our work! The correlation does look unsatisfactory. In our study, we used 1133 peaks for ATAC and 3836 peaks for SCIATAC (see Table S2). Ideally, the number of features should be smaller than the number of cells (because of the curse of dimensionality in correlation estimation), but we cannot always guarantee this.
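To make the point about feature number versus cell number concrete, here is a minimal base-R sketch on toy Gaussian data (not scDesign3 itself, and not the data from this issue). It assumes only that the correlation is estimated from a cells-by-features matrix: with more features than cells, the sample correlation matrix is rank-deficient, so many of its eigenvalues are numerically zero and it cannot be inverted reliably.

```r
# Toy illustration only: 50 "cells" and 200 "features" of random noise.
set.seed(1)
n_cells <- 50
n_features <- 200
x <- matrix(rnorm(n_cells * n_features), nrow = n_cells, ncol = n_features)

# Sample correlation matrix across features (features are the columns here).
r <- cor(x)

# After centering, the data span at most n_cells - 1 dimensions, so at least
# n_features - (n_cells - 1) eigenvalues of r are zero up to rounding error.
eigvals <- eigen(r, symmetric = TRUE, only.values = TRUE)$values
n_near_zero <- sum(eigvals < 1e-8)

cat("features:", n_features, "\n")
cat("near-zero eigenvalues:", n_near_zero, "\n")
```

A singular correlation matrix has a zero determinant, which is one plausible reason a likelihood-based criterion computed from it would not be finite.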
For your questions:
Since you use a Gaussian copula and your number of features is larger than your number of cells, AIC/BIC will always be Inf. Using a vine copula can give you AIC/BIC, but it will also be very slow for > 1000 features.
The marginal AIC/BIC seems OK to me.
Yes, it looks weird. Some things you may want to try: (a) start from a smaller set of peaks (e.g., < 1000 peaks) and check the result; (b) try NB instead of ZIP.
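Suggestion (a) could be sketched as follows. This is a base-R illustration on a toy matrix; the helper name `select_top_peaks` and the selection criterion (keeping the peaks that are open in the largest fraction of cells) are assumptions made for the example, not something prescribed by scDesign3 or the paper.

```r
# Hypothetical helper: indices of the n_keep peaks open in the most cells.
# `counts` is a peaks x cells matrix of non-negative counts.
select_top_peaks <- function(counts, n_keep) {
  frac_open <- rowMeans(counts > 0)        # fraction of cells with a nonzero count
  n_keep <- min(n_keep, nrow(counts))
  keep <- order(frac_open, decreasing = TRUE)[seq_len(n_keep)]
  sort(keep)                               # keep the original peak order
}

# Toy peaks x cells matrix standing in for the real count assay.
set.seed(1)
toy_counts <- matrix(rpois(7000 * 100, lambda = 0.1), nrow = 7000)

keep <- select_top_peaks(toy_counts, n_keep = 900)
toy_subset <- toy_counts[keep, , drop = FALSE]
print(dim(toy_subset))
```

On the real object, the same indices would presumably be used to subset the rows of `sce` (for example `sce[keep, ]`) before rerunning the `scdesign3()` call. A sparse count matrix may need `Matrix::rowMeans` instead of the base version. For suggestion (b), the only change to that call would be `family_use = "nb"` in place of `"zip"`.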
Would you mind sharing your sce data with me (I guess it is OK since it comes from public data)? I can do a very quick check. My email is dongyuansong@ucla.edu.
Hello there,
I'm trying to use scDesign3 to simulate single-cell ATAC-seq data, and I like your cool results shown in FigureS6 and S7 of the paper. But so far I cannot repeat that on another dataset, because the copula model is always underfitting. I attached my code and some results here. Do you have any idea why the model is not working?
My code:
simu <- scdesign3(
  sce = sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = NULL,
  spatial = NULL,
  other_covariates = NULL,
  mu_formula = "cell_type",
  sigma_formula = "1",
  family_use = "zip",
  n_cores = 2,
  usebam = FALSE,
  corr_formula = "cell_type",
  copula = "gaussian",
  DT = TRUE,
  pseudo_obs = FALSE,
  return_model = TRUE,
  nonzerovar = FALSE
)
My results:
selected.pdf
My training data:
They're two cell groups from a public sci-ATAC-seq atlas, from which I selected about 7000 peaks and 1000 cells.
Could you let me know how many peaks you usually use, or would recommend using, when simulating ATAC-seq data?