cytof_cluster with Rphenograph crashes R #15

Closed
vsmuir opened this issue Nov 8, 2017 · 5 comments

vsmuir commented Nov 8, 2017

Using R 3.4.2, Bioconductor 3.6, and cytofkit 1.10.0, I have not been able to run Rphenograph clustering on an expression matrix without R crashing.

I'm running some preprocessing (transformation & normalization) outside of cytofkit, so it's possible I've completely missed some required part of the matrix. None of the other clustering methods have failed, however.

I've attached two dummy matrices which fail to cluster with Rphenograph.

sample_expr1 = read.csv("sample_expr_matrix1.csv")
sample_expr2 = read.csv("sample_expr_matrix2.csv")
selected_markers <- c("CD45ra", "CCR7", "CD38", "CCR4", "CCR6", "CXCR3")

cluster_out1 <- cytof_cluster(xdata = as.matrix(sample_expr1[,selected_markers]),
                              method = "Rphenograph")
cluster_out2 <- cytof_cluster(xdata = as.matrix(sample_expr2),
                              method = "Rphenograph")

sample_expr_matrix1.txt
sample_expr_matrix2.txt


jinmiaochen commented Nov 8, 2017 via email


SamGG commented Nov 8, 2017

Hi,
I also tried it on my Windows computer with R 3.3.0. RStudio crashes, so no error message is shown.
I forked the current devel version in order to run it with R 3.3.0 and installed it (all compilations succeeded). RStudio crashes. I reran the cluster command under Rterm (i.e. without RStudio). It crashes too, still with no informative message. I suspect a problem in the C code.
ClusterX is still working ;-)
HTH

The code (with the import of the files from GitHub corrected)

sample_expr1 = read.table("sample_expr_matrix1.txt", sep = "\t")
sample_expr2 = read.table("sample_expr_matrix2.txt", sep = "\t")
sample_expr1 = as.matrix(sample_expr1)
sample_expr2 = as.matrix(sample_expr2)

selected_markers <- c("CD45ra", "CCR7", "CD38", "CCR4", "CCR6", "CXCR3")
intersect(colnames(sample_expr1), selected_markers)
intersect(colnames(sample_expr2), selected_markers)

library(cytofkit)

cluster_out1 <- cytof_cluster(ydata = sample_expr1[,c("tsne_1", "tsne_2")], xdata = sample_expr1[,selected_markers], method = "ClusterX")
plot(sample_expr1[,c("tsne_1", "tsne_2")], pch = 20, col = (cluster_out1 %% 8+1))

cluster_out1 <- cytof_cluster(xdata = sample_expr1[,selected_markers], method = "Rphenograph")

The run

C:\Data\active\tests\test-rphenograph>"C:\Program Files\R\R-3.3.3\bin\R.exe"
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library(cytofkit)
Loading required package: ggplot2
Loading required package: plyr
> cluster_out1 <- cytof_cluster(xdata = sample_expr1[,selected_markers], metho$
  Running PhenoGraph...Run Rphenograph starts:
  -Input data of 7615 rows and 6 columns
  -k is set to 30
  Finding nearest neighbors...
C:\Data\active\tests\test-rphenograph

Contributor

MattMyint commented Nov 8, 2017

Thanks @SamGG, I was just looking into the issue and it's likely to be a C issue as you suggest.

For what it's worth, switching the treetype from bd to kd has made the function run without crashing, similar to issue #12, with the difference being that the first sample matrix attached has no duplicate rows.

I'll see if this change affects results downstream before implementing it.

EDIT: Additional info. It seems the first matrix has 0 duplicates out of 7615 while the second has 197 duplicate rows out of 7699. The second matrix works fine if I remove the duplicates, but that leaves the issue of the first matrix. Just to be clear, this is using bd treetype. Using kd works fine for both.
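The duplicate-row check described above can be sketched in base R. This is only an illustration on a made-up toy matrix (`expr` and its contents are hypothetical, not the attached sample data); it counts exact duplicate rows and drops them before clustering.

```r
# Toy matrix standing in for an expression matrix (hypothetical data)
set.seed(1)
expr <- matrix(rnorm(40), ncol = 2, dimnames = list(NULL, c("m1", "m2")))
expr <- rbind(expr, expr[1:3, ])   # append three known duplicate rows

# duplicated() on a matrix works row-wise: TRUE marks the second and later copies of a row
is_dup <- duplicated(expr)
n_dup <- sum(is_dup)
cat(n_dup, "duplicate rows out of", nrow(expr), "\n")

# Keep only the first occurrence of each row
expr_unique <- expr[!is_dup, , drop = FALSE]
```

Note that removing rows changes the number of events, so cluster labels then have to be mapped back to the original rows if every event needs an assignment.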

Also, using either bd or kd seems to have no effect on clustering results, so the solution should be okay.
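For readers who want to try the tree type on its own, here is a minimal sketch. It assumes that "treetype" refers to the `treetype` argument of `RANN::nn2` (which accepts `"kd"` and `"bd"`); that the neighbor search inside cytofkit goes through this function is an assumption here, not something confirmed in this thread. The toy matrix is hypothetical, and the code is skipped if RANN is not installed.

```r
nn_idx <- NULL
if (requireNamespace("RANN", quietly = TRUE)) {
  # Hypothetical toy data: 200 events, 6 markers
  set.seed(1)
  toy <- matrix(rnorm(200 * 6), ncol = 6)

  # Nearest-neighbor search with a kd tree instead of a bd tree
  nn <- RANN::nn2(data = toy, query = toy, k = 5,
                  treetype = "kd", searchtype = "standard")

  # One row per event, one column per neighbor (the event itself included)
  nn_idx <- nn$nn.idx
}
```

Running the same call with `treetype = "bd"` on the same data and comparing `nn$nn.idx` is one way to check that both tree types return the same neighbors.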


SamGG commented Nov 8, 2017

Thanks for your feedback... and your memory! It is indeed the same trouble. Adding some stochastic magic solves the issue. Best.

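xdata = sample_expr1[,selected_markers]
# Count duplicated rows before adding noise
table(duplicated(xdata))

# Add tiny uniform noise to every value so that no two rows are exactly identical
xdata = xdata + matrix(runif(n = prod(dim(xdata)), min = -1e-7, max = 1e-7), nrow = nrow(xdata))
# Check for duplicated rows again
table(duplicated(xdata))

cluster_out1 <- cytof_cluster(xdata = xdata, method = "Rphenograph")
# It runs!!!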

Author

vsmuir commented Nov 8, 2017

The magic works perfectly to prevent the crash.

That being said, I've found that adding the small amount of noise suggested above leads to reproducibility issues in my hands. When I run the same series of FCS files in the same order from the same source, I get consistent results. When any one of these is altered, however, I end up with surprisingly different cluster assignments. (In one case, I imported 4 FCS files and structured them either as a flowSet or as a concatenated expression matrix, with events stored in the same order and an extra column containing the sample ID. I looped over either the flowFrames or the sample IDs to run Rphenograph in cytofkit with added noise. 3 of 19 clusters stayed the same, but it was more typical for only ~60% of a cluster to be preserved between the two analyses. The worst-performing cluster showed only 27% shared cluster assignment between the two runs.) One might expect this, since the noise is random (even with a set seed). Just wanted to provide a heads-up.

I have not run into similar problems using the kd treetype. In fact, even after shuffling the order of the files in the loop, I continue to get consistent cluster assignment.

Thanks for your help!!
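One possible way to quantify how well clusters are preserved between two runs is sketched below in base R. It is not necessarily how the percentages above were computed; the label vectors `run_a` and `run_b` are made up and assume the events are in the same order in both runs.

```r
# Hypothetical cluster labels for the same 10 events from two runs
run_a <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
run_b <- c(5, 5, 5, 6, 6, 6, 6, 7, 7, 7)

# Cross-tabulate the two assignments: rows are run A clusters, columns are run B clusters
tab <- table(run_a, run_b)

# For each run A cluster, the largest share of its events that lands in a single run B cluster
preserved <- apply(tab, 1, max) / rowSums(tab)
print(round(preserved, 2))
```

A value of 1 means the cluster was recovered intact under a different label; lower values mean its events were split across several clusters in the second run.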

vsmuir closed this as completed Nov 8, 2017