Is your feature request related to a problem? Please describe.
Clustergrammer2 can visualize matrices with ~10 million matrix cells, but slows down when there are too many columns or rows (>20,000). CyTOF data is very 'wide' in that it has ~50 dimensions but we would ideally like to visualize 50-100K columns if possible.
We are able to interactively visualize matrices with 50-100K columns using Clustergrammer2 since the visualization and interaction are handled by the GPU. However, we run into issues with interacting with the dendrogram using JavaScript on the CPU and with running the hierarchical clustering on the back-end:
hierarchical clustering takes too long (Python back-end)
front-end interactions with the linkage matrix get too slow (JavaScript dendrogram interactions)
To speed up the hierarchical clustering step, we usually run a first round of K-means clustering and then hierarchically cluster the results. (We could similarly slice the hierarchical linkage tree to reduce the resolution of the hierarchical clustering results.)
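A minimal sketch of the K-means-then-hierarchical approach described above, assuming NumPy and SciPy are available on the back-end. The function name `precluster_then_link`, the average-linkage method, and the `seed` default are illustrative choices, not part of Clustergrammer2:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.cluster.vq import kmeans2


def precluster_then_link(points, n_clusters, seed=0):
    """Run K-means first, then hierarchically cluster the centroids.

    points: array of shape (n_samples, n_features), one row per matrix column
            (e.g. ~100K cells by ~50 CyTOF dimensions).
    Returns (labels, linkage_matrix, members), where members maps each
    K-means cluster id to the indices of the samples it contains.
    """
    points = np.asarray(points, dtype=float)
    _, raw_labels = kmeans2(points, n_clusters, minit="++", seed=seed)

    # K-means can leave a cluster empty; renumber so ids run 0..k-1 with no gaps.
    used, labels = np.unique(raw_labels, return_inverse=True)
    k = len(used)
    centroids = np.vstack([points[labels == c].mean(axis=0) for c in range(k)])

    # The linkage matrix now has k - 1 rows instead of n_samples - 1.
    link = linkage(centroids, method="average")

    members = {c: np.flatnonzero(labels == c) for c in range(k)}
    return labels, link, members
```

With 100K samples and 5K clusters, the linkage matrix sent to the front-end would have about 5K rows rather than about 100K, which is the part the JavaScript dendrogram code has to traverse.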
Describe the solution you'd like
If we want to visualize the data for peace of mind and are largely satisfied identifying clusters at a resolution of ~20 data points per cluster then we could try the following:
Hierarchically cluster all 100K data points (this will take a long time) and then trim the linkage matrix so that the maximum number of clusters allowed is ~5,000 (despite having ~100K data points).
Run K-means (e.g. 5K clusters) before hierarchical clustering, but render the original data (100K) and only allow linkage matrix interactions with the K-means clustering results.
In either case we would have to keep a dictionary of which samples belong to which K-means cluster or truncated dendrogram cluster. We will also have to update the manual category code to handle this. These do not seem like very difficult problems to overcome.
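The bookkeeping for the truncated-dendrogram option could look roughly like the sketch below. It assumes the linkage matrix is in SciPy's format and uses `fcluster` with the `maxclust` criterion to cap the number of clusters; `truncate_dendrogram` and `expand_selection` are hypothetical helper names, not existing Clustergrammer2 functions:

```python
from collections import defaultdict

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def truncate_dendrogram(link, max_clusters):
    """Cut a SciPy linkage matrix into at most max_clusters flat clusters.

    Returns a dict mapping cluster id -> list of original sample indices.
    """
    labels = fcluster(link, t=max_clusters, criterion="maxclust")
    members = defaultdict(list)
    for sample_index, cluster_id in enumerate(labels):
        members[int(cluster_id)].append(sample_index)
    return dict(members)


def expand_selection(members, cluster_ids):
    """Translate selected cluster ids back into original sample indices."""
    selected = []
    for cluster_id in cluster_ids:
        selected.extend(members[cluster_id])
    return sorted(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(1000, 50))   # stand-in for CyTOF-like data
    link = linkage(points, method="average")
    members = truncate_dendrogram(link, max_clusters=50)
    print(len(members), "clusters cover", sum(map(len, members.values())), "samples")
```

The same dictionary shape would work for the K-means option, so the manual category code could call something like `expand_selection` in either case to apply a category to every underlying sample in the selected clusters.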
Longer Term Possible Solutions
Other longer term solutions might include
Move more of the logic onto the GPU in the front-end if possible. This is difficult because you need to hack WebGL into doing general-purpose calculations (see https://gpu.rocks/#/); linkage matrix traversal code would be hard to write in WebGL shaders, and the task does not seem parallelizable.
Move more of the logic to the back-end (via widget JS-PY communication). This may not help much (assuming Python is not much faster than JavaScript) and would only apply when a Python kernel is running.