@ntalluri
Collaborator

No description provided.

@ntalluri
Collaborator Author

ntalluri commented Jun 21, 2023

My latest commit didn't run the automated tests because config.yaml has a merge conflict. (I was also trying to see whether the error I'm getting locally for summary.py also occurs in these tests.)

I looked, and it was just a whitespace issue, but I wasn't sure whether I should go ahead and fix it.

Collaborator

@agitter agitter left a comment

In my final testing with data0 and the cosine metric, I received this error:

RuleException:
ValueError in line 265 of C:\Users\agitter\Desktop\madison\collaborators\Ritz\spras\Snakefile:
The condensed distance matrix must contain only finite values.
  File "C:\Users\agitter\Desktop\madison\collaborators\Ritz\spras\Snakefile", line 265, in __rule_ml_analysis
  File "C:\Users\agitter\Desktop\madison\collaborators\Ritz\spras\src\analysis\ml.py", line 219, in hac_vertical
  File "C:\Users\agitter\.conda\envs\spras\lib\site-packages\seaborn\matrix.py", line 1258, in clustermap
  File "C:\Users\agitter\.conda\envs\spras\lib\site-packages\seaborn\matrix.py", line 1129, in plot
  File "C:\Users\agitter\.conda\envs\spras\lib\site-packages\seaborn\matrix.py", line 974, in plot_dendrograms
  File "C:\Users\agitter\.conda\envs\spras\lib\site-packages\seaborn\matrix.py", line 687, in dendrogram
  File "C:\Users\agitter\.conda\envs\spras\lib\site-packages\seaborn\matrix.py", line 495, in __init__
  File "C:\Users\agitter\.conda\envs\spras\lib\site-packages\seaborn\matrix.py", line 562, in calculated_linkage
  File "C:\Users\agitter\.conda\envs\spras\lib\site-packages\seaborn\matrix.py", line 530, in _calculate_linkage_scipy
  File "C:\Users\agitter\.conda\envs\spras\lib\site-packages\scipy\cluster\hierarchy.py", line 1065, in linkage
  File "C:\Users\agitter\.conda\envs\spras\lib\concurrent\futures\thread.py", line 57, in run
Removing output files of failed job ml_analysis since they might be corrupted:
output/data0-pca.png, output/data0-pca-components.txt, output/data0-pca-coordinates.txt

I believe this is caused by the identical graphs in this dataset. We can leave the behavior as-is for now, but we may want a more informative error message in the future.
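A minimal sketch of what a more informative check could look like, assuming the non-finite distances come from all-zero vectors (cosine distance divides by each vector's norm, so it is undefined when a norm is zero). Whether that is what happens with data0 is not confirmed here, and the function name `check_cosine_inputs` is hypothetical, not part of ml.py:

```python
import numpy as np


def check_cosine_inputs(matrix, labels):
    """Raise a descriptive ValueError if any row of matrix is all zeros.

    Hypothetical helper: cosine distance divides by each row's norm, so a
    zero-norm row produces an undefined distance, which scipy's linkage
    then rejects with "must contain only finite values".
    """
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1)  # one norm per row
    zero_rows = [str(label) for label, norm in zip(labels, norms) if norm == 0]
    if zero_rows:
        raise ValueError(
            "Cosine distance is undefined for all-zero vectors; "
            f"these rows are all zero: {', '.join(zero_rows)}. "
            "Consider a different metric (e.g. 'euclidean') or removing them."
        )
```

It would run on the same matrix that is passed to `seaborn.clustermap` (transposed if the columns are what get clustered) before clustering with `metric='cosine'`. Two identical nonzero vectors have a cosine distance of 0, which is finite, so this check targets empty vectors specifically.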

I renamed the PCA output files because data1-pca-components.txt contains the variance explained, not the principal components.
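To illustrate the distinction behind the rename, here is a small NumPy sketch of PCA via SVD. It is not the code in ml.py; if ml.py uses scikit-learn's `PCA`, the two quantities below correspond to its `components_` and `explained_variance_ratio_` attributes. The function name `pca_outputs` is hypothetical:

```python
import numpy as np


def pca_outputs(X, n_components=2):
    """Return (components, explained_variance_ratio, coordinates) for X.

    X has one row per observation (e.g. one pathway) and one column per
    feature (e.g. one edge).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]                     # shape (k, n_features)
    variance = s ** 2 / (X.shape[0] - 1)               # variance along each axis
    ratio = variance[:n_components] / variance.sum()   # fraction of total variance
    coordinates = centered @ components.T              # shape (n_samples, k)
    return components, ratio, coordinates
```

The principal components are vectors in feature space (one row per component), while the variance explained is a single number per component. If every row of `X` is identical, the total variance is zero and the ratio becomes 0/0.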

I tested running with multiple cores, and the wrong figure is sometimes saved to data1-pca.png: once it was the horizontal hierarchical clustering plot, and another time it was the data0 PCA plot. I'm not sure how that happens when the rules run in parallel, but we'll need to debug it. Everything is fine when I run with 1 core. @ntalluri, can you reproduce this behavior? Here's an example where it looks like the same figure object was being manipulated by both parallel rules.
[Image: data1-pca.png]
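One hypothesis, not confirmed here: the traceback above passes through `concurrent/futures/thread.py`, which suggests the ml_analysis jobs run as threads inside one Snakemake process. pyplot tracks a single global "current figure", so if two threads draw with `plt` calls or call `plt.savefig`, either can end up saving the other's figure. A minimal sketch that avoids that shared state by drawing on a `matplotlib.figure.Figure` created without pyplot; `save_pca_plot` and its arguments are hypothetical:

```python
import numpy as np
from matplotlib.figure import Figure


def save_pca_plot(coordinates, labels, path):
    """Draw a 2D PCA scatter plot on a private Figure and save it to path.

    Figure() is created without pyplot, so it is never registered as the
    global "current figure" that other threads could draw on or save.
    """
    coordinates = np.asarray(coordinates, dtype=float)
    fig = Figure(figsize=(6, 5))
    ax = fig.subplots()  # a single Axes owned by this figure only
    ax.scatter(coordinates[:, 0], coordinates[:, 1])
    for (x, y), label in zip(coordinates, labels):
        ax.annotate(str(label), (x, y))
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    fig.savefig(path)  # saves this figure object, not whatever plt.gcf() returns
```

`seaborn.clustermap` creates its figure through pyplot, so the hierarchical clustering plots would still share global state. Wrapping those calls in a `threading.Lock`, or running the rule in a separate process, are possible mitigations to test.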

@agitter
Collaborator

agitter commented Jul 6, 2023

Great work with these final updates. This is ready to merge.

I noticed the algorithm name parsing didn't work for the EGFR dataset because the dataset is named tps-egfr, which contains a hyphen. I fixed that in f4f8ad1, but we could run into problems in the future if an algorithm name ever contains a hyphen (-). Before the correction, the EGFR plots looked like this:
[Image: tps-egfr-hac-vertical]
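A sketch of more defensive parsing, assuming labels of the hypothetical form `<dataset>-<algorithm>-params-<hash>` (the actual SPRAS naming and the change in f4f8ad1 may differ). Stripping the known dataset name as a prefix, instead of splitting on `-`, tolerates hyphens in both the dataset and algorithm names; `parse_algorithm` is a hypothetical helper:

```python
def parse_algorithm(label: str, dataset: str, marker: str = "-params-") -> str:
    """Extract the algorithm from a hypothetical '<dataset>-<algorithm>-params-<hash>' label."""
    prefix = f"{dataset}-"
    if not label.startswith(prefix):
        raise ValueError(f"{label!r} does not start with dataset {dataset!r}")
    # Everything between the dataset prefix and the marker is the algorithm name,
    # so hyphens inside either name do not shift the split.
    algorithm, found, _ = label[len(prefix):].partition(marker)
    if not found or not algorithm:
        raise ValueError(f"{label!r} does not contain {marker!r}")
    return algorithm
```

A naive `label.split("-")[1]` would return `egfr` for a tps-egfr label. The prefix-based version only breaks if an algorithm name itself contains the `-params-` marker.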

@agitter agitter merged commit 7c89d4e into Reed-CompBio:master Jul 6, 2023