I've used the latest GitHub version (Dec 14 2020) and my goal is to run ICGS2 clustering. I've created two versions of the input counts: one is a plain tsv file (called 'tsv' below) and the other is a directory with matrix.mtx.gz, features.tsv.gz and barcodes.tsv.gz (called 'mtx') to emulate the 10X output (I don't know how to make a compatible h5 file). I also had to replace ':' with '_' in gene symbols/names, as it seemed to cause some issues. The counts (and gene names) are identical in both cases, but the results are not. In both cases, ICGS seems to have finished: the last line in the log files is "ICGS run complete... halted prior to full differential comparison analysis".
I had 4884 cells. For the tsv counts I have 4585 lines (cells) in ICGS-NMF/FinalGroups.txt while for mtx there are 4281.
I have these questions:
can I trust the clustering (ICGS2) results despite the multiple logged errors (please see below)? It seems that most of them are related to gene biotype annotations which are missing ...
why are some cells missing from the ICGS-NMF/FinalGroups.txt files?
why do the two versions give different results? Is there some random component here?
Thank you for your help!
Here are the errors:
try({hopg<-hopach(data,dmat=distmatg,ord="own")})
Error in base::rowMeans(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
In addition: Warning message:
In collap(data, level, d, dmat, newmed) :
Not enough medoids to use newmed='medsil' in collap() -
using newmed='nn' instead
Traceback (most recent call last):
File "/path/to/altanalyze-master_github12.14.2020/visualization_scripts/clustering.py", line 261, in heatmap
newFilename, Z1, Z2 = R_interface.remoteHopach(inputFilename,cluster_method,metric_gene,metric_array)
File "/path/to/altanalyze-master_github12.14.2020/R_interface.py", line 106, in remoteHopach
z.Hopach(cluster_method,metric_gene,force_gene,metric_array,force_array)
File "/path/to/altanalyze-master_github12.14.2020/R_interface.py", line 626, in Hopach
if 'clustering' in hopach_run:
UnboundLocalError: local variable 'hopach_run' referenced before assignment
hopach failed... continue with an alternative method
Traceback (most recent call last):
File "/path/to/altanalyze-master_github12.14.2020/RNASeq.py", line 4122, in correlateClusteredGenesParameters
except Exception: TFs = importGeneSets('BioTypes',filterType='transcription regulator',geneAnnotations=gene_to_symbol_db)
File "/path/to/altanalyze-master_github12.14.2020/RNASeq.py", line 2826, in importGeneSets
for line in open(fn,'rU').xreadlines():
IOError: [Errno 2] No such file or directory: '/path/to/altanalyze-master_github12.14.2020/AltDatabase/EnsMart72/goelite/Dr/gene-mapp/Ensembl-BioTypes.txt'
Traceback (most recent call last):
File "/path/to/altanalyze-master_github12.14.2020/GO_Elite.py", line 1357, in runGOElite
try:go_to_mod_genes, mapp_to_mod_genes, timediff, mappfinder_input, resource = mappfinder.generateMAPPFinderScores(species,species_code,source_data,mod,system_codes,permute,resources,file_dirs,root,Multi=mlp)
File "/path/to/altanalyze-master_github12.14.2020/mappfinder.py", line 462, in generateMAPPFinderScores
if PoolVar: q.put([print_out]); return None
AttributeError: 'NoneType' object has no attribute 'put'
gene associations assigned
Traceback (most recent call last):
File "/path/to/altanalyze-master_github12.14.2020/stats_scripts/ICGS_NMF.py", line 1295, in CompleteICGSWorkflow
annotatedGroupsFile = RNASeq.predictCellTypesFromClusters(finalgrpfile, goelite_path)
File "/path/to/altanalyze-master_github12.14.2020/RNASeq.py", line 5583, in predictCellTypesFromClusters
for line in open(goelite_path,'rU').xreadlines():
IOError: [Errno 2] No such file or directory: '/path/to/test1a/NMF-SVM/SVMOutputs/GO-Elite/clustering/MarkerFinder-subsampled-ordered/GO-Elite_results/pruned-results_z-score_elite.txt'
Unable to export annotated groups file with predicted cell type names.
Parent directory not found locally for ['/DataPlots/', '/DataPlots/exp.56hpf-LTA-counts-ICGS-UMAP_scores.txt']
My apologies for the long delay in responding. Running AltAnalyze on the .mtx file with the option "--dataFormat counts" can cause unexpected issues, because the program will likely try to adjust an already normalized/log2 file (it will scale it again and take the log2 of the log2 values). I think this is likely the issue encountered. We will see what we can do on our end to detect such an issue and prevent it (we can check to make sure the data is not already log2 converted).
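As an illustration of the kind of check mentioned above, here is a minimal standalone Python 3 sketch that guesses whether a tab-delimited matrix still holds raw counts. It is not AltAnalyze code: the function names are hypothetical, and it assumes a header row of cell IDs followed by rows whose first column is a gene ID. The test itself (all values are non-negative whole numbers) is only a heuristic.

```python
import csv
import io


def read_tsv_rows(handle):
    """Yield data rows of a tab-delimited matrix, skipping the header row."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # assumed header row of cell IDs
    return reader


def looks_like_raw_counts(rows, max_rows=1000):
    """Heuristic: raw counts are non-negative whole numbers.

    Normalized or log2-transformed values almost always contain fractions,
    so a single fractional or negative value marks the matrix as transformed.
    Only the first `max_rows` rows are inspected to keep the check cheap.
    """
    n_values = 0
    for i, row in enumerate(rows):
        if i >= max_rows:
            break
        for field in row[1:]:  # column 0 is assumed to be the gene ID
            if not field:
                continue  # ignore empty cells
            value = float(field)
            n_values += 1
            if value < 0 or value != int(value):
                return False
    return n_values > 0


if __name__ == "__main__":
    demo = io.StringIO("gene\tcell1\tcell2\ngeneA\t0\t12\ngeneB\t3\t150\n")
    print(looks_like_raw_counts(read_tsv_rows(demo)))
```

If a check like this reports that the file is not raw counts, running with "--dataFormat counts" would presumably apply the scaling and log2 step a second time.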
I should also note that for Danio rerio we made some relatively recent changes (I believe in the last several months) that prevent ID errors with Ensembl IDs for this species (which can have chromosome IDs that are incompatible with the supporting gene ID naming conventions).
To address your other questions:
can I trust the clustering (ICGS2) results despite the multiple logged errors (please see below)? It seems that most of them are related to gene biotype annotations which are missing ...
-- Yes. Some of these errors arise because the files needed for cell-type prediction are not supported for this species, although we will add support for Dr in future versions. The HOPACH error can be resolved by ensuring that HOPACH is installed in the version of R that runs when you type "R" on the command line. AltAnalyze used to be able to install it automatically on most operating systems, but stricter gatekeeper functions, depending on the admin profile, can now prevent the software from installing these packages.
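One way to confirm whether the R found on your PATH can load HOPACH is sketched below in standalone Python 3. This is not part of AltAnalyze; the function names are hypothetical, and it assumes that `Rscript` on the PATH belongs to the same R installation that AltAnalyze calls.

```python
import shutil
import subprocess


def hopach_check_command(rscript="Rscript"):
    """Build a command that exits with 0 if the hopach package can be loaded, else 1."""
    r_code = (
        'quit(save = "no", '
        'status = if (requireNamespace("hopach", quietly = TRUE)) 0 else 1)'
    )
    return [rscript, "-e", r_code]


def hopach_available(rscript="Rscript"):
    """Return True only if Rscript is on the PATH and reports hopach as installed."""
    if shutil.which(rscript) is None:
        return False  # no R on the PATH at all
    result = subprocess.run(
        hopach_check_command(rscript),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("hopach available:", hopach_available())
```

If this prints False while R itself is installed, installing the hopach package manually in that R installation should remove the HOPACH error from the log.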
why are some cells missing from the ICGS-NMF/FinalGroups.txt files?
-- This is because ICGS2 identifies transcriptionally robust clusters by classifying all cells (SVM) against the centroids of the surviving NMF clusters (those with sufficiently unique marker genes). Cells that fail to align sufficiently are excluded in versions after 2.1.4.
why do the two versions give different results? Is there some random component here?
-- Without the "--dataFormat counts" option, in principle, the results should be identical.
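To see exactly which cells differ between the two runs, the two FinalGroups.txt files can be compared directly. The following standalone Python 3 sketch assumes that each file is tab-delimited with the cell ID in the first column; the function names are hypothetical, and the file paths in the usage note are placeholders.

```python
def cell_ids_from_lines(lines):
    """Collect the first tab-delimited field (assumed to be the cell ID) of each line."""
    ids = set()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0]:
            ids.add(fields[0])
    return ids


def compare_runs(all_cells, tsv_cells, mtx_cells):
    """Summarize which cells each run dropped and which cells only one run kept."""
    return {
        "dropped_tsv": all_cells - tsv_cells,
        "dropped_mtx": all_cells - mtx_cells,
        "only_tsv": tsv_cells - mtx_cells,
        "only_mtx": mtx_cells - tsv_cells,
    }


if __name__ == "__main__":
    # Tiny made-up example; real input would come from the FinalGroups.txt files.
    all_cells = {"c1", "c2", "c3", "c4"}
    tsv_cells = cell_ids_from_lines(["c1\t1\tA\n", "c2\t1\tA\n", "c3\t2\tB\n"])
    mtx_cells = cell_ids_from_lines(["c1\t1\tA\n", "c3\t2\tB\n"])
    for key, cells in compare_runs(all_cells, tsv_cells, mtx_cells).items():
        print(key, sorted(cells))
```

For real data, pass an open file to `cell_ids_from_lines`, for example `with open("tsv_run/ICGS-NMF/FinalGroups.txt") as fh: tsv_cells = cell_ids_from_lines(fh)`. If the two runs label cells differently (for instance with an added suffix), the IDs would need to be normalized before comparing.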