- 
                Notifications
    You must be signed in to change notification settings 
- Fork 79
Post Processing Merged BIOMs extended #2698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
analysis.build_files will now run an arbitrary number of post-processes defined in qiita.software_command.post_processing_cmd column. It's assumed that each post-process is a python script, with a defined Conda environment. It's also currently assumed that each post-process overwrites the merged BIOM file. A step removed from calling plugins, but getting closer.
| Codecov Report
 @@            Coverage Diff             @@
##              dev    #2698      +/-   ##
==========================================
+ Coverage   94.33%   94.34%   +<.01%     
==========================================
  Files         166      166              
  Lines       19840    19888      +48     
==========================================
+ Hits        18717    18764      +47     
- Misses       1123     1124       +1
 Continue to review full report at Codecov. 
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the rename: addtl_processing_cmd -> post_processing_cmd. Note that the change of location of the worker.py file will mean some changes in the code too.
Paths changed to reflect new location of script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, one minor comment but that might impact also the PR in qp-deblur.
        
          
                qiita_db/analysis.py
              
                Outdated
          
        
      | cmd['script_params'].items()]) | ||
|  | ||
| # append fragments file and output dir parameters | ||
| params = "%s --fp_fragments=%s --output_dir=%s" %\ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the name of the parameter from --fp_fragments= to --fp_archive so it's more generic; at the end of the day this can be fragments or anything else the plugin stores in the database/archive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem - changes already made to qiita and qp_deblur. One thing though, the command line util that strips fragments from the biom that aren't in the tree takes --fp_phylogeny and --fp_biom as options; do you think they might be too specific as well?
Originally, I thought a chain of commands would each in turn iterate over the same biom file, but since the second command will need a tree and a biom file as input, the database entry will need to be updated with the fp_s to the data, or the second command will need to assume the name and location of the inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think those names internally in deblur are fine but you are right about those here. Also all the comments that mention phylogeny vs. extra file for analysis ...
fragments-named variables renamed to archives, to be more general. --fp_fragments option also renamed to fp_archives, where appropriate.
        
          
                qiita_db/analysis.py
              
                Outdated
          
        
      | # suitable for other files as well. | ||
| output_dir = base_fp | ||
|  | ||
| fp_archive = '%s/archive.json' % output_dir | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for not catching this before: you should use join to create this path and also use the analysis id to be safe something like: fp_archive = join(output_dir, 'archive_%d.json' % self.id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha, platform independence path creation.
…iita into post_processing
Modified test to address name change from --fp_fragments to --fp_archive.
analysis.build_files will now run an arbitrary number of post-processes defined
in qiita.software_command.post_processing_cmd column. It's assumed that each
post-process is a python script, with a defined Conda environment. It's also
currently assumed that each post-process overwrites the merged BIOM file. A
step removed from calling plugins, but getting closer.