Inconsistent location and name for renamed output of type File when also included in output of type Directory for CLT #1628
Comments
Hello @jotasi and thanks for your issue
Can you give us a bit more background and context on what you are trying to achieve and why you expect/want this other behavior? About renaming: we could add a flag to normalize the |
Hi @mr-c and thanks for the quick reply. My use case would be that I have a tool that should become part of a pipeline. This tool generates output in a dedicated directory. For the further pipeline, I only need some of the files that are generated in the directory. These I would thus track as outputs of type As I use the output(s) of type The renaming, I would like to do as the step I'm actually performing is run on multiple samples and I want to rename the file to reflect the sample the output originated from for easier attribution. Setting an additional flag would be fine. In my particular case, the files are reasonably small so I don't really mind making copies, but soft or hard links would be fine as well. |
For development and debugging purposes, I recommend using |
That sounds quite helpful for local development and debugging. Thanks. I would probably still like to store the directory as an output, though, as I want to run the same workflow with different runners as well, e.g. online on the SBG platform, and would want to store the directory for these executions for future reference (as I might want to look at the additional outputs produced by the tool in the future).
Thanks for the context. I agree that this would be nice to have. The fast fix would be to write a script (in the language of your choice) that takes the output JSON from the
FYI: my memory is that SBG and other cloud based systems store the entire output directory (for at least a while) https://docs.sevenbridges.com/docs/about-memoization#intermediate-files says they default to 24 hours, with a max of 5 days. So I guess you might want to keep them longer. For Arvados, this is configured by the |
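The post-processing script suggested above as a fast fix could look roughly like the following. This is only a minimal sketch, not something the maintainers provided: it assumes the output object is the JSON that the runner prints after a run, that each `File` entry carries a `basename` plus a local `path` (or a `file://` `location`), and that POSIX-style paths are in use. The function names `is_consistent` and `normalize_outputs` are hypothetical and exist only for this illustration.

```python
import copy
import shutil
from pathlib import Path
from urllib.parse import unquote, urlparse


def _local_path(file_obj):
    """Return the local filesystem path of a CWL File object as a Path."""
    if "path" in file_obj:
        return Path(file_obj["path"])
    # Assumption: without "path", "location" is a file:// URI.
    return Path(unquote(urlparse(file_obj["location"]).path))


def is_consistent(file_obj):
    """True if basename matches the last component of the file's path."""
    return _local_path(file_obj).name == file_obj["basename"]


def normalize_outputs(outputs, outdir):
    """Copy every top-level File output into outdir under its basename.

    Returns a new output object whose path/location point at the copies.
    Only top-level File values are handled; arrays and Directory
    listings are left untouched in this sketch.
    """
    outdir = Path(outdir)
    result = copy.deepcopy(outputs)
    for value in result.values():
        if not (isinstance(value, dict) and value.get("class") == "File"):
            continue
        source = _local_path(value)
        target = outdir / value["basename"]
        # Skip the copy when the file already sits at the desired place.
        if source.resolve() != target.resolve():
            shutil.copy2(source, target)
        value["path"] = str(target)
        value["location"] = target.resolve().as_uri()
    return result
```

Copying (rather than moving) keeps the file inside the tracked directory intact, which matches the "two copies" behavior asked for in the issue; swapping `shutil.copy2` for a hard or soft link would avoid the duplication. To use it from a shell, one could load the runner's JSON output with `json.load`, pass it to `normalize_outputs` together with the output directory, and write the result back out with `json.dump`.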
Expected Behavior
When an output of type `File` is defined in a `CommandLineTool`, it should be located directly in the specified `--outdir` even if it is also part of another output of type `Directory` (in that case, two copies should exist). I understand that this might not be desired to avoid having two copies of the file. However, even when you want to avoid the duplication, when renaming the output of type `File` by changing its `basename` in an expression, not only the `basename` but also the `location` and `path` should reflect the new `basename` as otherwise the output object provided for the file is inconsistent.
Actual Behavior
When generating a file within a directory and then tracking both the directory and the file as two separate outputs of one `CommandLineTool` (one of type `Directory` and one of type `File`), the output of type `File` no longer is stored directly within the output directory but only within the directory structure provided by the output of type `Directory` (i.e. only one copy of the file is tracked as output within the directory). This is independent of whether the file is renamed or not. Furthermore, renaming the output of type `File` by changing the `basename` in an `outputEval` expression causes an inconsistent output type with the `basename` being changed but the `location` and `path` (as well as the actual physical location of the file) still corresponding to the old filename.
Workflow Code
with input
provides the following output:
When removing the output of type `Directory`, the file is directly in the output dir and renamed as expected (same input):
produces:
Full Traceback
N/A
Your Environment