-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Array flatten function (v2) #154
Conversation
I mentioned it in the previous PR but for posterity over here I like this proposal. @orodeh - I'd encourage to try to see if anyone else wants to chime in and either when that dies down or if it never starts we can call for vote |
@orodeh This has gone several days w/o commentary so I'm going to move it to voting now. |
Array[Array[File]] af2D = [["/tmp/X.txt"], ["/tmp/Y.txt", "/tmp/Z.txt"], []] | ||
Array[File] af = flatten(af2D) # ["/tmp/X.txt", "/tmp/Y.txt", "/tmp/Z.txt"] | ||
|
||
Array[Array[Pair[Float,String]]] aap2D = [[(0.1, "mouse")], [(3, "cat"), (15, "dog")]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe note that this is useful because Map[X, Y]
can be coerced to Array[Pair[X, Y]]
SPEC.md
Outdated
|
||
Given an array of arrays, the `flatten` function concatenates all the | ||
member arrays in the order to appearance to give the result. It does not | ||
deduplicate the elements. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment that arrays nested more deeply than 2 must be flattened twice (or more) to get down to an unnested Array[X]
@orodeh Via a quirk of timing @cjllanwarne managed to submit comments right as I opened for voting. I think if you want to/can address those prior to other people submitting votes it'll be fine to do so. Otherwise we'll need to track down voters to make sure they still vote the same way with whatever changes are in effect. |
I was just looking at this, apparently in parallel! Maybe a "last comments, please" warning 24h before voting starts is a good idea for future PRs? Regardless, I vote 👍 pending the small clarifications I mentioned in the comments. |
@cjllanwarne I added your comments into the text, to clarify the semantics and usage of the |
👍 |
2 similar comments
👍 |
+1 |
+1 -- this is really nice functionality. In my CWL workflows I often find myself needing this when moving back and forth between parallized batches (like tumor/normal calling) and single sample processing. |
By a vote of 6-0 this PR passes. It will remain open until an implementation is realized. I know there's a Cromwell PR which will presumably be merged in soon, @orodeh let me know if dxWDL (or an impl I'm not aware of) already supports this. |
This is already in Cromwell. It has been for a while now (wdl/src/main/scala/wdl/expression/WdlStandardLibraryFunctions.scala). |
@orodeh This was merged nearly 2 months ago :) |
@geoffjentry That was what I thought, but it still carries the label "Waiting for Implementation". |
This is a cleanup of this pull request
Currently, there is no library function to flatten an array of array of files (Array[Array[File]]). A scatter, where each task call produces an array of files, is a natural way of ending up with such a structure. In order to flatten this array, you can write a task that takes the it as an argument, and manipulate it with python code. However, this task will also download all the files, taking significant time and disk space. To work around this, you can coerce the files into strings (their paths), and manipulate the paths.
You can see an example here. The chunk_reads_join task flattens the fastq_chunks file array, which is coerced into an Array[Array[String]].
In order to avoid this circuitous implementation, this pull requests suggest a standard library function instead.