Array flatten function (v2) #154

orodeh · 2017-11-06T20:54:06Z

This is a cleanup of this pull request

Currently, there is no library function to flatten an array of arrays of files (Array[Array[File]]). A scatter in which each task call produces an array of files is a natural way of ending up with such a structure. To flatten this array, you can write a task that takes it as an argument and manipulates it with Python code. However, this task will also download all the files, taking significant time and disk space. To work around this, you can coerce the files into strings (their paths) and manipulate the paths instead.

You can see an example here. The chunk_reads_join task flattens the fastq_chunks file array, which is coerced into an Array[Array[String]].
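
For illustration, the path manipulation inside such a workaround task could look roughly like the Python below. This is only a minimal sketch and not the code of the linked example: the function name `flatten_paths` and the sample paths are hypothetical, and the paths are assumed to have already been coerced to strings so that no files are downloaded.

```python
from itertools import chain


def flatten_paths(nested_paths):
    """Concatenate the inner lists of path strings, keeping their order."""
    return list(chain.from_iterable(nested_paths))


# Hypothetical per-shard outputs of a scatter, already coerced to strings.
fastq_chunks = [
    ["shard0/a.fastq", "shard0/b.fastq"],
    ["shard1/c.fastq"],
    [],  # a shard that produced no files
]

flat_chunks = flatten_paths(fastq_chunks)
print(flat_chunks)
```

Only the strings are touched here, which is why the workaround avoids localizing the files themselves.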

To avoid this circuitous implementation, this pull request proposes a standard library function instead.

geoffjentry · 2017-11-07T17:26:47Z

I mentioned it in the previous PR, but for posterity over here: I like this proposal. @orodeh - I'd encourage you to see if anyone else wants to chime in; either when that dies down or if it never starts, we can call for a vote.

geoffjentry · 2017-11-13T17:21:53Z

@orodeh This has gone several days w/o commentary so I'm going to move it to voting now.

cjllanwarne · 2017-11-13T17:22:14Z

SPEC.md

+Array[Array[File]] af2D = [["/tmp/X.txt"], ["/tmp/Y.txt", "/tmp/Z.txt"], []]
+Array[File] af = flatten(af2D)   # ["/tmp/X.txt", "/tmp/Y.txt", "/tmp/Z.txt"]
+
+Array[Array[Pair[Float,String]]] aap2D = [[(0.1, "mouse")], [(3, "cat"), (15, "dog")]]


Maybe note that this is useful because Map[X, Y] can be coerced to Array[Pair[X, Y]]
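
A rough Python model of that use case, assuming (as the comment above states) that a map can be viewed as an array of key/value pairs. The names `map_as_pairs` and `weights_by_batch` are hypothetical, and the dictionaries and tuples only stand in for WDL maps and pairs:

```python
from itertools import chain


def map_as_pairs(mapping):
    """Model a Map[X, Y] as an Array[Pair[X, Y]]: a list of (key, value) tuples."""
    return list(mapping.items())


# Hypothetical maps, e.g. one produced per scatter shard.
weights_by_batch = [{"mouse": 0.1}, {"cat": 3.0, "dog": 15.0}]

# Array[Array[Pair[String, Float]]] in this model.
pair_arrays = [map_as_pairs(m) for m in weights_by_batch]

# Flattening gives a single Array[Pair[String, Float]].
all_pairs = list(chain.from_iterable(pair_arrays))
print(all_pairs)
```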

cjllanwarne · 2017-11-13T17:24:17Z

SPEC.md

+
+Given an array of arrays, the `flatten` function concatenates all the
+member arrays in the order of appearance to give the result. It does not
+deduplicate the elements. For example:


Comment that arrays nested more deeply than 2 must be flattened twice (or more) to get down to an unnested Array[X]
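
The point about deeper nesting can be sketched with a small Python model of the described semantics (concatenate member arrays in order, no deduplication, one level removed per call). This models the behaviour in Python and is not an engine implementation; the sample data is made up.

```python
def flatten(arrays):
    """Model of the proposed flatten: concatenate member arrays in order."""
    result = []
    for member in arrays:
        result.extend(member)  # keeps order and keeps duplicates
    return result


# Stand-in for an Array[Array[Array[String]]].
nested_3d = [[["a"], ["b", "a"]], [[], ["c"]]]

once = flatten(nested_3d)   # still nested one level: Array[Array[String]]
twice = flatten(once)       # fully unnested: Array[String]

print(once)
print(twice)
```

Each call strips exactly one level, so an array nested N levels deep needs N - 1 calls to reach an unnested array.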

geoffjentry · 2017-11-13T17:30:08Z

@orodeh Via a quirk of timing @cjllanwarne managed to submit comments right as I opened for voting. I think if you want to/can address those prior to other people submitting votes it'll be fine to do so. Otherwise we'll need to track down voters to make sure they still vote the same way with whatever changes are in effect.

cjllanwarne · 2017-11-13T17:30:10Z

I was just looking at this, apparently in parallel! Maybe a "last comments, please" warning 24h before voting starts is a good idea for future PRs?

Regardless, I vote 👍 pending the small clarifications I mentioned in the comments.

orodeh · 2017-11-13T17:44:54Z

@cjllanwarne I added your comments into the text, to clarify the semantics and usage of the flatten function. Thanks!

patmagee · 2017-11-13T17:50:32Z

👍

antonkulaga · 2017-11-13T19:04:05Z

👍

aprabhak2 · 2017-11-14T00:18:53Z

+1

chapmanb · 2017-11-14T14:25:57Z

+1 -- this is really nice functionality. In my CWL workflows I often find myself needing this when moving back and forth between parallelized batches (like tumor/normal calling) and single sample processing.

geoffjentry · 2017-11-25T20:47:29Z

By a vote of 6-0 this PR passes. It will remain open until an implementation is realized. I know there's a Cromwell PR which will presumably be merged in soon, @orodeh let me know if dxWDL (or an impl I'm not aware of) already supports this.

orodeh · 2018-01-30T19:25:02Z

This is already in Cromwell. It has been for a while now (wdl/src/main/scala/wdl/expression/WdlStandardLibraryFunctions.scala).

geoffjentry · 2018-01-30T22:19:03Z

@orodeh This was merged nearly 2 months ago :)

orodeh · 2018-01-30T22:34:41Z

@geoffjentry That was what I thought, but it still carries the label "Waiting for Implementation".

orodeh added 2 commits November 6, 2017 12:46

WIP

5296136

WIP

af966cb

orodeh mentioned this pull request Nov 6, 2017

Array flatten function #153

Closed

geoffjentry added the in review label Nov 8, 2017

geoffjentry added Voting Active and removed in review labels Nov 13, 2017

cjllanwarne reviewed Nov 13, 2017

Addressing review comments

9a136d8

geoffjentry added Waiting for implementation and removed Voting Active labels Nov 29, 2017

geoffjentry merged commit 68561d0 into openwdl:master Dec 5, 2017

geoffjentry removed the Waiting for implementation label Jan 30, 2018

orodeh deleted the orodeh_flatten_v3 branch February 28, 2018 21:13

cjllanwarne mentioned this pull request Mar 13, 2018

Don't forget flatten #202

Merged

cjllanwarne mentioned this pull request May 11, 2018

Redo the flatten PR into draft-2/SPEC.MD #213

Merged
