Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array flatten function (v2) #154

Merged
merged 3 commits into from
Dec 5, 2017
Merged

Array flatten function (v2) #154

merged 3 commits into from
Dec 5, 2017

Conversation

orodeh
Copy link
Contributor

@orodeh orodeh commented Nov 6, 2017

This is a cleanup of this pull request

Currently, there is no library function to flatten an array of array of files (Array[Array[File]]). A scatter, where each task call produces an array of files, is a natural way of ending up with such a structure. In order to flatten this array, you can write a task that takes the it as an argument, and manipulate it with python code. However, this task will also download all the files, taking significant time and disk space. To work around this, you can coerce the files into strings (their paths), and manipulate the paths.

You can see an example here. The chunk_reads_join task flattens the fastq_chunks file array, which is coerced into an Array[Array[String]].

In order to avoid this circuitous implementation, this pull requests suggest a standard library function instead.

@orodeh orodeh mentioned this pull request Nov 6, 2017
@geoffjentry
Copy link
Member

I mentioned it in the previous PR but for posterity over here I like this proposal. @orodeh - I'd encourage to try to see if anyone else wants to chime in and either when that dies down or if it never starts we can call for vote

@geoffjentry
Copy link
Member

@orodeh This has gone several days w/o commentary so I'm going to move it to voting now.

Array[Array[File]] af2D = [["/tmp/X.txt"], ["/tmp/Y.txt", "/tmp/Z.txt"], []]
Array[File] af = flatten(af2D) # ["/tmp/X.txt", "/tmp/Y.txt", "/tmp/Z.txt"]

Array[Array[Pair[Float,String]]] aap2D = [[(0.1, "mouse")], [(3, "cat"), (15, "dog")]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe note that this is useful because Map[X, Y] can be coerced to Array[Pair[X, Y]]

SPEC.md Outdated

Given an array of arrays, the `flatten` function concatenates all the
member arrays in the order to appearance to give the result. It does not
deduplicate the elements. For example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment that arrays nested more deeply than 2 must be flattened twice (or more) to get down to an unnested Array[X]

@geoffjentry
Copy link
Member

@orodeh Via a quirk of timing @cjllanwarne managed to submit comments right as I opened for voting. I think if you want to/can address those prior to other people submitting votes it'll be fine to do so. Otherwise we'll need to track down voters to make sure they still vote the same way with whatever changes are in effect.

@cjllanwarne
Copy link
Contributor

I was just looking at this, apparently in parallel! Maybe a "last comments, please" warning 24h before voting starts is a good idea for future PRs?

Regardless, I vote 👍 pending the small clarifications I mentioned in the comments.

@orodeh
Copy link
Contributor Author

orodeh commented Nov 13, 2017

@cjllanwarne I added your comments into the text, to clarify the semantics and usage of the flatten function. Thanks!

@patmagee
Copy link
Member

👍

2 similar comments
@antonkulaga
Copy link

antonkulaga commented Nov 13, 2017

👍

@aprabhak2
Copy link

+1

@chapmanb
Copy link

+1 -- this is really nice functionality. In my CWL workflows I often find myself needing this when moving back and forth between parallized batches (like tumor/normal calling) and single sample processing.

@geoffjentry
Copy link
Member

By a vote of 6-0 this PR passes. It will remain open until an implementation is realized. I know there's a Cromwell PR which will presumably be merged in soon, @orodeh let me know if dxWDL (or an impl I'm not aware of) already supports this.

@orodeh
Copy link
Contributor Author

orodeh commented Jan 30, 2018

This is already in Cromwell. It has been for a while now (wdl/src/main/scala/wdl/expression/WdlStandardLibraryFunctions.scala).

@geoffjentry
Copy link
Member

@orodeh This was merged nearly 2 months ago :)

@orodeh
Copy link
Contributor Author

orodeh commented Jan 30, 2018

@geoffjentry That was what I thought, but it still carries the label "Waiting for Implementation".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants