Add make(dest) to TypedPipe #1217

Merged
johnynek merged 2 commits into twitter:develop from avibryant:avi-make
Apr 13, 2015


avibryant
Contributor

This allows (very) simple make-like behavior with Execution, where a writeThrough will just turn into a read if the destination already exists.

Note that since validateTaps isn't defined for local mode, this won't work at all in that context. It may be reasonable to require that to be fixed as part of this PR.

Also, it obviously needs a couple of tests before it can be merged.

@johnynek
Collaborator

johnynek commented Mar 4, 2015

This idea is really nice. People have reimplemented this exact function at least twice internally at Twitter.

I agree that shipping this while it is broken in Local mode is not okay. We could either always compute in Local mode (lame) or we could make Local mode properly support validateTaps everywhere.

Actually, I didn't know that validateTaps is broken in that case. That deserves its own issue. Do you have more details?

@avibryant
Contributor Author

Yeah, in Local mode validateTaps is always a no-op, which means make will never compute in Local mode; presumably it will then crash down the line when the read fails. See:
https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/FileSource.scala#L183

@johnynek what do you think of the name, and do we need a variation that returns Execution[Unit]? (I kinda think we don't, and that people can just chain .unit if they want).

@johnynek
Collaborator

johnynek commented Mar 4, 2015

@avibryant I think there is no need for an Execution[Unit] version (agree about .unit). I like the name make since it reminds us of the make program.
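To illustrate the `.unit` suggestion above, here is a minimal sketch. `UnitSketch` and `discardResult` are hypothetical names used only for this example; the sketch assumes Scalding's `Execution` and `TypedPipe` are on the classpath.

```scala
import com.twitter.scalding.Execution
import com.twitter.scalding.typed.TypedPipe

object UnitSketch {
  // Hypothetical helper: keep the side effect of make, drop the resulting pipe.
  // `made` stands for whatever pipe.make(dest) returned.
  def discardResult[T](made: Execution[TypedPipe[T]]): Execution[Unit] =
    made.unit // equivalent in effect to made.map(_ => ())
}
```

Since the conversion is a single chained call, a dedicated `Execution[Unit]` variant of make would add little.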

@reconditesea
Contributor

I also like the make program :) But would .readOrMake be more informative about what it does?

@avibryant
Contributor Author

Updated FileSource to do the right thing in validateTaps in Local mode.
Also changed make to require a FileSource because for now it's very unlikely the semantics will be right for any other Source, and I'd rather protect people from confusion.
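The make-like behavior discussed in this thread can be sketched roughly as follows. This is an illustration under assumptions, not necessarily the code that was merged: `MakeSketch` is a hypothetical wrapper object, the helper takes the pipe as an argument instead of being a method on `TypedPipe`, and it assumes `validateTaps` throws `InvalidSourceException` when the destination is missing.

```scala
import com.twitter.scalding.{Execution, FileSource, InvalidSourceException}
import com.twitter.scalding.typed.{TypedPipe, TypedSink, TypedSource}

object MakeSketch {
  // Hypothetical standalone helper; the PR adds make as a method on TypedPipe.
  def make[T](
    pipe: TypedPipe[T],
    dest: FileSource with TypedSink[T] with TypedSource[T]
  ): Execution[TypedPipe[T]] =
    Execution.getMode.flatMap { mode =>
      try {
        // Assumed to throw InvalidSourceException when dest does not exist.
        dest.validateTaps(mode)
        // Destination already exists: skip the computation and just read it.
        Execution.from(TypedPipe.from(dest))
      } catch {
        case _: InvalidSourceException =>
          // Destination missing: compute, write, then read the result back.
          pipe.writeThrough(dest)
      }
    }
}
```

Requiring a `FileSource` in the signature reflects the concern above that existence checks may not have the right semantics for other kinds of Source.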

@joshualande
Contributor

@avibryant, thanks for working on this. This will be a great feature.

fwiw, it seems like the main use-case for this tool is ad-hoc work in the Scalding REPL(?).

The name that would make the most sense to me is .writeOrLoad (or .saveOrLoad) so that the API looks similar to the .write/.save functions.

@avibryant
Contributor Author

@joshualande actually, I hadn't been thinking of REPL use-cases (though that makes sense now that you mention it). I was thinking more about restartable data pipelines with complex dependency graphs - what you might use Luigi or Oozie for currently.

@@ -195,6 +195,13 @@ abstract class FileSource extends SchemedSource with LocalSourceOverride {
"[" + this.toString + "] No good paths in: " + hdfsPaths.toString)
}
}

case Local(_) => {
Collaborator


you want Local(true) here right? strict = true?

Collaborator


I guess not. The previous (perhaps questionable) logic is that strict means something is there, but maybe not all the paths. Nothing being present still won't run. Since currently Local only supports one path (even though it could support more), I suppose this is correct.
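The semantics described in this review thread (validation passes if something is present, and fails only when nothing is) can be modeled without Scalding. The sketch below is self-contained plain Scala; `LocalValidation`, `validateLocal`, and `MissingSourceException` are hypothetical names standing in for the real `FileSource.validateTaps` and `InvalidSourceException`, and the error message only imitates the one in the diff.

```scala
import java.io.File

// Hypothetical stand-in for Scalding's InvalidSourceException.
final class MissingSourceException(msg: String) extends RuntimeException(msg)

object LocalValidation {
  // Passes when at least one local path exists; throws only when none do.
  // The strict flag is deliberately ignored, mirroring `case Local(_)`.
  def validateLocal(sourceName: String, paths: Seq[String]): Unit = {
    val anyPresent = paths.exists(p => new File(p).exists)
    if (!anyPresent) {
      throw new MissingSourceException(
        "[" + sourceName + "] No good paths in: " + paths.mkString(", "))
    }
  }
}
```

With a single local path, as is currently the case for Local mode, "at least one path exists" and "all paths exist" coincide, which is why matching on `Local(_)` behaves correctly either way.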

@DanielleSucher
Copy link
Collaborator

Ping! Anything else y'all want before this feels ready to merge? ^^

@johnynek
Collaborator

Let's shipit!

johnynek added a commit that referenced this pull request Apr 13, 2015
Add make(dest) to TypedPipe
johnynek merged commit 49834f2 into twitter:develop Apr 13, 2015
@johnynek
Collaborator

closes #1126

@coveralls

Coverage Status

Changes Unknown when pulling 1314460 on avibryant:avi-make into twitter:develop.

7 participants