-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Caching for FlowDefExecution #1581
Conversation
Looks like this isn't a trampolining issue but a caching issue, in that we could have multiple runs of a re-used FlowDefExecution? |
Right, bad title. We cache to everywhere else. Not sure why this one should not be cached. |
The basic idea is that hadoop jobs are so expensive, we should definitely cache them. |
protected def runStats(conf: Config, mode: Mode, cache: EvalCache)(implicit cec: ConcurrentExecutionContext) = | ||
Trampoline( | ||
protected def runStats(conf: Config, mode: Mode, cache: EvalCache)(implicit cec: ConcurrentExecutionContext) = { | ||
lazy val future = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go outside the method? Or does this live across method calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it depends on conf
and mode
so I don't think it can go outside the method. Being lazy just means we don't actually fire this future unless the cache does not already have a run of this Execution with the same Config
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, right. I missed the getOrElse part below.
👍 So without this fix, it looks like we can have undesirable results if the Execution is not idempotent? |
@rubanm exactly. There is also the chance that a job could get run twice (and if our experience in the past is any indication, that happens a lot with Execution without the cache). Note, the cache was added only after bug reports that using Execution resulted in many duplicated jobs. The main case for duplication is the use or |
👍 |
When @jnievelt added Trampolines to Execution, this one was left off.