Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WDL Draft 2 workflow source + imports => namespace cache #3971

Merged
merged 1 commit into from
Aug 8, 2018

Conversation

mcovarr
Copy link
Contributor

@mcovarr mcovarr commented Aug 4, 2018

Particularly useful for not re-parsing workflows for every set of inputs in a batch submission.

private val namespaces: Cache[WorkflowSource, ErrorOr[WdlNamespaceWithWorkflow]] = CacheBuilder.newBuilder()
.concurrencyLevel(2)
.expireAfterWrite(20, TimeUnit.MINUTES)
.maximumSize(1000)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

knobortunities


private val namespaces: Cache[WorkflowSource, ErrorOr[WdlNamespaceWithWorkflow]] = CacheBuilder.newBuilder()
.concurrencyLevel(2)
.expireAfterWrite(20, TimeUnit.MINUTES)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToL: Not that it would make much of a difference but I think this could even be expireAfterAccess ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh I like that

Copy link
Contributor

@kshakir kshakir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how the develop imports got out of sync with 34_hotfix but this PR needs minor patches.

import wom.values.WomValue
import languages.wdl.draft2.WdlDraft2LanguageFactory._
import wom.transforms.WomExecutableMaker.ops._
import wom.values.{WomValue, _}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import, and others imports this file, have been Intellij-ed and should be fixed.

case _: WorkflowSourceFilesWithoutImports =>
WdlNamespaceWithWorkflow.load(workflowSource, importResolvers map resolverConverter).toErrorOr
def workflowHashKey: String = {
workflowSource.md5Sum + (source.importsZipFileOption map { bytes => new String(MessageDigest.getInstance("MD5").digest(bytes)) }).getOrElse("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so but to double check - are we happy to assert that MD5suming the zip file is less CPU intensive than just parsing the WDL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a pretty safe assumption that digesting the array of bytes is cheaper than stringifying it and parsing. 🙂

}

lazy val wdlNamespaceValidation: ErrorOr[WdlNamespaceWithWorkflow] = namespaces.get(workflowHashKey, new Callable[ErrorOr[WdlNamespaceWithWorkflow]] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RHS of this is get is lazy, right...?

Copy link
Contributor

@cjllanwarne cjllanwarne Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could make that explicit by separating it as

def lazyWorkflowMaker: () => Callable[ErrorOr[WdlNamespaceWithWorkflow]] = ...
lazy val wdlNamespaceValidation: ErrorOr[WdlNamespaceWithWorkflow] = 
    namespaces.get(workflowHashKey, lazyWorkflowMaker)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this is lazy.

@cjllanwarne
Copy link
Contributor

Since this is the develop version: If I told you that one day I intend to shuffle this logic into the LanguageFactory trait and make it apply equally to WDL 1.0, 1.1, CWL, etc...

  1. Would you trust me to do the right thing; or...
  2. Would you want to add a regression test to make sure I don't ruin this great work?

@geoffjentry
Copy link
Contributor

@cjllanwarne You're already on thin ice w/ test recommendations

@cjllanwarne
Copy link
Contributor

@geoffjentry alright, alright, no need to get testy...

Copy link
Contributor Author

@mcovarr mcovarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. pfffft
  2. it'd be nice to also have one for hashing so you don't break that either

case _: WorkflowSourceFilesWithoutImports =>
WdlNamespaceWithWorkflow.load(workflowSource, importResolvers map resolverConverter).toErrorOr
def workflowHashKey: String = {
workflowSource.md5Sum + (source.importsZipFileOption map { bytes => new String(MessageDigest.getInstance("MD5").digest(bytes)) }).getOrElse("")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a pretty safe assumption that digesting the array of bytes is cheaper than stringifying it and parsing. 🙂

}

lazy val wdlNamespaceValidation: ErrorOr[WdlNamespaceWithWorkflow] = namespaces.get(workflowHashKey, new Callable[ErrorOr[WdlNamespaceWithWorkflow]] {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this is lazy.

@mcovarr mcovarr force-pushed the mlc_memory_spike branch 6 times, most recently from 8c7d257 to 2236a4a Compare August 7, 2018 21:24
@mcovarr
Copy link
Contributor Author

mcovarr commented Aug 8, 2018

@cjllanwarne @kshakir ready for re-review.

Copy link
Contributor

@kshakir kshakir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM

Copy link
Contributor

@cjllanwarne cjllanwarne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I do think we want a more scary CHANGELOG entry for the hotfixees who may not realize they have to opt-in too

# # WDL Draft 2 namespace caching is off by default, this value must be set to true to enable it.
# enabled: false
# # Guava cache concurrency
# concurrency: 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does concurrency mean in cache terms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's literally "Guava cache concurrency" in that this parameter is passed to the cache builder's .concurrency() method.

# concurrency: 2
# # How long entries in the cache should live from the time of their last access.
# ttl: 20 minutes
# # Maximum size of the cache.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(in terms of WdlNamespace objects stored)

CHANGELOG.md Outdated
@@ -7,14 +7,22 @@
Cromwell now allows for a user to submit the URL pointing to workflow file to run a workflow using `workflowUrl` parameter. Currently, this is only supported in `Server` mode.
More details on how to use it can be found [here](http://cromwell.readthedocs.io/en/develop/api/RESTAPI/).

### Languages

- Added an opt-in namespace cache for the WDL Draft 2 language factory. Please see the Cromwell example configuration for details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe warn that if people are upgrading from hotfix, their currently enabled cache is now opt-in rather than on-by-default?

@mcovarr mcovarr merged commit 9b86643 into develop Aug 8, 2018
@mcovarr mcovarr deleted the mlc_memory_spike branch August 8, 2018 20:46
mcovarr added a commit that referenced this pull request Aug 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants