-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WDL Draft 2 workflow source + imports => namespace cache #3971
Conversation
411c5e8
to
ef118ff
Compare
private val namespaces: Cache[WorkflowSource, ErrorOr[WdlNamespaceWithWorkflow]] = CacheBuilder.newBuilder() | ||
.concurrencyLevel(2) | ||
.expireAfterWrite(20, TimeUnit.MINUTES) | ||
.maximumSize(1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
knobortunities
|
||
private val namespaces: Cache[WorkflowSource, ErrorOr[WdlNamespaceWithWorkflow]] = CacheBuilder.newBuilder() | ||
.concurrencyLevel(2) | ||
.expireAfterWrite(20, TimeUnit.MINUTES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToL: Not that it would make much of a difference but I think this could even be expireAfterAccess
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh I like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how the develop
imports got out of sync with 34_hotfix
but this PR needs minor patches.
import wom.values.WomValue | ||
import languages.wdl.draft2.WdlDraft2LanguageFactory._ | ||
import wom.transforms.WomExecutableMaker.ops._ | ||
import wom.values.{WomValue, _} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import, and others imports this file, have been Intellij-ed and should be fixed.
case _: WorkflowSourceFilesWithoutImports => | ||
WdlNamespaceWithWorkflow.load(workflowSource, importResolvers map resolverConverter).toErrorOr | ||
def workflowHashKey: String = { | ||
workflowSource.md5Sum + (source.importsZipFileOption map { bytes => new String(MessageDigest.getInstance("MD5").digest(bytes)) }).getOrElse("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so but to double check - are we happy to assert that MD5suming the zip file is less CPU intensive than just parsing the WDL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a pretty safe assumption that digesting the array of bytes is cheaper than stringifying it and parsing. 🙂
} | ||
|
||
lazy val wdlNamespaceValidation: ErrorOr[WdlNamespaceWithWorkflow] = namespaces.get(workflowHashKey, new Callable[ErrorOr[WdlNamespaceWithWorkflow]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RHS of this is get
is lazy, right...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could make that explicit by separating it as
def lazyWorkflowMaker: () => Callable[ErrorOr[WdlNamespaceWithWorkflow]] = ...
lazy val wdlNamespaceValidation: ErrorOr[WdlNamespaceWithWorkflow] =
namespaces.get(workflowHashKey, lazyWorkflowMaker)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this is lazy.
ef118ff
to
e1f20c0
Compare
Since this is the
|
@cjllanwarne You're already on thin ice w/ test recommendations |
@geoffjentry alright, alright, no need to get testy... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- pfffft
- it'd be nice to also have one for hashing so you don't break that either
case _: WorkflowSourceFilesWithoutImports => | ||
WdlNamespaceWithWorkflow.load(workflowSource, importResolvers map resolverConverter).toErrorOr | ||
def workflowHashKey: String = { | ||
workflowSource.md5Sum + (source.importsZipFileOption map { bytes => new String(MessageDigest.getInstance("MD5").digest(bytes)) }).getOrElse("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a pretty safe assumption that digesting the array of bytes is cheaper than stringifying it and parsing. 🙂
} | ||
|
||
lazy val wdlNamespaceValidation: ErrorOr[WdlNamespaceWithWorkflow] = namespaces.get(workflowHashKey, new Callable[ErrorOr[WdlNamespaceWithWorkflow]] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this is lazy.
8c7d257
to
2236a4a
Compare
@cjllanwarne @kshakir ready for re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I do think we want a more scary CHANGELOG entry for the hotfixees who may not realize they have to opt-in too
# # WDL Draft 2 namespace caching is off by default, this value must be set to true to enable it. | ||
# enabled: false | ||
# # Guava cache concurrency | ||
# concurrency: 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does concurrency mean in cache terms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's literally "Guava cache concurrency" in that this parameter is passed to the cache builder's .concurrency()
method.
cromwell.examples.conf
Outdated
# concurrency: 2 | ||
# # How long entries in the cache should live from the time of their last access. | ||
# ttl: 20 minutes | ||
# # Maximum size of the cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(in terms of WdlNamespace objects stored)
CHANGELOG.md
Outdated
@@ -7,14 +7,22 @@ | |||
Cromwell now allows for a user to submit the URL pointing to workflow file to run a workflow using `workflowUrl` parameter. Currently, this is only supported in `Server` mode. | |||
More details on how to use it can be found [here](http://cromwell.readthedocs.io/en/develop/api/RESTAPI/). | |||
|
|||
### Languages | |||
|
|||
- Added an opt-in namespace cache for the WDL Draft 2 language factory. Please see the Cromwell example configuration for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe warn that if people are upgrading from hotfix, their currently enabled cache is now opt-in rather than on-by-default?
2236a4a
to
b6a6e87
Compare
Particularly useful for not re-parsing workflows for every set of inputs in a batch submission.