Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GH-1045] Arbitrary WGS Inputs #218

Merged
merged 14 commits into from
Nov 12, 2020
Merged

Conversation

ehigham
Copy link
Member

@ehigham ehigham commented Nov 9, 2020

In this change:

WGS

  • specify arbitrary inputs
  • update skip-workflow? to handle bams as well!

Database changes:

  • CromwellWorkflow is unreleased so we can just shove the options in (don't need a change log)
  • copyfile can just use the CromwellWorkflow table

API changes

  • combine workflowOptions and common_inputs into a "common" map :common { :inputs {} :options {}}, where options are workflowOptions renamed.
  • implement (extending Jack's work) for xx, wgs and copyfile

Serialisation change

  • we can't store and just load workflow options owing to compatibility concerns (previous workflows didn't store these). I've opted not to store the default environment workflow-options and simply merge these when we talk to cromwell. I'm happy to move this to the loader if this is a point of contention. Thoughts/ideas/opinions welcome.

Puzzles:

  • unsure what to do with artificial inputs like reference_fasta_prefix and how that should tie in with being able to specify any input.

Not included

  • doc changes will not be included in this PR. They'll be updated in a subsequent change

@jack-r-warren
Copy link
Contributor

Maybe when we do that followup documentation PR we should write down what the default options are for the different modules since this API no longer surfaces that information. Could also address the loading-old-workloads thing by merging the defaults in twice--once when we write to the DB and once when we submit to Cromwell. Would generally be a no-op the second time but it would provide the backwards compatibility from merging at submit-time.

@ehigham
Copy link
Member Author

ehigham commented Nov 9, 2020

Maybe when we do that followup documentation PR we should write down what the default options are for the different modules since this API no longer surfaces that information. Could also address the loading-old-workloads thing by merging the defaults in twice--once when we write to the DB and once when we submit to Cromwell. Would generally be a no-op the second time but it would provide the backwards compatibility from merging at submit-time.

Do we want all these options to be made public? This includes service account stuff that's sort of a hack/implementation detail?
I can make all options visible, if you prefer! I'll move the behaviour into the serialisation impl if you prefer - not a big issue for me.

Comment on lines +37 to +49
(defn create-copyfile-workload!
"Use transaction TX to add the workload described by REQUEST."
[tx {:keys [items common] :as request}]
(letfn [(merge-to-json [shared specific]
(json/write-str (util/deep-merge shared specific)))
(serialize [workflow id]
(-> workflow
(assoc :id id)
(update :inputs #(merge-to-json (:inputs common) %))
(update :options #(merge-to-json (:options common) %))))]
(let [[id table] (batch/add-workload-table! tx workflow-wdl request)]
(jdbc/insert-multi! tx table (map serialize items (range)))
(workloads/load-workload-for-id tx id))))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The create-workload! implementations for copyfile, wgs and xx are almost identical. In a subsequent change, we could explore reducing this technical debt. This means (in theory) we need to test only once for the "batch" workflows (or instantiate a test suite).

Comment on lines 51 to 66
(defn start-copyfile-workload!
"Use transaction TX to start _WORKLOAD."
[tx {:keys [cromwell items uuid] :as workload}]
(let [env (first (all/cromwell-environments cromwell))]
(letfn [(submit! [{:keys [id inputs workflow_options]}]
[id (submit-workflow env inputs workflow_options {:workload uuid}) "Submitted"])
(update! [tx [id uuid status]]
[tx {:keys [items uuid] :as workload}]
(let [env (get-cromwell-environment workload)
default-options (util/make-options env)]
(letfn [(submit! [{:keys [id inputs options]}]
[id (submit-workflow env inputs
(util/deep-merge default-options options)
{:workload uuid})])
(update! [tx [id uuid]]
(jdbc/update! tx items
{:updated (OffsetDateTime/now) :uuid uuid :status status}
{:updated (OffsetDateTime/now) :uuid uuid :status "Submitted"}
["id = ?" id]))]
(run! (comp (partial update! tx) submit!) (:workflows workload))
(jdbc/update! tx :workload
{:started (OffsetDateTime/now)} ["uuid = ?" uuid]))))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementations for start-workload! still differ somewhat. This is due to the xx module using (a faked) batch submit to cromwell. In a subsequent change, we could explore moving the other modules onto that endpoint and actually implement it. Given that, maybe we can extract common functionality for these modules (c.f. create-workload!)

@jack-r-warren
Copy link
Contributor

Do we want all these options to be made public? This includes service account stuff that's sort of a hack/implementation detail?
I can make all options visible, if you prefer! I'll move the behaviour into the serialisation impl if you prefer - not a big issue for me.

I think it would be helpful for ops to know in order for them to override stuff. Honestly I think if we are doing hacky stuff I think that's all the more reason to surface what on earth we are doing. And again we don't gotta actually return it from the endpoint I guess -- if there's stuff we really don't want to show them then maybe just documenting what we do want to show them is fine. But since they'd be able to override our hacky stuff I don't see much of a harm.
A bit of my reasoning here is that Cromwell doesn't document any way to get the existing workflow options -- you can get them if you pass /metadata through jq '.submittedFiles.options | fromjson', but that submittedFiles field isn't documented and that makes me worry that we don't have guarantees around it.

Comment on lines 129 to 135
(defn saved-before?
"Test if the `_workload` was saved before the `reference` version string."
[reference {:keys [version] :as _workload}]
(letfn [(decode [v] (map util/parse-int (str/split v #"\.")))
(lt? [[x & xs] [y & ys]]
(or (< x y) (and (== x y) (every? some? [xs ys]) (lt? xs ys))))]
(lt? (decode version) (decode reference))))
Copy link
Member Author

@ehigham ehigham Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this implementation is over-complicated but I'm not sure how to simplify. Suggestions welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try.

@ehigham
Copy link
Member Author

ehigham commented Nov 10, 2020

Do we want all these options to be made public? This includes service account stuff that's sort of a hack/implementation detail?
I can make all options visible, if you prefer! I'll move the behaviour into the serialisation impl if you prefer - not a big issue for me.

I think it would be helpful for ops to know in order for them to override stuff. Honestly I think if we are doing hacky stuff I think that's all the more reason to surface what on earth we are doing. And again we don't gotta actually return it from the endpoint I guess -- if there's stuff we really don't want to show them then maybe just documenting what we do want to show them is fine. But since they'd be able to override our hacky stuff I don't see much of a harm.
A bit of my reasoning here is that Cromwell doesn't document any way to get the existing workflow options -- you can get them if you pass /metadata through jq '.submittedFiles.options | fromjson', but that submittedFiles field isn't documented and that makes me worry that we don't have guarantees around it.

As an aside - if they don't document if, I don't think we should. (i.e. we document that you can pass workflow options, just not what they are.)

(update m :workflow_options #(when % (util/parse-json %))))]
(let [keep [:id :finished :status :updated :uuid :options]]
(assoc (select-keys m keep) :inputs (apply dissoc m keep))))
(load-options [m] (update m :options (fnil util/parse-json "null")))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should build that into parse-json?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be uses where passing nil into this is an error - building the fnil'ness into parse-json would make that harder to detect.

api/src/wfl/module/batch.clj Outdated Show resolved Hide resolved
api/src/wfl/module/wgs.clj Show resolved Hide resolved
api/test/wfl/integration/modules/wgs_test.clj Show resolved Hide resolved
api/test/wfl/tools/workloads.clj Show resolved Hide resolved
api/test/wfl/unit/modules/xx_test.clj Show resolved Hide resolved
database/changesets/20201029_CromwellWorkflow.xml Outdated Show resolved Hide resolved
Comment on lines 129 to 135
(defn saved-before?
"Test if the `_workload` was saved before the `reference` version string."
[reference {:keys [version] :as _workload}]
(letfn [(decode [v] (map util/parse-int (str/split v #"\.")))
(lt? [[x & xs] [y & ys]]
(or (< x y) (and (== x y) (every? some? [xs ys]) (lt? xs ys))))]
(lt? (decode version) (decode reference))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try.

@ehigham ehigham merged commit ff13c3a into master Nov 12, 2020
@ehigham ehigham deleted the ehigham/GH-1045-wgs-inner-wdl-inputs branch November 18, 2020 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants