-
Notifications
You must be signed in to change notification settings - Fork 9
Skip baking task when parbake task didn't find any required recipes #1704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Skip baking task when parbake task didn't find any required recipes #1704
Conversation
…any recipes to configure.
|
@jfrost-mo - I've not asked for a review yet because I haven't tested it properly, but can you see any problems with this? |
|
Seems like a sensible approach, as compared to just making the bake task succeed if it has no recipes, this will avoid having to request a large job only to not use it. |
Co-authored-by: James Frost <james.frost@metoffice.gov.uk>
jfrost-mo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor suggestions, but otherwise looks good. After some testing I'd be happy for this to be merged.
|
I'll give this a test and merge it if it looks good, ahead of our next release. |
|
I'm a bit tempted to go for an alternative implementation where we unconditionally run the baking tasks. It would be less efficient, but it should be simpler in the workflow. It depends on the cost of getting a job that doesn't do anything through the queue. |
|
For long trials I can see the null jobs actually taking a fair amount of queuing, as you will have one per cycle. |
|
I can get it working with suicide triggers, but I'd like the avoid them if possible, as they are more of a cylc 7 thing. |
|
The issue is that |
This is mainly to ensure we have validated our environment before using it.
|
I've done some testing around this, and the current behaviour (since we introduced parbaking) is actually that the bake/bake_aggregation tasks will succeed even if they have no recipes. Instead, the parbaking task has its own check that at least one recipe is enabled, even if it is not the aggregation type being produced. So the use case if disabling all non-aggregation tasks is already supported, though it does run some baking jobs that immediately finish. How big a problem that is is what is in question. I'm tempted to close this pull request on the bases that I'm not sure the added complexity to the workflow is worth the savings. This is based on my testing today, where the slurm job for the baking only queued for about 10 seconds, and finished within 2 seconds when there were no recipes. However, I haven't been using much compute over the past few days, so my fair-share is in good standing. |
|
So, from https://cylc.github.io/cylc-doc/stable/html/user-guide/writing-workflows/scheduling.html#dependencies-with-multiple-optional-outputs |
|
With regards to whether this is worth it though, the problem is with a long trial (say Thinking about it again, I'm not convinced this is the correct solution. You don't really want the workflow to run at all if it's not going to do anything (or you just want it to run the fetch and aggregation). Some version #1720 that checks the recipes before the suite is even installed is probably a better solution. |
|
It probably is worth it then, but yes, there may be better ways. Given we do want the data to be fetched, the remaining tasks running per-cycle would be:
So the tasks we want to eliminate are the parbaking and especially the baking. In principle there is no reason the parbaking has to happen in the cycle it is for, as it has no dependency on other tasks (other than the initial environment setup). So we could move to parbaking for all cycles in the first cycle, which would save us a slurm job per cycle. The same thing could be said about the baking task TBH. Once we have parbaked the recipes, the only cycle specific thing it does is to look in the appropriately dated folder for the recipes to bake. And that is only so we don't end up baking the same recipe multiple times. We could ditch cycling entirely, and just chunk the work up for parallelism regardless of when it was for. That might make it harder to support running in read-time, though I'm not convinced there is much of a driving motivation to do that with CSET. |

Use custom cylc triggers to skip 'baking' when 'parbake' didn't find any recipes to configure.
Contribution checklist
Aim to have all relevant checks ticked off before merging. See the developer's guide for more detail.