-
Notifications
You must be signed in to change notification settings - Fork 340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cloud runner terminate improvements #653
Merged
Merged
Changes from 7 commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
22c1480
wip
DavidGOrtega 2fd2770
Merge branch 'master' of https://github.com/iterative/cml into runner…
DavidGOrtega cddd89a
waitCompletedPipelineJobs
DavidGOrtega 1049c50
on exit proc
DavidGOrtega 76f8e04
spotnotifier
DavidGOrtega 716240b
remove unused
DavidGOrtega 39e82dd
spotnotifier
DavidGOrtega fe8831a
notRetry SpotNotifier
DavidGOrtega d8561ba
remove status runner
DavidGOrtega 082fcba
pipelineJobs interface
DavidGOrtega 4071235
test sha issue
DavidGOrtega 8441b5f
40000
DavidGOrtega File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we separate this functionality and move it to the AMI and fallback on the terraform-provider-iterative repository? While this is a must have, keeping it as part of the CML code would introduce another unrelated responsibility to this codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just
curl
the169.254.169.254
endpoint in a loop and runsystemctl stop cml
once we get a termination notice, all from the instance script.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets open another ticket and use for now the already done work that also works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we might just want this behaviour with the runner, so I guess it should go in the TPI runner resource at most
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I disagree, Its a part of the logic that we are setting to handle long job runners. The same that we do with the 72 hours in example. And ensures the integrty and responsability of the runner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the specific advantages of V8 over any other engine for this use case, but anyhow. This is a completely separate discussion topic.
Not that complex.
Not as obscure as moving cloud-specific code to the CML repository instead of abstracting over signals, in my opinion.
Do we really want to support this use case? What's the advantage of supporting it if we don't do the same for the other providers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All advantages despite that you disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CML repository is a runner also that can check whatever is needed to ensure the feature it offers. Transparent spot instances
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to check also the noRetry... pass it.. create another service or whatever, business logic broken in multiple parts ... Ugly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im telling you a solution to avoid Golang source code that Im planning to also unify the base source code that we were discussing