-
Notifications
You must be signed in to change notification settings - Fork 340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow negative --idle-timeout on cml-runner #700
Conversation
...to convey no timeout
Co-authored-by: Casper da Costa-Luis <casper.dcl@physics.org>
should we also change |
@casperdcl, do you intend to automate the default value to detect if it has been launched by humans or if it's just an ad-hoc runner launched by a previous step in a workflow? It looks like a good idea, but having too many different defaults could also confuse users. |
We already do custom checks for e.g. "Seconds to wait for jobs before shutting down. Default: 300 in CI or -1 (disabled) otherwise." |
Yes, CML already has a fair amount of moving parts to infer whether it's running on a CI/CD platform or not, which platform it's running on and quite a few other things that try to offload as much choices as possible from users. This could be nice if it's what users expect, but promising a default value when running |
It wouldn't be the first time we take a choice based on the environment, but altering a default value still is a bit risky, even if we document it. |
I'm inclined to think that the cloud use case is the prevalent one and the few users that want to use Is that extra option on the documentation verbose enough to be replaced by "smart" code? iterative/cml.dev#93 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 lgtm
@DavidGOrtega what about #700 (comment) / #700 (comment)? |
@casperdcl I think is a nice idea butI wonder how many breaks introduces. As I stated I launch self hosted GPU runners (2 titans) and expect to shutdown the machine after no more jobs. Its not a big deal, but that makes me think that the change is not really needed since:
|
See iterative/cml.dev#91 (comment); using
0
to conveydisabled
is not especially friendly, numerically speaking.