[WRK-814] Client side impl Sandbox.resource_usage() #2966
Conversation
Force-pushed from c3df351 to 94a1ed7.
Force-pushed from 94a1ed7 to 71747b7.
LGTM. It might be good to clarify in the comments/docstring that the returned CPU and memory values are max(used, requested).
# Memory usage is in gibibyte-nanoseconds. Using 1 GiB for 1 second is 1e9 GiB-ns.
# Using 0.5 GiB for 1 second is 5e8 GiB-ns.
mem_gib_nanosecs: int
# GPU type, e.g. "a10g". Combine with `gpu_nanosecs` to understand billable GPU spend.
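As a sanity check on the unit convention in the comment above, here is a small illustrative helper (the field name comes from the diff; the function itself is hypothetical, not part of the PR):

```python
def mem_gib_nanosecs(gib: float, seconds: float) -> int:
    """Convert a memory footprint held for a duration into gibibyte-nanoseconds."""
    return int(gib * seconds * 1e9)

# 1 GiB held for 1 second -> 1e9 GiB-ns, matching the comment above.
assert mem_gib_nanosecs(1.0, 1.0) == 1_000_000_000
# 0.5 GiB held for 1 second -> 5e8 GiB-ns.
assert mem_gib_nanosecs(0.5, 1.0) == 500_000_000
```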
Currently we have a many-to-one mapping for some GPU types in configuration (e.g. we accept "A10" and "A10g", and we're case-insensitive, etc.). How are we defining the schema for the GPU type here? We've broken our own internal tooling recently with some migrations; are we now obligated to keep the current schema fixed going forward?
Hmm, I think it's possible we do modify them. Unlikely but possible.
I'm inclined to drop the field then and make the user maintain a mapping from sandbox ID -> their own GPU type.
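If the field were dropped, the caller-side mapping could be as simple as the following sketch (all names and the sandbox ID here are hypothetical, not from this PR):

```python
# Record the GPU type you requested, keyed by sandbox ID, instead of relying on
# the server echoing a canonical type back.
gpu_by_sandbox: dict[str, str] = {}

def record_sandbox_gpu(sandbox_id: str, gpu_type: str) -> None:
    # Normalize, since the config layer accepts variants like "A10" / "A10g".
    gpu_by_sandbox[sandbox_id] = gpu_type.lower()

record_sandbox_gpu("sb-123", "A10G")  # "sb-123" is a made-up ID
```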
We have a canonical map in the backend; we could just use that one. But I'm also fine not exposing it here.
That makes sense to me in general, but do sandboxes have GPU fallbacks? Or can you only request one GPU type?
# CPU usage is in core-nanoseconds. Using 1 core for 1 second is 1e9 core-ns.
# Using 0.5 core for 1 second is 5e8 core-ns.
cpu_core_nanosecs: int
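Converting back for display follows the same convention; a quick illustrative helper (the field name is from the diff, the function is hypothetical):

```python
def core_seconds(cpu_core_nanosecs: int) -> float:
    """Convert billed core-nanoseconds back into core-seconds for display."""
    return cpu_core_nanosecs / 1e9

assert core_seconds(1_000_000_000) == 1.0  # 1 core for 1 second
assert core_seconds(500_000_000) == 0.5    # 0.5 core for 1 second
```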
Random ignorable suggestion, but `nanosecs` is super non-human-friendly; maybe the dataclass could have a method that presents the underlying data in a more human-readable format, to save people some hassle when they're using it interactively or printing it.
Yep, good idea, I can do a nice `repr`.
Hmm, maybe not `repr`; I like this documented behavior: "Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval()."
Personally I've never thought that notion of `repr()` was very useful, but yeah, I think you could do it with `__str__` instead of `__repr__`, or some named method that returns a string. I'd probably try to format the CPU/RAM/GPU usage with SI prefixes so it's readable across multiple orders of magnitude.
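A minimal sketch of that idea, assuming the field names from the diff (the class name and exact formatting are hypothetical, not what the PR implements):

```python
from dataclasses import dataclass

@dataclass
class ResourceUsage:  # hypothetical name; fields come from the diff above
    cpu_core_nanosecs: int
    mem_gib_nanosecs: int

    def __str__(self) -> str:
        # Present the raw nanosecond counters as human-readable
        # core-seconds / GiB-seconds, as suggested in the thread.
        cpu_s = self.cpu_core_nanosecs / 1e9
        mem_s = self.mem_gib_nanosecs / 1e9
        return f"ResourceUsage(cpu={cpu_s:,.2f} core-s, mem={mem_s:,.2f} GiB-s)"

print(ResourceUsage(cpu_core_nanosecs=2_500_000_000, mem_gib_nanosecs=500_000_000))
# -> ResourceUsage(cpu=2.50 core-s, mem=0.50 GiB-s)
```

Keeping `__repr__` as the default dataclass repr preserves the eval-round-trip behavior quoted above, while `str()`/`print()` gets the friendly form.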
Oh, missed that, as I haven't looked closely at the backend PR, but I'd argue that maybe suggests a different name, not just documentation (which is also good to add).
I think different people might really want to know both, though: what did it actually use (e.g. for profiling), and what did we bill for (e.g. for passing on costs).
And ideally we wouldn't expose multiple variants of similar information through multiple different ad hoc methods :/. Maybe an
Hmm, I don't love it. I didn't realize that we applied compute resource multipliers because of region scheduling. I'd otherwise be OK with passing through the reserved value as-is, because I don't think this endpoint can ever serve as profiling data. I'll leave this for a bit and think about it.
I am just suggesting that we allow users to query
If you do that IMO I'd say it's mandatory to name the method differently; how can we describe the billing as applying to
Why not? I'd certainly try to use it for something like that based on how it's currently presented, and I expect other users would too.
Describe your changes
Note on protobuf: protobuf message changes in one place may affect multiple entities (client, server, worker, database). See the points above.
Changelog
Adds a `Sandbox.resource_usage()` method to retrieve billable resource usage (CPU, RAM, GPU time).