Skip to content

Conversation

@Eta0
Copy link
Collaborator

@Eta0 Eta0 commented May 29, 2025

Object storage config handling improvements & improved CAIOS support

These changes improve the compatibility of the stream_io.open_stream() object storage feature with different types of object storage configurations. The changes are summarized in the additions to CHANGELOG.md, duplicated below.
This also fixes a bug with very large writes on some non-Linux platforms, and resolves an incomplete fix for open_stream object storage uploads from #72.

Added

  • stream_io.open_stream() now respects Boto3's configuration files and environment variables when searching for object storage credentials to use

Fixed

  • stream_io.open_stream() now uses virtual-hosted-style bucket addressing for the cwobject.com and cwlota.com endpoints
  • stream_io.open_stream() now allows the use_https entry of .s3cfg configuration files to fill in its force_http parameter if force_http is not explicitly specified as True or False
  • TensorSerializer no longer throws an error when attempting to serialize very large tensors on some non-Linux platforms
  • Object storage uploads managed by stream_io.open_stream() now finalize correctly on Python 3.12+ even without an explicit call to their close() method
    • A fix for this was originally implemented in release 2.7.2, but it only worked for Python versions below 3.12

@Eta0 Eta0 self-assigned this May 29, 2025
@Eta0 Eta0 added bug Something isn't working enhancement New feature or request labels May 29, 2025
@Eta0 Eta0 requested a review from sangstar May 30, 2025 15:33
sangstar
sangstar previously approved these changes Jun 3, 2025
Copy link
Contributor

@sangstar sangstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Test cases passing on my end.

Some systems have a write size limit of around 2^31 - 1,
so this limits the size of any os.pwrite call to be less than that
on platforms other than Linux.
@Eta0 Eta0 requested review from arsenetar and sangstar June 3, 2025 23:08
@Eta0 Eta0 marked this pull request as ready for review June 3, 2025 23:10

def _is_caios(endpoint: str) -> bool:
host = urlparse(endpoint if "//" in endpoint else "//" + endpoint).netloc
return host.lower() in {"cwobject.com", "cwlota.com"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really not a fan of us hardcoding these like this (or the ord1 defaults we have had elsewhere for some time). But can be convinced this is ok.

For reference to those other hardcoded values that should be either A) dropped, or B) changed to our current object storage offering.

default_s3_read_endpoint = "accel-object.ord1.coreweave.com"
default_s3_write_endpoint = "object.ord1.coreweave.com"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure in boto3's source tree there is a large data segment that has these sorts of hardcoded compatibility checks for a huge set of different AWS endpoints, which is how Boto3's "magically pick the right one" feature works. Unless/until we upstream a change to support our object storage endpoints correctly, which I'm not sure they'd accept, this seems to me like a fair place to put a workaround.

The other hardcoded values are necessary for the default s3://tensorized bucket to "just work" since they're in that location. I'd like to change it to the newer endpoint in release 3.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a pattern already used in boto3 then probably fine... although we are not automatically changing the force_http based on if we are using cwlota.com or not which would fall into the same pattern as what we are doing here. Same points apply on both sides there as well as if a user configures it correctly it will work, but by default with a minimal amount of configuring it will not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am nervous about defaulting force_http=True for the cwlota.com endpoint on the basis that it could transmit a presigned URL as part of a request over an unknown network in cleartext if it is used somewhere where cwlota.com doesn't resolve to a local network (so not a CW datacenter, maybe from someone's own machine where they're trying to test something), although I think the presigned URL would only be valid for use by someone who can access cwlota.com endpoints. I'd rather leave that one specifically as something that needs to be declared explicitly in the config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as it works correctly when http://cwlota.com is in the config or environment variables as the endpoint then I don't strongly think it needs to have something explicitly forcing it here.

sangstar
sangstar previously approved these changes Jun 4, 2025
Copy link
Contributor

@sangstar sangstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes confirmed as working on my end once again, LGTM. Will defer to Andrew and his review comments.

@Eta0
Copy link
Collaborator Author

Eta0 commented Jun 5, 2025

This now allows using HTTP instead of HTTPS when using an endpoint parsed from a Boto3 config file or its environment variables. Please take another look. (There have also a couple bug fixes added unrelated to the original scope of this PR; don't mind those too much—they work.)

@Eta0 Eta0 requested review from arsenetar and sangstar June 5, 2025 02:06
Copy link
Contributor

@sangstar sangstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another review and another approval. :)

LGTM and working on my end. Not opinionated on the HTTP vs HTTPS default handling and the implementation looks good so I will defer that once again to Andrew.

Copy link
Contributor

@arsenetar arsenetar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is behaving like I expect it to, granted not 100% validated e2e.

@Eta0 Eta0 merged commit 9b7ba4d into main Jun 5, 2025
2 checks passed
@Eta0 Eta0 deleted the eta/object-storage branch June 5, 2025 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants