feat(proxy): Introduce an S3-aware proxy mode in dfdaemon #1779
Open
YQ-Wang wants to merge 1 commit into dragonflyoss:main from
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1779 +/- ##
==========================================
+ Coverage 44.54% 46.39% +1.84%
==========================================
Files 91 92 +1
Lines 25834 26813 +979
==========================================
+ Hits 11508 12440 +932
- Misses 14326 14373 +47
Signed-off-by: Yiqing Wang <yiqingwang@roblox.com>
99d127c to 3de8859
Description
Add an S3-aware proxy mode to `dfdaemon` that classifies incoming S3 requests and routes them accordingly:

- `GetObject` (full and ranged): accelerated through Dragonfly P2P. If the caller's SigV4 signature explicitly covers the `Range` header, `dfdaemon` preserves that original signed `Range` on the source request instead of rewriting it.
- `HeadObject` and `ListObjectsV2`: direct passthrough.

Key implementation details:

- `proxy::s3` module for request classification (virtual-hosted and path-style buckets, configurable host allowlist, automatic AWS endpoint detection).
- […] (`X-Amz-Credential`, `X-Amz-Date`, `X-Amz-Expires`, `X-Amz-Signature`, `X-Amz-Security-Token`, `X-Amz-SignedHeaders`, `X-Amz-Algorithm`) so different presigned URLs for the same object share the same P2P task.
- `proxy::header` helpers for SigV4 signed-header detection and range preservation signaling via the internal `X-Dragonfly-Preserve-Original-Range-For-Source` header.
- `dfdaemon_download` marks signed-range requests so downstream resource handling preserves the caller's original signed `Range` on source requests.
- `resource::task` generates deterministic FNV-1a piece numbers for signed-range pieces to avoid collisions with standard sequential piece numbering.
- `resource::piece` conditionally omits the `Range` header on source GET requests when the preserve-original-range flag is set, so the source returns the full signed range as-is.
- […] (`dragonfly_client_s3_proxy_request_total`) for per-operation, per-route counters.
- […] `ci/dfdaemon.service` comments, and doctest fixes in `dragonfly-client-util` (unrelated pre-existing breakage).

Related Issue

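For readers unfamiliar with FNV-1a, which the `resource::task` item in the implementation details mentions, here is a minimal Python sketch of the 32-bit variant. What the PR actually hashes and which bit width it uses are not stated in this description, so the input (the `Range` header value) and the `piece_number_for_range` name are assumptions made purely for illustration.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    value = FNV32_OFFSET_BASIS
    for byte in data:
        value ^= byte
        value = (value * FNV32_PRIME) & 0xFFFFFFFF  # keep the result in 32 bits
    return value


def piece_number_for_range(range_header: str) -> int:
    """Hypothetical mapping from a signed Range value to a piece number.

    Assumption: the Range header string is the hash input. The PR may
    hash something else entirely.
    """
    return fnv1a_32(range_header.encode("ascii"))
```

Because the hash depends only on its input bytes, every peer that sees the same signed range would derive the same number without coordination.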
Motivation and Context
Organizations storing large ML models, datasets, or media files in S3 and distributing them across many hosts cannot leverage Dragonfly P2P today without rewriting download logic to use Dragonfly's API directly. With S3-aware proxy mode, those workloads are accelerated transparently: applications only need to set `HTTPS_PROXY` and trust the proxy CA, with no code changes required.

The proxy preserves caller-provided SigV4 headers and presigned URLs end-to-end. `dfdaemon` does not discover AWS credentials or re-sign origin requests, keeping the security model unchanged.
Testing
Tested end-to-end against a real S3 object (`s3://bucket-name/test_dragonfly/file.mp4`, ~50 MB) through a locally deployed Dragonfly cluster (manager, scheduler, seed-client, client via Docker Compose). All tests ran inside the client container with `HTTPS_PROXY` and `AWS_CA_BUNDLE` configured to route traffic through the patched `dfdaemon`.

boto3 test matrix (all passed with correct proxy routing confirmed via `dfdaemon` logs):

- `s3.head_object()`
- `s3.list_objects_v2()`
- `s3.get_object()`
- `s3.get_object(Range='bytes=0-1023')`
- `s3.get_object(Range='bytes=1048576-2097151')`
- `s3.download_file()`

s5cmd tests (all passed):

- `s5cmd cp s3://... .`
- `s5cmd cp --part-size 5 --concurrency 3 s3://... .`

Presigned URL test: Tested a regional presigned URL (`bucket-name.s3.us-east-1.amazonaws.com`) for both full GET and ranged GET; both routed correctly.

Recursive folder download: `s3://bucket-name/test_dragonfly/` with nested subdirectories; all objects downloaded with correct sizes and routing.

cargo test: Passed after fixing unrelated pre-existing doctest failures in `dragonfly-client-util`.
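
The routing these tests exercise can be summarized with a small classifier. The Python sketch below is an illustration under simplifying assumptions, not the `proxy::s3` implementation: the `classify` name and the `"p2p"` / `"passthrough"` labels are hypothetical, only default AWS hostnames are recognized (no configurable allowlist), and object subresources such as `?acl` are not distinguished from plain object reads.

```python
import re
from urllib.parse import parse_qs

# Matches "s3.amazonaws.com" or "s3.<region>.amazonaws.com", optionally
# prefixed by "<bucket>." for virtual-hosted-style addressing.
AWS_S3_HOST = re.compile(
    r"^(?:(?P<bucket>.+)\.)?s3(?:[.-][a-z0-9-]+)?\.amazonaws\.com$"
)


def classify(method: str, host: str, path: str, query: str = ""):
    """Return (operation, route) for an S3 request, or None for non-S3 hosts."""
    match = AWS_S3_HOST.match(host.lower())
    if match is None:
        return None
    segments = [s for s in path.split("/") if s]
    if match.group("bucket"):
        key = "/".join(segments)        # virtual-hosted style: path is the key
    else:
        key = "/".join(segments[1:])    # path style: first segment is the bucket
    params = parse_qs(query, keep_blank_values=True)
    method = method.upper()
    if method == "GET" and key:
        return ("GetObject", "p2p")
    if method == "HEAD" and key:
        return ("HeadObject", "passthrough")
    if method == "GET" and params.get("list-type") == ["2"]:
        return ("ListObjectsV2", "passthrough")
    # The PR description does not say how other operations are routed,
    # so this sketch leaves them undecided.
    return ("Unclassified", None)
```

For example, `classify("GET", "bucket-name.s3.us-east-1.amazonaws.com", "/test_dragonfly/file.mp4")` corresponds to the `s3.get_object()` rows of the matrix, while a `GET` on the bucket root with `list-type=2` corresponds to `s3.list_objects_v2()`.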