Is your feature request related to a problem? Please describe.
Currently, I do not see any way to have a single pipeline consume an s3 source (with sqs) for s3 buckets that are in different regions. It would be nice to have this ability.
Example scenario:
two s3 buckets, one in us-west-2 and one in us-east-1
each bucket has event notifications configured with an sns topic in their respective regions
a single sqs queue in us-east-1 that is subscribed to both topics above (one topic in us-east-1, another in us-west-2)
With the above configuration, everything goes smoothly for us-east-1. However, the pipeline fails to get objects from the us-west-2 bucket because the s3 client is configured for us-east-1. The (not very informative) error log is:
[s3-source-sqs-1] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: null (Service: S3, Status Code: 400, Request ID: xxxx, Extended Request ID: xxxx)
Describe alternatives you've considered (Optional)
Using a pipeline and sqs queue for each bucket that is in a different region. But this feels silly - extra sqs queue, pipeline, and duplicated configuration.
@brianmaresca, thank you for creating this detailed issue. It seems you are familiar with the solution. Would you be interested in creating a PR contribution for it?
I'm interested in solving this by using the region information from the SQS queue. This can allow us to avoid two calls to the S3 API to load the data.
Additionally, we can perform STS authentication for the desired region.
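To make the idea above concrete: each record in an S3 event notification carries an awsRegion field, so a worker could read the bucket's region from the message body and reuse one client per region. The sketch below is only an illustration under that assumption, not Data Prepper's implementation. RegionClientCache, extractRegion, clientFor, and the client factory are hypothetical names, and the regex is a stand-in for proper JSON parsing.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: keeps one client per region named in S3 event notifications. */
class RegionClientCache<C> {
    // Matches "awsRegion":"us-west-2", including the escaped form (\"awsRegion\":\"...\")
    // that appears when the S3 event is wrapped in an SNS envelope.
    private static final Pattern AWS_REGION =
            Pattern.compile("awsRegion\\\\?\"\\s*:\\s*\\\\?\"([a-z0-9-]+)");

    private final Map<String, C> clients = new ConcurrentHashMap<>();
    private final String defaultRegion;
    private final Function<String, C> clientFactory;

    RegionClientCache(String defaultRegion, Function<String, C> clientFactory) {
        this.defaultRegion = defaultRegion;
        this.clientFactory = clientFactory;
    }

    /** Returns the first awsRegion value found in the message body, if any. */
    static Optional<String> extractRegion(String messageBody) {
        Matcher matcher = AWS_REGION.matcher(messageBody);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    /** Returns the cached client for the message's region, creating it on first use. */
    C clientFor(String messageBody) {
        // Fall back to the configured region when the message does not name one.
        String region = extractRegion(messageBody).orElse(defaultRegion);
        return clients.computeIfAbsent(region, clientFactory);
    }
}
```

In a real worker the factory would presumably build an S3 client (and any STS credentials) for the given region. Here it is left generic so the sketch stays free of SDK dependencies.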
Currently we use the region defined in the region property. e.g.
Describe the solution you'd like
Enable (or the option to enable) cross region access on the S3 client so it is able to download objects from buckets in regions other than the one defined in the yaml config. See https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/s3-cross-region.html.
Potential solution: add .crossRegionAccessEnabled() to createS3Client in S3ClientBuilderFactory.