Dynamo to S3 Sample DAG and Docs #21920
Merged: potiuk merged 4 commits into apache:main from ferruzzi:ferruzzi/docs-update/dynamo-to-s3 on Mar 8, 2022.
Changes from all commits (4 commits):
airflow/providers/amazon/aws/example_dags/example_dynamodb_to_s3.py (43 additions, 0 deletions)
```python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from datetime import datetime
from os import environ

from airflow import DAG
from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import DynamoDBToS3Operator

TABLE_NAME = environ.get('DYNAMO_TABLE_NAME', 'ExistingDynamoDbTableName')
BUCKET_NAME = environ.get('S3_BUCKET_NAME', 'ExistingS3BucketName')


with DAG(
    dag_id='example_dynamodb_to_s3',
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
    tags=['example'],
    catchup=False,
) as dag:

    # [START howto_transfer_dynamodb_to_s3]
    backup_db = DynamoDBToS3Operator(
        task_id='backup_db',
        dynamodb_table_name=TABLE_NAME,
        s3_bucket_name=BUCKET_NAME,
        # Max output file size in bytes. If the Table is too large, multiple files will be created.
        file_size=1000,
    )
    # [END howto_transfer_dynamodb_to_s3]
```
airflow/providers/amazon/aws/example_dags/example_dynamodb_to_s3_segmented.py (60 additions, 0 deletions)
```python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from datetime import datetime
from os import environ

from airflow import DAG
from airflow.providers.amazon.aws.transfers.dynamodb_to_s3 import DynamoDBToS3Operator

TABLE_NAME = environ.get('DYNAMO_TABLE_NAME', 'ExistingDynamoDbTableName')
BUCKET_NAME = environ.get('S3_BUCKET_NAME', 'ExistingS3BucketName')


with DAG(
    dag_id='example_dynamodb_to_s3_segmented',
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
    tags=['example'],
    catchup=False,
) as dag:

    # [START howto_transfer_dynamodb_to_s3_segmented]
    # Segmenting allows the transfer to be parallelized into {segment} number of parallel tasks.
    backup_db_segment_1 = DynamoDBToS3Operator(
        task_id='backup-1',
        dynamodb_table_name=TABLE_NAME,
        s3_bucket_name=BUCKET_NAME,
        # Max output file size in bytes. If the Table is too large, multiple files will be created.
        file_size=1000,
        dynamodb_scan_kwargs={
            "TotalSegments": 2,
            "Segment": 0,
        },
    )

    backup_db_segment_2 = DynamoDBToS3Operator(
        task_id="backup-2",
        dynamodb_table_name=TABLE_NAME,
        s3_bucket_name=BUCKET_NAME,
        # Max output file size in bytes. If the Table is too large, multiple files will be created.
        file_size=1000,
        dynamodb_scan_kwargs={
            "TotalSegments": 2,
            "Segment": 1,
        },
    )
    # [END howto_transfer_dynamodb_to_s3_segmented]
```
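The two segmented tasks above differ only in their ``Segment`` index. As a hedged sketch of how the scan kwargs scale to any parallelism level (the helper ``segment_scan_kwargs`` is hypothetical, not part of the Amazon provider):

```python
# Hypothetical helper, not part of the Amazon provider: build one
# dynamodb_scan_kwargs dict per parallel segment of a DynamoDB scan.
def segment_scan_kwargs(total_segments):
    return [
        {"TotalSegments": total_segments, "Segment": segment}
        for segment in range(total_segments)
    ]


# For a parallelism of 2 this yields the kwargs used by the two tasks above.
print(segment_scan_kwargs(2))
```

Each dict could then be passed to its own DynamoDBToS3Operator task, mirroring the backup-1/backup-2 pattern in the example.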
docs/apache-airflow-providers-amazon/operators/transfer/dynamodb_to_s3.rst (61 additions, 0 deletions)
```rst
.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.


Amazon DynamoDB to S3 Transfer Operator
=======================================

Use the DynamoDBToS3Operator transfer to copy the contents of an existing Amazon DynamoDB table
to an existing Amazon Simple Storage Service (S3) bucket.

Prerequisite Tasks
^^^^^^^^^^^^^^^^^^

.. include:: ../_partials/prerequisite_tasks.rst

.. _howto/transfer:DynamoDBToS3Operator:

DynamoDB To S3 Operator
^^^^^^^^^^^^^^^^^^^^^^^

This operator replicates records from a DynamoDB table to a file in an S3 bucket.
It scans a DynamoDB table and writes the received records to a file on the local
filesystem. It flushes the file to S3 once the file size exceeds the file size limit
specified by the user.

Users can also specify filtering criteria using dynamodb_scan_kwargs to only replicate
records that satisfy the criteria.

For more information, visit:
:class:`~airflow.providers.amazon.aws.transfers.dynamodb_to_s3.DynamoDBToS3Operator`

Example usage:

.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_dynamodb_to_s3.py
    :language: python
    :dedent: 4
    :start-after: [START howto_transfer_dynamodb_to_s3]
    :end-before: [END howto_transfer_dynamodb_to_s3]

To parallelize the replication, users can create multiple DynamoDBToS3Operator tasks using the
``TotalSegments`` parameter. For instance, to replicate with a parallelism of 2, create two tasks:

.. exampleinclude:: /../../airflow/providers/amazon/aws/example_dags/example_dynamodb_to_s3_segmented.py
    :language: python
    :dedent: 4
    :start-after: [START howto_transfer_dynamodb_to_s3_segmented]
    :end-before: [END howto_transfer_dynamodb_to_s3_segmented]
```
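The docs above describe the operator flushing a file to S3 once its size exceeds the user-specified limit. A minimal standalone sketch of that chunk-by-size behavior (not the provider's actual implementation; serializing records as JSON lines is an assumption here):

```python
import json


def chunk_records_by_size(records, max_bytes):
    """Group records into file-sized chunks, closing a chunk once its
    serialized size reaches max_bytes, analogous to file_size above."""
    chunks, current, size = [], [], 0
    for record in records:
        encoded = (json.dumps(record) + "\n").encode("utf-8")
        current.append(record)
        size += len(encoded)
        if size >= max_bytes:
            chunks.append(current)
            current, size = [], 0
    if current:
        chunks.append(current)  # flush the final, possibly smaller chunk
    return chunks


# Many small records against a tiny limit produce multiple "files".
parts = chunk_records_by_size([{"id": i} for i in range(10)], max_bytes=40)
```

This illustrates why a large table with a small ``file_size`` yields multiple output files, as noted in the example's comments.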
Review comment: Can you please extend the example/docs to also include the segment explanation? (See airflow/airflow/providers/amazon/aws/transfers/dynamodb_to_s3.py, lines 65 to 87 in 602abe8, and remove the segment example from the comments.)
Reply: These changes should be in b0278a1.