AWS: add DynamoDb catalog #2688

Merged on Jun 23, 2021 (2 commits)

@jackye1995 jackye1995 commented Jun 9, 2021

Add a DynamoDB catalog implementation with the following table schema:

  1. identifier column (partition key): the table identifier string, or the literal NAMESPACE for namespace rows
  2. namespace column (sort key): the namespace string
  3. a global secondary index (GSI) with namespace as the partition key and identifier as the sort key, with no other projected columns
  4. version column: UUID string, used for optimistic locking
  5. updated_at column: timestamp (long), records the latest update time
  6. created_at column: timestamp (long), records the initial creation time
  7. p.[property_key] columns: string, one per property (namespace properties, or Iceberg-defined table properties including table_type, metadata_location and previous_metadata_location)
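
The row layout above can be sketched with plain dictionaries. This is only an illustration, not the code in this PR: the helper names `table_item` and `namespace_item`, the dot-joined identifier format, the `"ICEBERG"` value for table_type, and the example location are assumptions. Only the column names and the NAMESPACE marker come from the list above.

```python
import time
import uuid

NAMESPACE_MARKER = "NAMESPACE"  # identifier value used for namespace rows
PROPERTY_PREFIX = "p."          # prefix for flattened property columns


def _base_item(identifier, namespace, properties):
    """Build the columns shared by namespace rows and table rows."""
    now_ms = int(time.time() * 1000)
    item = {
        "identifier": identifier,      # partition key
        "namespace": namespace,        # sort key
        "version": str(uuid.uuid4()),  # UUID string for optimistic locking
        "created_at": now_ms,
        "updated_at": now_ms,
    }
    # Properties are flattened into top-level columns named p.<key>.
    for key, value in properties.items():
        item[PROPERTY_PREFIX + key] = value
    return item


def namespace_item(namespace, properties=None):
    """Hypothetical helper: a row describing a namespace."""
    return _base_item(NAMESPACE_MARKER, namespace, properties or {})


def table_item(namespace, table_name, metadata_location, table_type="ICEBERG"):
    """Hypothetical helper: a row describing a table."""
    # Assumption: the identifier string is "<namespace>.<table name>".
    identifier = f"{namespace}.{table_name}"
    properties = {
        "table_type": table_type,  # assumed value, for illustration only
        "metadata_location": metadata_location,
    }
    return _base_item(identifier, namespace, properties)


if __name__ == "__main__":
    # Hypothetical metadata location, for illustration only.
    print(table_item("db", "events", "s3://bucket/db/events/metadata/v1.json"))
    print(namespace_item("db", {"owner": "alice"}))
```

Because every property is its own top-level column, a user-defined GSI could in principle be keyed on any `p.<key>` column without restructuring the rows.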

This design has the following benefits:

  1. table name is used directly as the partition key, which avoids the potential hot-partition issue of using namespace as the partition key and table name as the sort key
  2. namespace operations are clustered in a single partition, so they do not affect table commit operations
  3. a reverse GSI is used for the list-tables operation; all other operations are single-row operations or single-partition queries, so no operation in the catalog needs a full table scan
  4. a string UUID version field is used for optimistic locking instead of updated_at, to avoid conflicts when two processes commit in the same millisecond
  5. a multi-row transaction is used for renameTable to ensure idempotency
  6. properties are flattened into top-level columns so that users can add a custom GSI on any property field
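
The version check, the reverse-index listing, and the all-or-nothing rename described above can be modelled in memory. This is a toy sketch under stated assumptions, not the PR's implementation: `InMemoryCatalogTable`, `CommitConflict`, and all method names are hypothetical, and a Python dict stands in for the DynamoDB table and its conditional writes.

```python
import uuid


class CommitConflict(Exception):
    """Raised when a precondition on the stored row does not hold."""


class InMemoryCatalogTable:
    """Toy stand-in for the DynamoDB table, keyed by (identifier, namespace)."""

    def __init__(self):
        self.rows = {}

    def create(self, identifier, namespace, metadata_location):
        key = (identifier, namespace)
        if key in self.rows:
            raise CommitConflict(f"{identifier} already exists")
        self.rows[key] = {
            "version": str(uuid.uuid4()),
            "p.metadata_location": metadata_location,
        }
        return self.rows[key]["version"]

    def commit(self, identifier, namespace, expected_version, new_location):
        """Conditional update: succeeds only if the stored version is unchanged."""
        row = self.rows.get((identifier, namespace))
        if row is None or row["version"] != expected_version:
            raise CommitConflict(f"stale version for {identifier}")
        row["p.previous_metadata_location"] = row["p.metadata_location"]
        row["p.metadata_location"] = new_location
        row["version"] = str(uuid.uuid4())  # fresh UUID on every commit
        return row["version"]

    def list_tables(self, namespace):
        """Mimics a query on the reverse index (namespace, then identifier).

        This toy iterates over a dict; a real index would answer the same
        question with a single-partition query.
        """
        return sorted(
            ident for (ident, ns) in self.rows
            if ns == namespace and ident != "NAMESPACE"
        )

    def rename(self, old_key, new_key):
        """Move one row all-or-nothing, mimicking a multi-row transaction."""
        if old_key not in self.rows or new_key in self.rows:
            raise CommitConflict("rename precondition failed")
        staged = dict(self.rows)  # stage on a copy so a failure leaves no partial state
        staged[new_key] = staged.pop(old_key)
        self.rows = staged


if __name__ == "__main__":
    table = InMemoryCatalogTable()
    v1 = table.create("db.events", "db", "loc-1")
    table.commit("db.events", "db", v1, "loc-2")      # succeeds, version rotates
    try:
        table.commit("db.events", "db", v1, "loc-3")  # same stale version: rejected
    except CommitConflict as err:
        print("rejected:", err)
```

Comparing on a random UUID rather than on updated_at means two writers that happen to share a millisecond timestamp still hold distinguishable versions, so only the first conditional write goes through.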

Limitations:

  1. To avoid complications in parsing namespaces, a dot (.) is not allowed at any level of a namespace.
  2. Similarly, to avoid complications in parsing table identifiers, a dot is not allowed in a table name.
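
A short sketch of why the dot restriction helps, assuming (this is not confirmed by the description above) that namespace levels and the table name are joined with dots to form the identifier string. The function names are hypothetical.

```python
def validate_namespace(levels):
    """Reject any namespace level that contains a dot."""
    for level in levels:
        if "." in level:
            raise ValueError(f"dot is not allowed in namespace level: {level!r}")


def validate_table_name(name):
    """Reject a table name that contains a dot."""
    if "." in name:
        raise ValueError(f"dot is not allowed in table name: {name!r}")


def to_identifier(levels, name):
    """Join validated parts into one string (assumed dot-joined format)."""
    validate_namespace(levels)
    validate_table_name(name)
    return ".".join(list(levels) + [name])


if __name__ == "__main__":
    print(to_identifier(["a", "b"], "c"))
    # Without the restriction, namespace ["a"] with table "b.c" would also
    # join to "a.b.c", so the string could not be split back unambiguously.
    try:
        to_identifier(["a"], "b.c")
    except ValueError as err:
        print("rejected:", err)
```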

@yyanyy @rdblue @SreeramGarlapati @johnclara @danielcweeks

@github-actions github-actions bot added the AWS label Jun 9, 2021
@jackye1995
Contributor Author

@danielcweeks any additional comments?

@danielcweeks
Contributor

This looks good to me. Just the two small updates (docs and empty line) and I'm happy to commit.

@jackye1995
Contributor Author

@danielcweeks thank you, updated!

@danielcweeks danielcweeks merged commit f81d8ad into apache:master Jun 23, 2021