Self serve replication SQL API #226

Open
wants to merge 3 commits into
base: main

Conversation

chenselena
Collaborator

@chenselena chenselena commented Oct 9, 2024

Summary

Branched off from #220, this PR is scoped to SQL API support for self-serve replication only. It adds SQL syntax for attaching replication configs to table policies within table properties.

Supported SQL syntax:

ALTER TABLE db.testTable SET POLICY (REPLICATION=({cluster:'a', interval:'b'}))
ALTER TABLE db.testTable SET POLICY (REPLICATION=({cluster:'a'}))

Here, cluster is the destination cluster and interval is how often the replication job runs.
Interval is optional; users can set a value from 12 to 72 hours (for example, '12H'). If interval is omitted, replication is scheduled daily (every 24 hours).
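
To make the defaulting rule concrete, here is a minimal sketch of how an optional interval could be resolved. It is not the code in this PR: the class and method names are hypothetical, and the "<number>H" format is an assumption based on the '12H' value used in the local testing.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of this PR.
public class ReplicationInterval {
    // Bounds and default as stated in the description (12 to 72, daily when omitted).
    public static final int MIN_HOURS = 12;
    public static final int MAX_HOURS = 72;
    public static final int DEFAULT_HOURS = 24;

    /** Returns the interval in hours, falling back to the daily default when none is given. */
    public static int resolveHours(Optional<String> interval) {
        if (!interval.isPresent()) {
            return DEFAULT_HOURS;
        }
        String raw = interval.get().trim().toUpperCase(Locale.ROOT);
        // Assumption: intervals are written as "<number>H", e.g. "12H".
        if (!raw.endsWith("H")) {
            throw new IllegalArgumentException("Expected an interval like '12H', got: " + interval.get());
        }
        int hours;
        try {
            hours = Integer.parseInt(raw.substring(0, raw.length() - 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Expected an interval like '12H', got: " + interval.get(), e);
        }
        if (hours < MIN_HOURS || hours > MAX_HOURS) {
            throw new IllegalArgumentException(
                "Interval must be between " + MIN_HOURS + " and " + MAX_HOURS + " hours, got: " + hours);
        }
        return hours;
    }
}
```

With this sketch, an absent interval resolves to 24, '12H' resolves to 12, and values outside the 12 to 72 range are rejected.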

We also allow a list input with multiple clusters to enable multi-cluster table replication.

ALTER TABLE db.testTable SET POLICY (REPLICATION=({cluster:'a', interval:'b'}, {cluster:'aa', interval:'bb'}))
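
For callers that assemble this statement programmatically, the following is a rough sketch of building the multi-cluster form from a list of configs. The `ReplicationSql` and `Config` names are hypothetical and exist only for illustration; the sketch also assumes the values have already been validated, since it does no escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of this PR.
public class ReplicationSql {
    /** One replication target; interval may be null because it is optional. */
    public static final class Config {
        final String cluster;
        final String interval;

        public Config(String cluster, String interval) {
            this.cluster = cluster;
            this.interval = interval;
        }
    }

    /** Builds the ALTER TABLE ... SET POLICY (REPLICATION=(...)) statement. */
    public static String alterTable(String table, List<Config> configs) {
        List<String> parts = new ArrayList<>();
        for (Config c : configs) {
            // Assumption: values are pre-validated; no quoting or escaping is done here.
            StringBuilder sb = new StringBuilder("{cluster:'").append(c.cluster).append("'");
            if (c.interval != null) {
                sb.append(", interval:'").append(c.interval).append("'");
            }
            parts.add(sb.append("}").toString());
        }
        return "ALTER TABLE " + table + " SET POLICY (REPLICATION=(" + String.join(", ", parts) + "))";
    }
}
```

Passing two configs, ('a', 'b') and ('aa', 'bb'), for db.testTable yields the same statement as the example above; a config with a null interval produces the cluster-only form.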

Future Scope:
Add validations to check that the destination cluster != source cluster, and that the replication interval follows rules defined for data freshness and compliance.
The server-side implementation is in a separate PR, #227, which will contain validation for the SQL string inputs.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • Manually tested on local docker setup. Please include commands run, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Added unit tests.

Ran the following commands on local docker:

scala> spark.sql("alter table u_tableowner.test_table set policy (replication=({cluster:'WAR'}))").show(false)
ANTLR Tool version 4.7.1 used for code generation does not match the current runtime version 4.8
ANTLR Tool version 4.7.1 used for code generation does not match the current runtime version 4.8
++
||
++
++
scala> spark.sql("alter table u_tableowner.test_table set policy (replication=({cluster:'WAR', interval:'12H'}))").show(false)
++
||
++
++
scala> spark.sql("alter table u_tableowner.test_table set policy (replication=({interval:'12H'}))").show(false)
com.linkedin.openhouse.spark.sql.catalyst.parser.extensions.OpenhouseParseException: mismatched input 'interval' expecting {'.', 'SET'}; line 1 pos 62
scala> spark.sql("alter table u_tableowner.test_table set policy (replication=({cluster:}))").show(false)
com.linkedin.openhouse.spark.sql.catalyst.parser.extensions.OpenhouseParseException: missing STRING at '}'; line 1 pos 70

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@chenselena chenselena changed the title [Draft] Self serve replication SQL API implementation [Draft] Self serve replication SQL API Oct 10, 2024
@chenselena chenselena changed the title [Draft] Self serve replication SQL API Self serve replication SQL API Oct 11, 2024
@chenselena chenselena marked this pull request as ready for review October 11, 2024 18:05
@rohitkum2506
Collaborator

Thank you, Selena, for the quick turnaround on the PR. Can you expand the testing section if local docker tests or a test cluster setup were done?

@chenselena
Collaborator Author

> Thank you, Selena, for the quick turnaround on the PR. Can you expand the testing section if local docker tests or a test cluster setup were done?

@rohitkum2506 updated description with local docker spark-shell testing 👍

ds =
    spark.sql(
        "ALTER TABLE openhouse.db.table SET POLICY (REPLICATION = "
            + "({cluster:'a', interval:'b'}, {cluster:'aa', interval:'bb'}))");
Collaborator

for server side, can we add a regex validator for the user provided input? cc: @rohitkum2506

Collaborator Author

yep, the validations will be done on the server side
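
As a rough illustration of the regex validator suggested above, a server-side check might look like the sketch below. Both patterns are assumptions rather than rules from this PR or #227: the allowed cluster-name characters are a guess, and the interval format follows the '12H' value used in the local testing. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical validator, not part of this PR or #227.
public class ReplicationInputValidator {
    // Assumption: cluster names are alphanumeric and may contain '-' or '_' after the first character.
    private static final Pattern CLUSTER = Pattern.compile("[A-Za-z0-9][A-Za-z0-9_-]*");
    // Assumption: intervals are one to three digits followed by 'H', e.g. "12H".
    private static final Pattern INTERVAL = Pattern.compile("\\d{1,3}[Hh]");

    public static boolean isValidCluster(String cluster) {
        // matches() requires the whole string to match, so no anchors are needed.
        return cluster != null && CLUSTER.matcher(cluster).matches();
    }

    public static boolean isValidInterval(String interval) {
        return interval != null && INTERVAL.matcher(interval).matches();
    }
}
```

Under these assumed patterns, 'WAR' and '12H' pass, while the placeholder interval 'b' and any cluster value containing quotes or whitespace are rejected.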
