Self serve replication SQL API #226
base: main
Commits updated: ad12bbd to a6d32ec, then a6d32ec to 1f7d13d.
Resolved review threads:
.../java/com/linkedin/openhouse/spark/statementtest/SetTableReplicationPolicyStatementTest.java
...ala/com/linkedin/openhouse/spark/sql/execution/datasources/v2/SetReplicationPolicyExec.scala (two threads)
...ain/scala/com/linkedin/openhouse/spark/sql/catalyst/plans/logical/SetReplicationPolicy.scala
Thank you, Selena, for the quick turnaround on this PR. Can you expand the testing section to say whether local Docker tests or a test cluster setup were used?
@rohitkum2506 Updated the description with local Docker spark-shell testing 👍
ds =
    spark.sql(
        "ALTER TABLE openhouse.db.table SET POLICY (REPLICATION = "
            + "({cluster:'a', interval:'b'}, {cluster:'aa', interval:'bb'}))");
For the server side, can we add a regex validator for the user-provided input? cc: @rohitkum2506
Yep, the validations will be done on the server side.
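The reviewer's regex-validator suggestion could look roughly like the sketch below. It is a minimal illustration, not the validation planned for #227. It assumes each entry has the form {cluster:'<name>'} or {cluster:'<name>', interval:'<digits>'}, that cluster names contain only letters, digits, '-' and '_', and that the interval is a plain integer. The class name ReplicationClauseValidator and its patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validator for the text inside REPLICATION = ( ... ). */
final class ReplicationClauseValidator {
  // One entry: {cluster:'<name>'} with an optional ", interval:'<digits>'".
  // Group 1 captures the cluster name, group 2 the optional interval.
  private static final Pattern ENTRY =
      Pattern.compile(
          "\\{\\s*cluster\\s*:\\s*'([A-Za-z0-9_-]+)'\\s*(?:,\\s*interval\\s*:\\s*'(\\d+)'\\s*)?\\}");

  // The full list: one or more entries, comma-separated, wrapped in parentheses.
  private static final Pattern LIST =
      Pattern.compile(
          "\\(\\s*" + ENTRY.pattern() + "(?:\\s*,\\s*" + ENTRY.pattern() + ")*\\s*\\)");

  private ReplicationClauseValidator() {}

  /** Returns true if the whole clause is a well-formed list of entries. */
  static boolean isWellFormed(String clause) {
    return LIST.matcher(clause.trim()).matches();
  }

  /** Extracts the destination cluster names, rejecting malformed input. */
  static List<String> clusters(String clause) {
    if (!isWellFormed(clause)) {
      throw new IllegalArgumentException("Malformed replication clause: " + clause);
    }
    List<String> result = new ArrayList<>();
    Matcher m = ENTRY.matcher(clause);
    while (m.find()) {
      result.add(m.group(1)); // cluster name of each entry, in order
    }
    return Collections.unmodifiableList(result);
  }
}
```

Under these assumptions, the placeholder values in the statement test (interval:'b') would be rejected. That is expected: the parser test only exercises syntax, and the thread places value validation on the server side.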
Summary
Branched off from #220, this PR covers only the SQL API support for self-serve replication. It adds SQL API support for adding replication configs to the table policies stored in table properties.
Supported SQL API:
where interval is the time between runs of the replication job and cluster is the destination cluster.
The interval is optional; users can set it to a value from 12 to 72 hours. If no interval is given, replication is scheduled daily (every 24 hours).
A list of entries with multiple destination clusters is also accepted, enabling multi-cluster table replication.
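The interval rules above can be captured in a small value type. This is a sketch under assumptions, not the OpenHouse implementation: the class ReplicationConfig and its methods are hypothetical, the interval is modeled as a whole number of hours (matching the 24h default), and the 12 to 72 bound is treated as inclusive.

```java
import java.util.Objects;
import java.util.OptionalInt;

/** Hypothetical model of one replication entry: destination cluster plus schedule interval. */
final class ReplicationConfig {
  static final int DEFAULT_INTERVAL_HOURS = 24; // daily when no interval is given
  static final int MIN_INTERVAL_HOURS = 12;
  static final int MAX_INTERVAL_HOURS = 72;

  private final String cluster;
  private final int intervalHours;

  private ReplicationConfig(String cluster, int intervalHours) {
    this.cluster = cluster;
    this.intervalHours = intervalHours;
  }

  /** Builds an entry, applying the daily default when the interval is absent. */
  static ReplicationConfig of(String cluster, OptionalInt intervalHours) {
    Objects.requireNonNull(cluster, "cluster");
    int hours = intervalHours.orElse(DEFAULT_INTERVAL_HOURS);
    // Assumed inclusive range check on the interval.
    if (hours < MIN_INTERVAL_HOURS || hours > MAX_INTERVAL_HOURS) {
      throw new IllegalArgumentException(
          "interval must be between " + MIN_INTERVAL_HOURS + " and "
              + MAX_INTERVAL_HOURS + " hours, got " + hours);
    }
    return new ReplicationConfig(cluster, hours);
  }

  String cluster() {
    return cluster;
  }

  int intervalHours() {
    return intervalHours;
  }
}
```

A multi-cluster policy is then a list of such entries, for example Arrays.asList(ReplicationConfig.of("a", OptionalInt.of(12)), ReplicationConfig.of("aa", OptionalInt.empty())), where the second entry falls back to the 24-hour default.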
Future Scope:
Add validations to check that the destination cluster differs from the source cluster, and that the replication interval follows the rules defined for data freshness and compliance.
Server-side implementation in a separate PR, #227, which will contain validation of the SQL string inputs.
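The first planned check, that no destination equals the source cluster, could be as simple as the following sketch. It assumes cluster names are compared as exact strings; the class DestinationClusterCheck and its methods are hypothetical and are not the validation in #227. The freshness and compliance rules for the interval are not specified here, so they are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check that a table is never replicated to its own source cluster. */
final class DestinationClusterCheck {
  private DestinationClusterCheck() {}

  /** Returns the destinations equal to the source cluster (empty if all are valid). */
  static List<String> conflicts(String sourceCluster, List<String> destinations) {
    List<String> bad = new ArrayList<>();
    for (String destination : destinations) {
      if (destination.equals(sourceCluster)) { // exact string comparison (assumed)
        bad.add(destination);
      }
    }
    return bad;
  }

  /** Throws if any destination equals the source cluster. */
  static void validate(String sourceCluster, List<String> destinations) {
    if (!conflicts(sourceCluster, destinations).isEmpty()) {
      throw new IllegalArgumentException(
          "destination cluster must differ from source cluster '" + sourceCluster + "'");
    }
  }
}
```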
Changes
Testing Done
Added unit tests.
Ran the following commands in the local Docker environment:
Additional Information