Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: ability to query table and column stats; fix: missing bookmarks for native batch-based sync #13

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented Jan 2, 2023

I took an initial pass at this over the vacation. I don't know if I'll be able to finish it out, but we can at least start the discussions here.

Resolves:

Besides the new "Table and Column Stats" implementation via SQLConnector.get_table_profile(), the meat of the change is in SQLStream._sync_batches(). The proposed implementation queries for max column value before starting the batch unload, then after the load finishes, it increments the state with the pre-queried max value.

Function signatures did not need to change, and all of this fix could be implemented into the SDK with no target-specific code needing to be written, except what is already done in this implementation, which is to use the bookmark's value (when it is available) in the WHERE clause of the COPY operation.

.devcontainer/devcontainer.json Show resolved Hide resolved
tap_snowflake/client.py Show resolved Hide resolved
tap_snowflake/client.py Show resolved Hide resolved
@aaronsteers aaronsteers changed the title fix missing bookmarks for native batch-based sync feat: query table and column stats; fix: missing bookmarks for native batch-based sync Jan 2, 2023
@aaronsteers aaronsteers changed the title feat: query table and column stats; fix: missing bookmarks for native batch-based sync feat: ability to query table and column stats; \n fix: missing bookmarks for native batch-based sync Jan 2, 2023
@aaronsteers aaronsteers changed the title feat: ability to query table and column stats; \n fix: missing bookmarks for native batch-based sync feat: ability to query table and column stats; fix: missing bookmarks for native batch-based sync Jan 2, 2023
@aaronsteers aaronsteers marked this pull request as draft January 2, 2023 19:40
.devcontainer/devcontainer.json Show resolved Hide resolved
tap_snowflake/client.py Outdated Show resolved Hide resolved
tap_snowflake/client.py Show resolved Hide resolved
Co-authored-by: Edgar R. M. <edgarrm358@gmail.com>
@kgpayne kgpayne marked this pull request as ready for review February 2, 2023 17:20
@tayloramurphy
Copy link

@aaronsteers @kgpayne what needs to happen to get this merged?

@kgpayne
Copy link
Collaborator

kgpayne commented Mar 20, 2023

@tayloramurphy final review from @edgarrmondragon would do it. @aaronsteers can't review his own PR, and I made a bunch of changes to get AJ's excellent first pass to fully work 🙂 Will add my review for the sake of GH ✅

@kgpayne kgpayne merged commit d467ca2 into main Mar 30, 2023
@kgpayne kgpayne deleted the 12-fix-batch-sync-does-not-record-a-max-replication-key-value-in-emitted-state branch March 30, 2023 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fix: BATCH sync does not record a max replication key value in emitted STATE
4 participants