Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination Snowflake: Sync Id, generation_id and Meta #39107

Merged
merged 1 commit into from
Jun 10, 2024

Conversation

gisripa
Copy link
Contributor

@gisripa gisripa commented Jun 4, 2024

What

Adding migration for Snowflake Raw tables and final tables to include _airbyte_meta and _airbyte_generation_id columns.
Tests and refactors to adapt to latest CDK.

Review guide

User Impact

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link

vercel bot commented Jun 4, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jun 10, 2024 11:31pm

@octavia-squidington-iii octavia-squidington-iii added the area/connectors Connector related issues label Jun 4, 2024
Copy link
Contributor Author

gisripa commented Jun 4, 2024

@gisripa gisripa changed the title snowflake-genid-meta Destination Snowflake: Sync Id, generation_id and Meta Jun 4, 2024
Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks reasonable, added some nitpicky comments but wouldn't block on any of them. Lmk if you want to ship this pr standalone and I'll approve 👍

@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch from f07e155 to 925e6fe Compare June 5, 2024 16:56
@gisripa gisripa force-pushed the gireesh/06-04-cdk-fix-zerobyte-flush branch from aa96c15 to 5997f8c Compare June 5, 2024 17:01
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch from 925e6fe to 7afc0ca Compare June 5, 2024 17:01
Base automatically changed from gireesh/06-04-cdk-fix-zerobyte-flush to master June 5, 2024 17:09
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch 2 times, most recently from b19a5c5 to 0b6a891 Compare June 5, 2024 17:48
@gisripa gisripa changed the base branch from master to gireesh/snowflake-stagingclient-enh June 5, 2024 17:48
@gisripa gisripa force-pushed the gireesh/snowflake-stagingclient-enh branch from 393d5f1 to f94814f Compare June 5, 2024 19:03
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch from 0b6a891 to 88c29a5 Compare June 5, 2024 19:04
@gisripa gisripa force-pushed the gireesh/snowflake-stagingclient-enh branch from f94814f to c754c40 Compare June 5, 2024 19:55
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch from 88c29a5 to d94131a Compare June 5, 2024 19:55
@gisripa gisripa force-pushed the gireesh/snowflake-stagingclient-enh branch from c754c40 to 459c37c Compare June 5, 2024 20:34
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch from d94131a to 8f94ff2 Compare June 5, 2024 20:34
Base automatically changed from gireesh/snowflake-stagingclient-enh to master June 5, 2024 22:08
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch 7 times, most recently from c46d15a to 74c83db Compare June 7, 2024 20:18
@gisripa gisripa changed the base branch from master to edgao/catalog_parser_improvements June 7, 2024 20:18
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch from 74c83db to c0a5ee5 Compare June 7, 2024 20:51
@gisripa gisripa marked this pull request as ready for review June 7, 2024 20:52
@gisripa gisripa requested a review from a team as a code owner June 7, 2024 20:52
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch from c0a5ee5 to 3180ccc Compare June 7, 2024 20:57
Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple nitpicks, otherwise lgtm

)
val rawTableDefinition =
results
.groupBy { it.get("schema_name").asText()!! }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're already restricting to specifically airbyte_internal.whatever, why do we need to group by schema name / table name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to avoid. list has 1 element and first() call i guess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you get the same result with this?

TableDefinition(
            results.associateTo(LinkedHashMap()) {
                // return value of data_type in show columns is a json string.
                val dataType = Jsons.deserialize(it.get("data_type").asText())
                it.get("column_name").asText()!! to
                    ColumnDefinition(
                        it.get("column_name").asText(),
                        dataType.get("type").asText(),
                        0,
                        dataType.get("nullable").asBoolean(),
                    )
            }
        )

i.e. don't need to group on schema/table name at all, just directly build the map of column name to definition

(... but also, I would hope that kotlin has some easy .first accessor :P )

@edgao edgao force-pushed the edgao/catalog_parser_improvements branch from 7910b55 to 1638b6d Compare June 7, 2024 23:12
@edgao edgao requested a review from a team as a code owner June 7, 2024 23:12
@gisripa gisripa force-pushed the edgao/catalog_parser_improvements branch from 1638b6d to 7910b55 Compare June 8, 2024 02:38
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch from 3180ccc to fb50cbf Compare June 8, 2024 02:39
@edgao edgao force-pushed the edgao/catalog_parser_improvements branch from 7910b55 to feba86e Compare June 10, 2024 16:13
Base automatically changed from edgao/catalog_parser_improvements to master June 10, 2024 16:40
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch 4 times, most recently from 1f3e971 to e728b2c Compare June 10, 2024 19:23
@gisripa gisripa force-pushed the gireesh/snowflake-meta-genid branch from e728b2c to adecfad Compare June 10, 2024 23:27
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Jun 10, 2024
@gisripa gisripa merged commit 9f0ce4f into master Jun 10, 2024
31 checks passed
@gisripa gisripa deleted the gireesh/snowflake-meta-genid branch June 10, 2024 23:55
@oskarthansen
Copy link

oskarthansen commented Jun 11, 2024

It seems like this update makes "full refresh overwrite" fail..
image

Error message:

JavaScript execution error: Uncaught Execution of multiple statements failed on statement "INSERT INTO "?"."LINKEDINADS..." (at line 2, position 0).
SQL compilation error: error line 76 at position 12
invalid identifier '"_?_meta"' in SYSTEM$MULTISTMT at '    throw `Execution of multiple statements failed on statement {0} (at line {1}, position {2}).`.replace('{1}', LINES[i])' position 4
stackstrace: 
SYSTEM$MULTISTMT line: 10

@pKorsholm
Copy link

Hi @gisripa
This pr seems to just have broken a lot of snowflake destinations causing the following error:

Java.lang.RuntimeException: SQL compilation error: error line 63 at position 12 invalid identifier '"_airbyte_meta"' in SYSTEM$MULTISTMT at ' throw `Execution of multiple statements failed on statement {0} (at line {1}, position {2}).`.replace('{1}', LINES[i])' position 4 stackstrace: SYSTEM$MULTISTMT line: 10

(if you have a remedy for this we could implement that would be great)

@edgao
Copy link
Contributor

edgao commented Jun 11, 2024

@pKorsholm / @tuday2 we believe this only affects overwrite streams - can you confirm?

@tuday2
Copy link

tuday2 commented Jun 11, 2024

For me its "Full Refresh / Overwrite" and "Incremental | Append + Deduped", i have at least 20+ examples

@edgao
Copy link
Contributor

edgao commented Jun 11, 2024

can you send the logs for one of the dedup streams running into problems? (feel free to slack me if you don't want to post them publicly)

so far we've only seen this happen on overwrite streams, so I'm not sure what's happening there

@edgao
Copy link
Contributor

edgao commented Jun 11, 2024

in the meantime, we have #39399, which should fix for overwrite streams. It's just pending green CI.

@edgao
Copy link
Contributor

edgao commented Jun 11, 2024

for vis: @tuday2 sent me the logs - thanks! confirmed that the streams with errors were all full refresh overwrite, which caused the UI to mark other streams as failed (b/c the entire sync terminated uncleanly)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/destination/snowflake
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants