Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination CDK: Skip flush of zero byte records. #39106

Merged
merged 1 commit into from
Jun 5, 2024

Conversation

gisripa
Copy link
Contributor

@gisripa gisripa commented Jun 4, 2024

What

Related: #31425
Fixes: https://github.com/airbytehq/airbyte-internal-issues/issues/6728

In the old code, the below trace was the culprit in AsyncFlush.kt, fixing it for good in new flow.

2024-03-18 23:37:22 destination > WARN pool-4-thread-1 i.a.c.i.d.s.c.CsvSerializedBuffer(flushWriter):85 Trying to flush but no printer is initialized.
2024-03-18 23:37:22 destination > WARN pool-4-thread-1 i.a.c.i.d.s.c.CsvSerializedBuffer(closeWriter):95 Trying to close but no printer is initialized.
2024-03-18 23:37:22 destination > INFO pool-4-thread-1 i.a.c.i.d.r.BaseSerializedBuffer(flush):172 Finished writing data to c062ec97-5fca-4b9c-815e-dd35430f525b6128245000758765392.csv.gz (0 bytes)

This seems to happen when sources are slow and Async framework issues a flush irrespective of the number of records, leading to the Flush function just initializing empty buffer and doing UPLOAD/COPY with 0 byte files.

How

Review guide

User Impact

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link

vercel bot commented Jun 4, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Jun 5, 2024 5:01pm

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues CDK Connector Development Kit connectors/destination/snowflake labels Jun 4, 2024
Copy link
Contributor Author

gisripa commented Jun 4, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @gisripa and the rest of your teammates on Graphite Graphite

@gisripa gisripa changed the title cdk-fix-zerobyte-flush Destination CDK: Skip flush of zero byte records. Jun 4, 2024
@gisripa gisripa force-pushed the gireesh/06-04-cdk-fix-zerobyte-flush branch from 9a3742e to aa96c15 Compare June 4, 2024 23:10
@octavia-squidington-iii octavia-squidington-iii removed the area/connectors Connector related issues label Jun 4, 2024
@gisripa gisripa marked this pull request as ready for review June 4, 2024 23:18
@gisripa gisripa requested a review from a team as a code owner June 4, 2024 23:18
Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 so it really was that easy

does it.flush need to be before the byteCount check? Or can we skip that too?

@gisripa
Copy link
Contributor Author

gisripa commented Jun 5, 2024

🤦 so it really was that easy

does it.flush need to be before the byteCount check? Or can we skip that too?

we still need to do flush because weirdly that call does close of the file stream and all. otherwise we'll end up with open fds and open byte streams 🤦‍♂️

@gisripa gisripa force-pushed the gireesh/06-04-cdk-fix-zerobyte-flush branch from aa96c15 to 5997f8c Compare June 5, 2024 17:01
@gisripa gisripa requested a review from a team as a code owner June 5, 2024 17:01
@gisripa gisripa enabled auto-merge (squash) June 5, 2024 17:02
@gisripa gisripa merged commit 3d8f7ca into master Jun 5, 2024
24 checks passed
@gisripa gisripa deleted the gireesh/06-04-cdk-fix-zerobyte-flush branch June 5, 2024 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants