Destination Snowflake: Improve error handling in StagingClient #39135
Conversation
Is SnowflakeStagingClientTest still useful, now that there's a proper integration test?
(Two minor comments, neither blocking.)
```diff
@@ -84,7 +111,8 @@ class SnowflakeStagingClient(private val database: JdbcDatabase) {
     filePath,
     stageName,
     stagingPath,
-    Runtime.getRuntime().availableProcessors()
+    // max allowed param is 99, we don't need so many threads for a single file upload
+    minOf(Runtime.getRuntime().availableProcessors(), 4)
```
Why not `minOf(..., 99)`? Wouldn't this restrict us to 4 threads in most cases?
In our cloud case it is always 1; our pod config is 1 CPU, I believe, based on my previous dev release run.
@gisripa
The pod could also run on 2 CPUs per the values.yaml definition (worker resources), if I'm not mistaken.
I still don't understand why not use `minOf(..., 99)`?
@talnidam We surely can use `minOf(.., 99)`; however, there is no added benefit given the way we upload only one file. We use a 200M file and only 1 file per PUT, and Snowflake chunks it if it is > 64M, which I'm guessing won't be more than 3 or 4 chunks. The advantage of a higher value shows up if we do a PUT with a `directory/*` pattern containing many files. Also, the default is 4 in Snowflake settings if not provided, so I just used that as the max.
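The cap being discussed can be sketched as a small standalone function. The helper name and signature here are illustrative assumptions, not the connector's actual code; the constants (99 as Snowflake's max PARALLEL value, 4 as its default) come from the discussion above.

```kotlin
// Sketch of the parallelism cap: Snowflake's PUT accepts PARALLEL values up
// to 99, but for a single-file upload more than 4 threads adds no benefit,
// and 4 matches Snowflake's own default.
fun uploadParallelism(availableProcessors: Int, cap: Int = 4): Int =
    minOf(availableProcessors, cap).coerceAtLeast(1)

fun main() {
    println(uploadParallelism(1))  // single-CPU cloud pod -> 1
    println(uploadParallelism(16)) // larger machine, capped -> 4
}
```

On a 1-CPU pod this behaves identically to `minOf(..., 99)`; the cap only matters on larger machines, where extra threads would sit idle for a single file anyway.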
```kotlin
private val datasource =
    SnowflakeDatabaseUtils.createDataSource(config, OssCloudEnvVarConsts.AIRBYTE_OSS)
private val database: JdbcDatabase = SnowflakeDatabaseUtils.getDatabase(datasource)
// Intentionally not using actual columns, since the staging client should be agnostic of these
```
Random thought: do we eventually want to explicitly specify column names in our COPY command?
I guess so.. it requires `COPY INTO .. select filename#colindex` or something, I think, right?
Something like that; not sure about the exact syntax, but presumably there's some way to specify "please load into exactly these columns in the target table".
which I guess is actually slightly different from what you have (i.e. "read these columns out of the file" vs "write to these columns in the table")
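For what it's worth, Snowflake's COPY INTO does support both directions at once via a transformation subquery: positional columns (`$1`, `$2`, ...) select data out of the staged file, while an explicit column list targets the table. The table, stage, and column names below are hypothetical, purely to illustrate the shape:

```sql
-- Hypothetical sketch: read positional columns out of the staged file
-- and load them into named columns of the target table.
COPY INTO my_table (id, name)
FROM (SELECT t.$1, t.$2 FROM @my_stage/data.csv.gz t)
FILE_FORMAT = (TYPE = CSV);
```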
Yeah, it's not useful for the happy path; the only useful part was verifying that the correct exception is caught and rethrown as a ConfigError.
What
Use the returned results to determine success/failure for `PUT` and `COPY INTO` commands.
Review guide
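The core idea can be sketched as follows: rather than assuming a `PUT` succeeded, inspect the status column Snowflake returns per uploaded file. The helper name and the treatment of statuses are illustrative assumptions, not the PR's actual code.

```kotlin
// Hedged sketch: Snowflake's PUT reports a per-file status such as
// UPLOADED or SKIPPED (already staged); treat anything else, or a
// missing status, as a failed upload.
fun isPutSuccessful(status: String?): Boolean =
    status != null && status.uppercase() in setOf("UPLOADED", "SKIPPED")

fun main() {
    println(isPutSuccessful("UPLOADED")) // true
    println(isPutSuccessful(null))       // false
}
```

The same pattern applies to `COPY INTO`, whose result rows carry a load status per file.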
User Impact
Can this PR be safely reverted and rolled back?