-
Notifications
You must be signed in to change notification settings - Fork 987
Fix Postgres OOM by limiting fetch size #758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two minor comment but otherwise LGTM
src/main/java/io/confluent/connect/jdbc/source/TableQuerier.java
Outdated
Show resolved
Hide resolved
src/main/java/io/confluent/connect/jdbc/source/JdbcSourceTask.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Greg Harris <gregh@confluent.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Note that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work, @gharris1727. some comments/questions.
src/test/java/io/confluent/connect/jdbc/source/integration/PostgresOOMIT.java
Show resolved
Hide resolved
|
||
task = new JdbcSourceTask(); | ||
task.start(props); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to test what happens when the table monitor thread is running in the same JVM as these tasks that are setting auto commit to false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The table monitor thread is in essentially the same relationship to this task as other tasks from the JDBC connector, and we don't see any issues from this running in production.
This is already effectively tested by running an existing source and sink connector in the same JVM, since sink connectors disable autocommit.
* @param stmt the prepared statement; never null | ||
* @throws SQLException the error that might result from initialization | ||
*/ | ||
@Override | ||
protected void initializePreparedStatement(PreparedStatement stmt) throws SQLException { | ||
log.trace("Initializing PreparedStatement fetch direction to FETCH_FORWARD for '{}'", stmt); | ||
stmt.setFetchDirection(ResultSet.FETCH_FORWARD); | ||
stmt.setFetchSize(config.getInt(JdbcSourceConnectorConfig.BATCH_MAX_ROWS_CONFIG)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Postgres specifically, the original issue found that there was a Postgres JDBC URL parameter defaultRowFetchSize
which could be set, but was ineffective since it relied on autoCommit
being disabled, which could not be done with the URL and is only settable in-code.
This change sets fetchSize
in code (unconditionally), based on the existing BATCH_MAX_ROWS_CONFIG
, which will default to 100 if absent.
A couple of questions:
- This is only being added for Postgres, where there is the URL option
defaultRowFetchSize
- do other databases, which may or may not have a JDBC URL fetch-size option, possibly also suffer from this issue? Should we check for any other common databases that could have this issue and no URL option? - Does it matter that this will now always silently-ignore any URL
defaultRowFetchSize
set - might be confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
As far as I understand, MySQL has a similar behavior, except that their url workaround is effective. I'm sure that theres at least one other database that we don't know about that has this same issue, and would require the same fix.
However, without the abillity to integration test this functionality, I don't feel comfortable making changes to those dialects. Even for Postgres, I needed to find a way to integration test this change before I felt comfortable putting this up for PR. The cost to adding integration tests at this time is also higher than we can afford right now, and beyond the scope of this fix.
-
If someone included this URL option without knowing, then I don't think they would know/care that it was being ignored. Therefore, I'll assume that someone using this URL option put it there explicitly.
If the URL option is higher than the batch size (perhaps by someone trying to optimize throughput) then it would be effectively the same as if it was equal the batch size except with higher memory usage. This is because the connector will still only read at most
batch.size
of data from the result set regardless of the fetch size.If the URL option was lower than the batch size (perhaps by someone trying to optimize memory) then they would notice that it didn't reduce memory usage. This would be non-obvious, but if they're optimizing memory usage, why would they change this URL parameter before reducing the batch size? Tuning the batch size performs effectively the same memory tuning behavior that they want, but it's actually supported by the connector itself, and documented behavior, rather than depending on this pass-through driver tuning parameter.
src/main/java/io/confluent/connect/jdbc/source/JdbcSourceTask.java
Outdated
Show resolved
Hide resolved
…-nullable check Signed-off-by: Greg Harris <gregh@confluent.io>
Signed-off-by: Greg Harris <gregh@confluent.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remain uneasy about the setFetchSize
logic being applied only for Postgres, or it being only possible to apply for Postgres. If there's a concern that this might not be safe to apply to all databases/drivers, then we could find a way to not auto-apply it from batch.size
(although that logic is attractive). I guess I'm curious - what is the effective batch and fetch size and memory consumption of other databases ... maybe they are automatically using forward-scrolling cursors by-default?
It feels like the autoCommit
change could be safely applied by making this a new option, then any behavioural change could be isolated to situations where it is required, e.g. Postgres, where it would be documented.
I think the change is good and the above thoughts are take-or-leave only.
src/main/java/io/confluent/connect/jdbc/source/TableQuerier.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Greg Harris <gregh@confluent.io>
There are a few bug fixes/features which have added configuration options:
These are just the bug fixes that resulted in a configuration option, but there are many more that were fixed without a configuration option, many more than are listed here. Overall, I think that the only justification for a configuration added as a bug fix is preservation of backwards compatibility, and I don't think that justification works here. Nobody is relying on the OOM exception as part of their workflow, and it has no impact on the actual data that's being transferred. This is just the first dialect to have this fix applied, and applying it to each of them is simple. The only reason we're not adding them at this time is because of the cost of testing. Let me know your thoughts @javabrett, and thanks for your review! |
Hi, when new Docker images includes this fix is published to Docker hub? |
This will not be released until the next patch release, so |
Is this included in the latest confluent 5.4 deb packages? |
@kgrimsby No, this will not be included in the 5.4 deb packages. It will be included in the deb packages of the versions I called out in my previous message. |
My bad @gharris1727 ! Sorry for wasting your time.. |
This PR adds an integration test with an embedded postgres instance.
There are three tests:
Test 3 addresses a side-effect of autoCommit that was called out in #466, where read-only queries with autoCommit disabled would leave open locks on the database. This test verifies that the
JdbcSourceTask
does not leave open locks from any operations.#466 specifically observed that the TableMonitorThread would leave open locks if auto-commit was disabled on that connection. This PR circumvents that issue by leaving the TableMonitorThread connection as-is, with autoCommit enabled. There are no tests covering the Connector or TableMonitorThread, because the behavior of those classes is unaffected by these changes. A connector test would also have some extra complication as the TableMonitorThread is asynchronous.
Fixes #34 .