-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
detect unsynchronized contract table and retry #10617
Conversation
- Writing only the latest-read, as before, would imply unsynced offsets that are actually well-synced. This puts the DB in a more uniform state, i.e. it should actually reflect the single value that the fetchAndPersist loop tries to catch everything up to.
- Treating every unsynced offset as a lag would make us needlessly retry perfectly synchronized query results.
…tect-contract-table-desync
…tect-contract-table-desync
…tect-contract-table-desync
…tect-contract-table-desync
…tect-contract-table-desync
…tect-contract-table-desync
CHANGELOG_BEGIN - [JSON API] Under rare conditions, a multi-template query backed by database could have an ACS portion that doesn't match its transaction stream, if updated concurrently. This conditions is now checked and accounted for. See `issue #10617 <https://github.com/digital-asset/daml/pull/10617>`__. CHANGELOG_END
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this impact performance? Is there some visible effect on existing performance tests?
val maxLocalOff = partyOffs | ||
.collect { | ||
case (unsyncedParty, unsyncedOff) | ||
if queriedParty(unsyncedParty) || unsyncedOff > expectedOffset => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what should be the behaviour for an unsyncedParty
which is not a queriedParty
and has unsyncedOff < expectedOffset
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can safely ignore it. This is the test "not require update for lagging, but unqueried, party"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking of it as an optimization, but in reality it's a must-have. If an unqueried party is lagging, we don't have the token to update it, so if we decided to update based on that, we'd get stuck in the loop and exhaust retries.
val singleton = Set(surrogateTpId) | ||
// if a queried party is not in the map, we can safely assume that | ||
// it is exactly at expectedOffset | ||
val caughtUp = maxLocalOff <= expectedOffset || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again slightly confused on how maxLocalOff
being < expectedOffset
is considered as caughtUp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the invariant that every queriedParty
is already at least expectedOffset
, but it may not be in the map. Therefore, if the condition you mention holds, then maxLocalOff
must have come from an unqueried party, which is fine as stated in the prior comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put another way, we care only that the queriedParty
is caught up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
functionally looks fine to me, thanks for explaining 👍
} | ||
|
||
"report desync, but no updates, if consistent" in { | ||
mvo(3, Map(0 -> everyParty(5), 1 -> everyParty(5))) should ===( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC expectedOffset
should come from one of the queried parties right ? so is this scenario ever possible ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expectedOffset
is only "what we just wrote"; it never comes from reading. So this can happen if all these template-IDs/parties got updated from 3 to 5 after our own fetch.
@stefanobaghino-da There's no visible effect on A couple things on concurrent updates:
|
…tect-contract-table-desync
This PR has been created by a script, which is not very smart and does not have all the context. Please do double-check that the version prefix is correct before merging. @stefanobaghino-da is in charge of this release. Commit log: ``` b4d0031 detect unsynchronized contract table and retry (#10617) 3d779cf [Mutable state cache] Fix initialization offset (#11024) 5458aa8 Switch on NonUnitStatements warning in daml-lf/transaction (#11048) 9641fd5 auth middleware: no print secret (#11050) a885f52 [DPP-417] [DPP-595] Add error code version switching mechanism (#11035) 12e0c72 fix blackduck logic (#11049) eb87b34 kvutils: Add the logging context for ledger state operations. (#11030) b7daa5f Address remaining dependabot alerts (#11045) 5e43f8c es: drop jobs-* indices (#10857) 03203b7 Define encoding/decoding for module imports (#11036) 57a1597 Setting timeoutToleranceMillis to 10 minutes to prevent flakiness (#11043) df59f3f Fix Navigator dependabot alerts (#11044) 6bf45a3 Upgrade Navigator to Webpack 5 (#11040) 80e217e [DPP-622] Add conformance tests that verifies TLSv1.0 and TLSv1 are disabled. (#10983) 626e1fb Small lf value.cids regression fix (found by canton unit tests) (#11032) a4629a4 KV: Ignore daml_lf_1.proto when checking for KV protobuf compatibility (#11021) ee9be65 kvutils: Add metadata to `Err` [KVL-1032] (#10992) 0d3ae6e interface methods: Haskell Typechecker (#11028) e36eb46 Resolve `set-value` to 4.0.1 and above (#11029) 4075624 interface methods: Haskell AST (#11018) 7c1fd50 Bump grunt-browserify (#11026) 5f3f582 Upgrade webpack-dev-server in Navigator (#11025) 91be1e1 Drop matchdep dependency from docs build (#11023) fe9aeff Increase es disk size (#11019) eac7963 LF: Refactor ProtoTest.scala (#11020) e79a30a For the client binding propagate the full original completion [KVL-1112] (#10879) 59ad995 fix buf check (#11014) abc3e66 Increase the tolarance for handover of control in HaCoordinatorSpec (#10997) 19b2bf4 LF: Cosmetic clean-up in the Speedy Compiler (#11015) cb0e41f LF: Add interface support to the Preprocessor (#11013) c33297c Remove `transactionNormalization` flag. (#11010) f5d2135 Check protobuf compatibility of `main` and PR commits w.r.t. previous stable release [KVL-1109] (#10950) bf8b75d interface methods: Add protobuf definitions. (#11005) 88e1430 Make LargeTransactionTest use ValueEnricher, so it can work with normalized transactions coming out of the engine. (#11003) a043926 rotate release duty after 1.17.0-snapshot.20210921.7889.0.1b473c2b (#10972) 35666ca Add MINIMAL pragma for Additive type class (#11001) 8de162b [DPP-586] Upgrade to netty 4.1.67.Final and netty-tcnative-boringssl-static 2.0.40.Final (#10956) 871d03b release 1.18.0-snapshot.20210922.7908.0.ced4a272 (#10998) 721575e [JSON-API] Postgres perf job (#10986) f2d9f07 Release RC2 for SDK 1.17.0 (#10996) ``` Changelog: ``` - [JSON API] Under rare conditions, a multi-template query backed by database could have an ACS portion that doesn't match its transaction stream, if updated concurrently. This conditions is now checked and accounted for. See `issue #10617 <https://github.com/digital-asset/daml/pull/10617>`__. - The OAuth2 Middleware now obfuscates its Client Secret when logging its config. - [Integration Kit] We have added ``loggingContext`` as an implicit parameter to more _kvutils_ trait methods. Implementors may need to do the same in their trait implementations. This can make it easier to log with the appropriate context. java-client-bindings - the original full completion is included in the `CompletionResponse` when available Daml on SQL, Integration Kit, Sandbox: Drop support for TLS 1.0 and 1.1 in Ledger API. ``` CHANGELOG_BEGIN CHANGELOG_END
This PR has been created by a script, which is not very smart and does not have all the context. Please do double-check that the version prefix is correct before merging. @stefanobaghino-da is in charge of this release. Commit log: ``` b4d0031 detect unsynchronized contract table and retry (#10617) 3d779cf [Mutable state cache] Fix initialization offset (#11024) 5458aa8 Switch on NonUnitStatements warning in daml-lf/transaction (#11048) 9641fd5 auth middleware: no print secret (#11050) a885f52 [DPP-417] [DPP-595] Add error code version switching mechanism (#11035) 12e0c72 fix blackduck logic (#11049) eb87b34 kvutils: Add the logging context for ledger state operations. (#11030) b7daa5f Address remaining dependabot alerts (#11045) 5e43f8c es: drop jobs-* indices (#10857) 03203b7 Define encoding/decoding for module imports (#11036) 57a1597 Setting timeoutToleranceMillis to 10 minutes to prevent flakiness (#11043) df59f3f Fix Navigator dependabot alerts (#11044) 6bf45a3 Upgrade Navigator to Webpack 5 (#11040) 80e217e [DPP-622] Add conformance tests that verifies TLSv1.0 and TLSv1 are disabled. (#10983) 626e1fb Small lf value.cids regression fix (found by canton unit tests) (#11032) a4629a4 KV: Ignore daml_lf_1.proto when checking for KV protobuf compatibility (#11021) ee9be65 kvutils: Add metadata to `Err` [KVL-1032] (#10992) 0d3ae6e interface methods: Haskell Typechecker (#11028) e36eb46 Resolve `set-value` to 4.0.1 and above (#11029) 4075624 interface methods: Haskell AST (#11018) 7c1fd50 Bump grunt-browserify (#11026) 5f3f582 Upgrade webpack-dev-server in Navigator (#11025) 91be1e1 Drop matchdep dependency from docs build (#11023) fe9aeff Increase es disk size (#11019) eac7963 LF: Refactor ProtoTest.scala (#11020) e79a30a For the client binding propagate the full original completion [KVL-1112] (#10879) 59ad995 fix buf check (#11014) abc3e66 Increase the tolarance for handover of control in HaCoordinatorSpec (#10997) 19b2bf4 LF: Cosmetic clean-up in the Speedy Compiler (#11015) cb0e41f LF: Add interface support to the Preprocessor (#11013) c33297c Remove `transactionNormalization` flag. (#11010) f5d2135 Check protobuf compatibility of `main` and PR commits w.r.t. previous stable release [KVL-1109] (#10950) bf8b75d interface methods: Add protobuf definitions. (#11005) 88e1430 Make LargeTransactionTest use ValueEnricher, so it can work with normalized transactions coming out of the engine. (#11003) a043926 rotate release duty after 1.17.0-snapshot.20210921.7889.0.1b473c2b (#10972) 35666ca Add MINIMAL pragma for Additive type class (#11001) 8de162b [DPP-586] Upgrade to netty 4.1.67.Final and netty-tcnative-boringssl-static 2.0.40.Final (#10956) 871d03b release 1.18.0-snapshot.20210922.7908.0.ced4a272 (#10998) 721575e [JSON-API] Postgres perf job (#10986) f2d9f07 Release RC2 for SDK 1.17.0 (#10996) ``` Changelog: ``` - [JSON API] Under rare conditions, a multi-template query backed by database could have an ACS portion that doesn't match its transaction stream, if updated concurrently. This conditions is now checked and accounted for. See `issue #10617 <https://github.com/digital-asset/daml/pull/10617>`__. - The OAuth2 Middleware now obfuscates its Client Secret when logging its config. - [Integration Kit] We have added ``loggingContext`` as an implicit parameter to more _kvutils_ trait methods. Implementors may need to do the same in their trait implementations. This can make it easier to log with the appropriate context. java-client-bindings - the original full completion is included in the `CompletionResponse` when available Daml on SQL, Integration Kit, Sandbox: Drop support for TLS 1.0 and 1.1 in Ledger API. ``` CHANGELOG_BEGIN CHANGELOG_END Co-authored-by: Azure Pipelines DAML Build <support@digitalasset.com>
After the websocket DB query, we check the offset table to ensure that it's consistent with what we just wrote to it in the fetch-and-persist phase. If it isn't, we try again. (We can always use the offset that we just wrote in our final response body, because if it doesn't match, we're going to use the new offset and re-fetch anyway.)
We try 10 times, and if we still don't get consistent results, we fail the WS request. I haven't explored more complex backoff strategies here; the focus of this PR is getting a minimal update that is least likely to livelock (and thus exhaust all retries).
We could also use the
fetchAndPersistBracket
function for the non-WS queries. I'm not sure that's necessary, though.Fixes #10334. See that issue for a longer explanation of the possible race conditions this PR addresses.
minimumViableOffsets
approachPull Request Checklist
CHANGELOG_BEGIN
andCHANGELOG_END
tagsNOTE: CI is not automatically run on non-members pull-requests for security
reasons. The reviewer will have to comment with
/AzurePipelines run
totrigger the build.