roachtest: import-cancellation failed #90434
Looks like the GC threshold is too aggressive for these large tpch queries. We can bump the threshold after the import succeeds.
roachtest.import-cancellation failed with artifacts on release-22.2 @ fc582bd2d43c1dc141077f2fae6da0ca9c23e449:
Parameters:
Failure mode looks the same:
Error: pq: batch timestamp 1666439623.970102632,0 must be after replica GC threshold 1666439688.063994730,0
Should be fixed with #90487.
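To make the error above concrete, here is a minimal sketch that parses the two timestamps from the message and measures how far the batch timestamp trails the GC threshold. It assumes the printed form is `seconds.nanos,logical` with a nine-digit nanosecond field; `parse_hlc` is a hypothetical helper written for this illustration, not a CockroachDB API.

```python
def parse_hlc(ts: str) -> tuple[int, int]:
    """Split 'seconds.nanos,logical' into (wall_nanos, logical).

    Assumes the nanosecond field is zero-padded to nine digits.
    """
    wall, logical = ts.split(",")
    seconds, nanos = wall.split(".")
    return int(seconds) * 1_000_000_000 + int(nanos), int(logical)


# Values copied from the error message in this thread.
batch = parse_hlc("1666439623.970102632,0")
gc_threshold = parse_hlc("1666439688.063994730,0")

# Tuples compare wall time first, then the logical counter.
batch_is_readable = batch > gc_threshold
lag_seconds = (gc_threshold[0] - batch[0]) / 1e9

print(batch_is_readable, round(lag_seconds, 3))  # False 64.094
```

Under these assumptions the batch timestamp sits roughly 64 seconds behind the GC threshold, so the read is rejected.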
roachtest.import-cancellation failed with artifacts on release-22.2 @ c0e9c25661d53e893210a7bfe79257f1c9468f52:
Parameters: Same failure on other branches
roachtest.import-cancellation failed with artifacts on release-22.2 @ 794fcfa793016e98a6ebadaaa29444e4661592ba:
Parameters: Same failure on other branches
For the two previous failures, both appear to be test timeouts: Run n: ~1m spin-up + 10:26-11:47 (1h21m) import + 11:47-14:25 (2h38m).
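The arithmetic in that breakdown can be checked with a short sketch. The clock times are the ones quoted above; the `span` helper and the variable names are illustrative, and the test's configured timeout is not stated in this thread, so the total is only a sum of the phases.

```python
from datetime import datetime, timedelta


def span(start: str, end: str) -> timedelta:
    """Elapsed time between two same-day 'HH:MM' clock readings."""
    fmt = "%H:%M"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


spin_up = timedelta(minutes=1)          # "~1m spin-up", approximate
import_phase = span("10:26", "11:47")   # labelled "import" in the comment
second_phase = span("11:47", "14:25")   # unlabelled in the comment
total = spin_up + import_phase + second_phase

print(import_phase, second_phase, total)  # 1:21:00 2:38:00 4:00:00
```

The phases add up to about four hours of wall-clock time for the run.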
I tried running the roachtest with range tombstones disabled. The import took 25m. The workload just started.

It looks like we're seeing a 3-4x slowdown of the IMPORT phase when using range tombstones. Some of this is expected because we're no longer IMPORTing into a clean empty span: we're importing over a span full of garbage. This increases the cost of CheckSSTConflicts, but it also greatly increases the work the storage engine must do. All the ingested files have significant overlap, so they must be ingested into higher levels. This necessitates more compaction work to reshape the LSM, and is why we see elevated read amp.

Unfortunately, I don't think we can do anything about this for 22.2. We should consider what we can do here in 23.1, but it is just a legitimately hard problem.
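As a rough sanity check on the 3-4x figure, the sketch below compares the 25m import against the 1h21m import reported for an earlier timed-out run. Using that earlier run as the range-tombstone baseline is an assumption, and `slowdown` is a hypothetical helper.

```python
def slowdown(baseline_minutes: float, observed_minutes: float) -> float:
    """Ratio of the observed duration to the baseline duration."""
    return observed_minutes / baseline_minutes


# 25m: import with range tombstones disabled (from this comment).
# 81m: 1h21m import from an earlier timed-out run (assumed comparable).
ratio = slowdown(baseline_minutes=25, observed_minutes=81)

print(f"{ratio:.2f}x")  # 3.24x
```

That ratio falls inside the 3-4x range quoted above, though a single pair of runs says little about variance.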
Follow-up on this run #90434 (comment): The test failed with a timeout even with range tombstones disabled. As @nicktrav noticed, it looks like the tpch workload has its own variance (maybe due to whether or not it has stats computed in time?).
@nicktrav that CPU profile demonstrates the pathological behavior of range key seeks as a result of seek semantics. I think cockroachdb/pebble#1829 will make a large difference there. I'm hoping to tackle that first as a part of the range key cleanup work this cycle.
roachtest.import-cancellation failed with artifacts on release-22.2 @ 6df4e8666ccb902b333195bdded59d69307b6050:
Parameters: Same failure on other branches
roachtest.import-cancellation failed with artifacts on release-22.2 @ 890b681dd5bb2c5ebabb06d89aed131af0b50fe8:
Parameters: Same failure on other branches
roachtest.import-cancellation failed with artifacts on release-22.2 @ 4816df3a9d76d179ed135a2b1efb53babb5611a0:
Parameters:
This should be fixed via #90741. Closing.
roachtest.import-cancellation failed with artifacts on release-22.2 @ 8579361117ff18a4255855a43114ddeb3e3b80c9:
Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=32, ROACHTEST_encrypted=false, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-20767
Epic CRDB-16237