-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more tests to ASSERT_STATUS_CHECKED (3), API change #7715
Conversation
a52a74b
to
ae1bcc6
Compare
30a0c2b
to
ade8011
Compare
a49a313
to
ecfa9e6
Compare
@pdillinger This one is ready. Can you take a look please? |
ecfa9e6
to
4f525cf
Compare
@mrambacher @pdillinger If I run
After a bit of digging into this, I think there may be a race-condition in the test itself... I will see if I can figure it out... |
4f525cf
to
8f8fa8e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the big set of fixes!
Suspected bug in DBTestBase::CountFiles. And please don't bundle API changes with big test refactoring/polishing. If this PR substantially depends on the API changes, please put them in another PR first, with HISTORY update.
Good catch. Seems very plausible that the flush callback sets a bg_error that is never checked. I don't know enough about background errors to say what the "right thing" is here. |
If your digging shows a race condition, I would suggest doing the following:
|
@adamretter you mentioned that there is a race condition. Can you give us a pointer? |
dff5bca
to
7c01cd2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@siying I added a bunch of debugging print statements that are happening under a mutex. When the test passes, I get a different set of print output compared to when it fails. I can still dig in further, but for your reference I attach the code I use for debugging - debug.patch.txt |
We want to merge a giant PR like this ASAP but not if it has a flaky test (race condition). Can't we cover over and defer that fix as Mark suggested? (I didn't see any evidence it had been covered over and noted.) |
@siying @pdillinger @mrambacher After further investigation I am no longer convinced this is a problem in the test. In all instances I find that the unchecked
So, I am wondering who is responsible for clearing the status of For reference, the ASSERT_STATUS_CHECKED fails (mostly) here:
I am not quite sure of the best way to proceed... I am looking forward to some advice ;-) |
So I think I might understand the problem. I think there is a background job running that attempts to clear the recovery error. Sometimes that job completes earlier than others. There are two places in I ran the test with a Someone else @pdillinger @siying can correct me if my analysis is wrong... |
@mrambacher your suggested fix appears to work for me here on my Mac. |
I'm not allowed to push to this branch. Please pull in https://github.com/pdillinger/rocksdb/tree/pr7715 so we can get this landed ASAP. |
Sorry I missed this update (Chrome likes showing me old versions of web pages even when they appear to reload!) But I don't think we should block this PR (perhaps #7831) on that answer. (I'm not nearly the expert in this area.) |
@pdillinger I think it is important that we get this correct. Perhaps someone else in the team who knows this area could take a look? |
I think we can add |
+1, though we are ending up with a good number of PermitUncheckedError(), which is why I think it's important to document the context/justification for the call when it's not obvious, especially in cases where the reverse-engineering approach of "remove it and see what breaks" might not show any results (e.g. because of race condition). |
Got it. Line 564 in c1a65a4
edit: @anand1976 is the expert on this area, so maybe he can confirm there's no risk of lost recovery errors some time, but I wouldn't block this PR in the meantime |
I crossed this out because L577 also includes Line 395 in c1a65a4
|
…onsistencyFailTest2
8858e56
to
111fcca
Compare
@ajkr @mrambacher Thanks for the suggested fixes. I have now incorporated the fix suggested by @ajkr and all tests appear to be passing. @pdillinger I think this is now ready for you to land. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Sorry for jumping in late. If a background recovery is attempted, it should always check |
@pdillinger merged this pull request in 6e0f62f. |
Summary: Third batch of adding more tests to ASSERT_STATUS_CHECKED. * db_compaction_filter_test * db_compaction_test * db_dynamic_level_test * db_inplace_update_test * db_sst_test * db_tailing_iter_test * db_io_failure_test Also update GetApproximateSizes APIs to all return Status. Pull Request resolved: facebook#7715 Reviewed By: jay-zhuang Differential Revision: D25806896 Pulled By: pdillinger fbshipit-source-id: 6cb9d62ba5a756c645812754c596ad3995d7c262
Summary: Third batch of adding more tests to ASSERT_STATUS_CHECKED. * db_compaction_filter_test * db_compaction_test * db_dynamic_level_test * db_inplace_update_test * db_sst_test * db_tailing_iter_test * db_io_failure_test Also update GetApproximateSizes APIs to all return Status. Pull Request resolved: facebook/rocksdb#7715 Reviewed By: jay-zhuang Differential Revision: D25806896 Pulled By: pdillinger fbshipit-source-id: 6cb9d62ba5a756c645812754c596ad3995d7c262 Signed-off-by: Changlong Chen <levisonchen@live.cn>
Summary: Third batch of adding more tests to ASSERT_STATUS_CHECKED. * db_compaction_filter_test * db_compaction_test * db_dynamic_level_test * db_inplace_update_test * db_sst_test * db_tailing_iter_test * db_io_failure_test Also update GetApproximateSizes APIs to all return Status. Pull Request resolved: facebook/rocksdb#7715 Reviewed By: jay-zhuang Differential Revision: D25806896 Pulled By: pdillinger fbshipit-source-id: 6cb9d62ba5a756c645812754c596ad3995d7c262 Signed-off-by: Changlong Chen <levisonchen@live.cn>
Third batch of adding more tests to ASSERT_STATUS_CHECKED.
Also update GetApproximateSizes APIs to all return Status.