-
Notifications
You must be signed in to change notification settings - Fork 32
Close LRU by database path for deleted database/index #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@jiangphcn Thanks, that looks much more suitable for closing an individual shard's indexes prior to renaming the shard's directory. I'm a little concerned about the performance impact of walking the whole LRU, but in practice I've never seen it be more than 5000 items. I think this is fine for now. I remain more concerned about the race condition that exists in dreyfus: close-then-rename as two separate operations is a potential problem, and the current implementation in dreyfus does not serialise attempts to open and calls to close_lru/1. (Note that I don't propose attempting to close that race condition, but I think the implementation here ought to include some defensive logic to clean-up properly when closing, specifically in the code that logs this: The writer would need to be explicitly closed/released here when the rollback fails to avoid the writer lock being held indefinitely. (Naturally, that close/release code should not throw an additional exception under any circumstances: In general, it's a wise precaution to have clean-up code never fail and leak resources, even if it means suppressing an otherwise fault-indicating exception. Also, in the clouseau/dreyfus context, it's usually fine to throw away changes during a failure, since it's secondary data and ought to be rebuilt later.) What do you think? |
theburge
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Thanks again to @theburge As we discussed in apache/couchdb#2130 (review), let us try with current approach. |
Overview
The problem was originally exposed when updating search index with the following error:
Finally, it turns out that the soft-deletion path is missing a call to evict the shard from LRU (and therefore release the associated locks). This causes that lock file was left in working directory and search index failed to be updated.
To address this, the fix is to close the corresponding index in LRU before index is to be renamed/deleted.
Testing recommendations
@theburge made excellent troubleshooting and analysis, and also wrote below script to reproduce problem.
repro2.sh.txt
The problem doesn't occur after using the fix here and apache/couchdb#2130
Related Issues or Pull Requests
apache/couchdb#2130
Checklist
rel/overlay/etc/default.ini