-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix stale notification when eviction raced with an update #144
Conversation
41376d6
to
f2dd9f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good Ben, thanks for the quick fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads up!
Hey @johnou, you might be interested in this slide deck. I didn't know your email so couldn't ping you about it, but seemed to be in your area of interest. It was for a private talk at a startup nearby a week ago. |
[SOLR-10141](https://issues.apache.org/jira/browse/SOLR-10141) (thanks Yonik!) As an optimization, an update is allowed to bypass the hash map and synchronize on the read entry directly. In this block it checks liveliness, performs the mutation, and notifies the writer. This avoids more expensive computations through the map. Previously, an eviction was performed in a computation to remove the entry and notify the writer, or resurrect. Inside the computation the entry was not synchronized on and that was done only after it was removed from the table. The removal listener was notified with the value initially read at the start of this method. This allowed an update to modify the value while (or after) the entry was being removed from the hash table. This led to notifying the writer and removal listener with the stale value. Because the writer must be called exclusively with the mutation, this computation must use a synchronized guard. Otherwise we might have preferred to re-read the value when notifying the listener. This adds a slight penalty on eviction (usually async) while allowing `put` to still be fast (but may block a little more often). `putSlow` was removed as it is no longer needed. It was a computation- based write that was safe from this issue. But it was only used when the new weight was zero, as that update race would cause an incorrect eviction. Now that the primary path is safe from this race, it is unnecessary.
@ben-manes you bet I am, always welcome to reach out to me at johno.crawford@gmail.com |
I fixed a similar bug in when clearing the cache using I've run all the tests locally and will release after the CI confirms. |
Released 2.4.0 |
@ben-manes did you ever hear back from Doug about the tickless hierarchical timing wheel in Java? |
When I brought it up he said it was on his backlog too, but I haven't talked with him about it since. I think jdk9 has a dedicated scheduler thread for the common fjp. Perhaps if its overloaded he'll add a ticking one. I would like to add that feature by a jdk9 release, since I'll bump up the version and fix API quirks. I don't think it's hard, just requires a little motivation. |
SOLR-10141 (thanks Yonik!)
As an optimization, an update is allowed to bypass the hash map and
synchronize on the read entry directly. In this block it checks
liveliness, perform the mutation, and notifies the writer. This avoids
more expensive computations through the map.
Previously, an eviction was performed in a computation to remove the
entry and notify the writer, or resurrect. Inside the computation the
entry was not synchronized on, and that was done only after it was
removed from the table. The removal listener was notified with the
value initially read at the start of this method.
This allowed an update to modify the value while (or after) the entry
was removing it from the hash table. This led to notifying the writer
and removal listener with the stale value. Because the writer must
be called exclusively with the mutation, this computation must use
a synchronized guard. Otherwise we might have preferred to re-read
the value when notifying the listener. This adds a slight penalty on
eviction (async) while allowing
put
to still be fast (but may block).putSlow
was removed as not longer needed. It was a computation-basedwrite that was safe from this issue. But it was only used when the new
weight was zero, as that update race would cause an incorrect eviction.
Now that the primary path is safe from this race, its unnecessary.
@yonik please review (thanks for finding this!)
@johnou (I think you mentioned that you may have observed this)