-
-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal to Integrate SIEVE Eviction Algorithm #212
Comments
In my benchmark: https://github.com/kurtextrem/js-sieve/blob/main/bench/index.mts#L78, modifying the LRU cache implementation to use |
@kurtextrem Nice work! I am new to js, so I am not going to comment on this. If @Yomguithereal is willing to take a PR, would you like to submit one? |
Thank you! For sure, I'd be happy to submit a PR. |
Hello @yazhuo, I have of course read your paper and was meaning to add both SIEVE and S3-FIFO to the library but I lacked time to do so for the time being. Still, I have some open questions about how to better implement SIEVE, in JavaScript at least. The first part is related to the fact that you probably still need a doubly-linked list and cannot work with a simple ring buffer, as hinted in this lobste.rs conversation here. The second part is linked to the visited information per node. I am still not sure whether relying on a statically allocated bitset would be beneficial for performance and I need to benchmark it. What's more, using a uint8 byte array could simplify the code, enhance performance and provide an easy way to also implement something like a k-SIEVE as described here if required. But this would also cost 8 times the memory. Regarding your paper (at least this one: https://junchengyang.com/publication/nsdi24-SIEVE.pdf), this statement:
is incorrect. You also state that it would requires 12 lines of modified code to adapt
We will of course need both a
@kurtextrem I am willing to accept your PR but I would like first to take the time to re-read the paper to understand whether it is possible or not to use something simpler that the doubly-linked list here (namely a ring buffer or some variation of gap buffer). I am also wondering whether to introduce some kind of pre-processing/code templating to the library because the code duplication between the Map/object version of the cache and the support for deletion starts to be a little bit annoying for me to maintain. |
I didn't fully investigate, but from an early look, I do think the difference between "my" Sieve implementation and the modified Sieve implementation based on your code is, that your code does not instantiate nodes via class (e.g. avoids For future ref, this is the Sieve-on-mnemoist-map implementation: https://github.com/kurtextrem/js-sieve/blob/main/alt-impl/lru2.mts |
Hi @Yomguithereal, thanks for the reply!
You're right, SIEVE cannot be implemented by a simple ring buffer without shifting. Compared to CLOCK, SIEVE does lost this benefit. As the updates in Marc's blog mentioned, Toin Baker pointed out an interesting fix: "A simple fix to the SIEVE algorithm to accommodate circular arrays would be to move the current tail entry into the evicted entry’s slot (much like CLOCK copies a new entry into the evicted entry’s slot). " We need to further investigate how this would affect performance.
A great point on implementation details about visited information. Indeed, we didn't benchmark the memory consumption in the paper; however, it is important to consider it in real cases! Besides, SIEVE-k was worse than SIEVE on real web KV workloads, so I think relying on a statically allocated bitset might be a better option.
I've read your blog post - a great post and easy to follow! We will address this misstatement. @kurtextrem Thanks for your good work on SIEVE implementation! |
Hi @Yomguithereal, Thank you for your interest! The code is here, it was just a hack (not well engineered code) to see how easy it is to implement SIEVE. IMHO, I don't think bitset would be a better option because of an extra random memory access, so uint8 would be a better option. And it is easier to implement. If you want to go with a ring buffer, S3-FIFO might be a better option. :) |
I did not put much thought into it but my intuition tells me this looks like a O(n) operation, and I don't have the feeling it would be easily amortized?
Fair enough. But putting k-SIEVE aside, my main concern here is performance because lookups in the bitset will be just a little slower than indexing a uint8 byte array counterpart, but I will of course benchmark this to make sure. I am not sure dividing the memory cost of a single list (one of at least 4) is worth the performance hit.
I read this after I started writing my answer :). Seems we agree indeed. @yazhuo @1a1a11a related question, but I understand you are also the authors of S3-FIFO? What's your personal opinion regarding SIEVE vs S3-FIFO? I will probably implement both at some point anyway. |
Yes, the original SIEVE design and the approach in Marc's blog have a worst case O(n) complexity during eviction, the amortized time complexity is O(1). However, there is a simple fix: force an eviction after checking k objects (k can be up to 100 if implemented using a ring buffer, but lower if using a linked list). This will have some impact on the miss ratio, but it will also make the algorithm more robust. SIEVE vs S3-FIFOFrom our experience on over 6000 traces collected from 2007 to 2023 in 12 companies, S3-FIFO is more robust than all other algorithms, including SIEVE (but it is also more complex than SIEVE). The one type of workload SIEVE does not work well is small and old blockI/O workloads. But on web workloads and large multi-tenanted block I/O workloads, SIEVE works very well. We still need better understanding. |
Hi there,
Our team (@1a1a11a) has developed a new cache eviction algorithm, called SIEVE. It’s simple, efficient, and scalable.
Why SIEVE could be a great addition:
Welcome to dive into the details on our website sievecache.com and on our SIEVE blog.
We would love to explore the possibility of integrating SIEVE into mnemonist. We believe it could be a beneficial addition to the library and the community.
Looking forward to your feedback!
The text was updated successfully, but these errors were encountered: