ARROW-10655: [C++] Add cache and memoization facility #8716
Conversation
Force-pushed from 6893477 to 312b406.
Rebased.
Force-pushed from 312b406 to a8fff45.
@nealrichardson I'll probably remove the two-level replacement cache, which doesn't seem useful after all.
Force-pushed from 1a3a5c9 to f24b8d6.
I've updated this PR with a thread-unsafe memoizer and revamped benchmarks. I'll let @bkietz take a look.
Force-pushed from d4608a9 to 356c300.
```cpp
static void MemoizeLRUCached(benchmark::State& state) {
  const auto keys = MakeStrings(kCacheSize, state.range(0));
  const auto values = MakeStrings(kCacheSize, state.range(1));
```
I'd also be interested to see benchmarks where the size of the string set is kCacheSize * [0.5, 0.9, 1.1, 2]
In this benchmark I'm mostly interested in measuring the overhead of the memoize pattern rather than the LRU cache itself. I don't think varying the size of the string set would vary the measured overhead.
If the string set is exactly as large as the cache, then the only overhead you're measuring is promotion inside a static set. By contrast, if the string set is larger than the cache, you will also measure the cost of replacement. The latter seems useful to know when deciding how large to make a cache, since it represents the penalty for guessing too low.
Force-pushed from c283ba7 to aedc40e.
@bkietz Any other concerns?
Same naming style nit I've brought up before: LruCacheLookup? Maybe we should just make this another exception to the style guide?
+1 for renaming
LGTM, no blocking concerns
Implement an LRU cache and associated memoization factories.

The simple thread-safe memoization ends up 2x to 3x slower than a thread-unsafe memoization, in part because the thread-safe memoization has to copy the return value. Therefore, it can be more desirable to use the thread-unsafe memoization with a thread_local specifier.

Benchmarks (arg 1: key size, arg 2: value size):
```
LRUCacheLookup/8/16                    1877 ns  1877 ns  359452  items_per_second=53.2795M/s
LRUCacheLookup/8/1024                  1924 ns  1924 ns  367765  items_per_second=51.9831M/s
LRUCacheLookup/64/16                   2629 ns  2628 ns  264012  items_per_second=38.0487M/s
LRUCacheLookup/64/1024                 2671 ns  2671 ns  264135  items_per_second=37.4432M/s
MemoizeLRUCached/8/16                  4922 ns  4921 ns  141748  items_per_second=20.3205M/s
MemoizeLRUCached/8/1024                5688 ns  5687 ns  119853  items_per_second=17.5833M/s
MemoizeLRUCached/64/16                 5347 ns  5347 ns  129134  items_per_second=18.7031M/s
MemoizeLRUCached/64/1024               6318 ns  6318 ns  110366  items_per_second=15.829M/s
MemoizeLRUCachedThreadUnsafe/8/16      2156 ns  2156 ns  321327  items_per_second=46.3885M/s
MemoizeLRUCachedThreadUnsafe/8/1024    2165 ns  2165 ns  322995  items_per_second=46.1982M/s
MemoizeLRUCachedThreadUnsafe/64/16     3110 ns  3110 ns  225295  items_per_second=32.1552M/s
MemoizeLRUCachedThreadUnsafe/64/1024   3084 ns  3084 ns  225891  items_per_second=32.4268M/s
```
Force-pushed from aedc40e to 43ae445.
Done.