Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Nov 19, 2020

Implement a LRU cache, and associated memoization factories.

The simple thread-safe memoization ends up 2x to 3x slower than a thread-unsafe memoization,
in part because the thread-safe memoization has to copy the return value.

Therefore, it can be more desirable to use the thread-unsafe memoization with a thread_local specifier.

Benchmarks (arg 1: key size, arg 2: value size):

LRUCacheLookup/8/16                        1877 ns         1877 ns       359452 items_per_second=53.2795M/s
LRUCacheLookup/8/1024                      1924 ns         1924 ns       367765 items_per_second=51.9831M/s
LRUCacheLookup/64/16                       2629 ns         2628 ns       264012 items_per_second=38.0487M/s
LRUCacheLookup/64/1024                     2671 ns         2671 ns       264135 items_per_second=37.4432M/s

MemoizeLRUCached/8/16                      4922 ns         4921 ns       141748 items_per_second=20.3205M/s
MemoizeLRUCached/8/1024                    5688 ns         5687 ns       119853 items_per_second=17.5833M/s
MemoizeLRUCached/64/16                     5347 ns         5347 ns       129134 items_per_second=18.7031M/s
MemoizeLRUCached/64/1024                   6318 ns         6318 ns       110366 items_per_second=15.829M/s

MemoizeLRUCachedThreadUnsafe/8/16          2156 ns         2156 ns       321327 items_per_second=46.3885M/s
MemoizeLRUCachedThreadUnsafe/8/1024        2165 ns         2165 ns       322995 items_per_second=46.1982M/s
MemoizeLRUCachedThreadUnsafe/64/16         3110 ns         3110 ns       225295 items_per_second=32.1552M/s
MemoizeLRUCachedThreadUnsafe/64/1024       3084 ns         3084 ns       225891 items_per_second=32.4268M/s

@github-actions
Copy link

@pitrou pitrou force-pushed the ARROW-10655-lru-cache branch 2 times, most recently from 6893477 to 312b406 Compare November 23, 2020 13:22
@pitrou
Copy link
Member Author

pitrou commented Nov 23, 2020

Rebased.

@pitrou pitrou force-pushed the ARROW-10655-lru-cache branch from 312b406 to a8fff45 Compare November 23, 2020 14:22
@nealrichardson
Copy link
Member

@pitrou @bkietz can we merge this?

@pitrou
Copy link
Member Author

pitrou commented Feb 9, 2021

@nealrichardson I'll probably remove the two-level replacement cache, which doesn't seem useful after all.

@pitrou pitrou force-pushed the ARROW-10655-lru-cache branch 2 times, most recently from 1a3a5c9 to f24b8d6 Compare February 9, 2021 17:27
@pitrou
Copy link
Member Author

pitrou commented Feb 9, 2021

I've updated this PR with a thread-unsafe memoizer and revamped benchmarks. I'll let @bkietz take a look.


static void MemoizeLRUCached(benchmark::State& state) {
const auto keys = MakeStrings(kCacheSize, state.range(0));
const auto values = MakeStrings(kCacheSize, state.range(1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also be interested to see benchmarks where the size of the string set is kCacheSize * [0.5, 0.9, 1.1, 2]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this benchmark I'm mostly interested in measuring the overhead of the memoize pattern rather the LRU cache itself. I don't think varying the size of the string set would vary the measured overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the string set is exactly as large as the cache then the only overhead you're measuring is promotion inside a static set. By contrast if the string set is larger than the cast you will measure the cost of replacement as well. The latter seems useful to know when deciding how large to make a cache, since it represents the penalty for guessing too low.

@pitrou pitrou force-pushed the ARROW-10655-lru-cache branch 2 times, most recently from c283ba7 to aedc40e Compare February 18, 2021 12:00
@pitrou
Copy link
Member Author

pitrou commented Feb 18, 2021

@bkietz Any other concerns?

@emkornfield
Copy link
Contributor

same naming style nit I've brought up before LruCacheLookup? Maybe we should just make this another exception to the style guide?

@bkietz
Copy link
Member

bkietz commented Feb 18, 2021

+1 for renaming LRUCache -> LruCache

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, no blocking concerns

Implement a LRU cache, and associated memoization factories.

The simple thread-safe memoization ends up 2x to 3x slower than a thread-unsafe memoization,
in part because the thread-safe memoization has to copy the return value.

Therefore, it can be more desirable to use the thread-unsafe memoization with a thread_local specifier.

Benchmarks (arg 1: key size, arg 2: value size):
```
LRUCacheLookup/8/16                        1877 ns         1877 ns       359452 items_per_second=53.2795M/s
LRUCacheLookup/8/1024                      1924 ns         1924 ns       367765 items_per_second=51.9831M/s
LRUCacheLookup/64/16                       2629 ns         2628 ns       264012 items_per_second=38.0487M/s
LRUCacheLookup/64/1024                     2671 ns         2671 ns       264135 items_per_second=37.4432M/s

MemoizeLRUCached/8/16                      4922 ns         4921 ns       141748 items_per_second=20.3205M/s
MemoizeLRUCached/8/1024                    5688 ns         5687 ns       119853 items_per_second=17.5833M/s
MemoizeLRUCached/64/16                     5347 ns         5347 ns       129134 items_per_second=18.7031M/s
MemoizeLRUCached/64/1024                   6318 ns         6318 ns       110366 items_per_second=15.829M/s

MemoizeLRUCachedThreadUnsafe/8/16          2156 ns         2156 ns       321327 items_per_second=46.3885M/s
MemoizeLRUCachedThreadUnsafe/8/1024        2165 ns         2165 ns       322995 items_per_second=46.1982M/s
MemoizeLRUCachedThreadUnsafe/64/16         3110 ns         3110 ns       225295 items_per_second=32.1552M/s
MemoizeLRUCachedThreadUnsafe/64/1024       3084 ns         3084 ns       225891 items_per_second=32.4268M/s
```
@pitrou pitrou force-pushed the ARROW-10655-lru-cache branch from aedc40e to 43ae445 Compare February 18, 2021 19:47
@pitrou
Copy link
Member Author

pitrou commented Feb 18, 2021

+1 for renaming LRUCache -> LruCache

Done.

@bkietz bkietz closed this in 5647e90 Feb 19, 2021
@pitrou pitrou deleted the ARROW-10655-lru-cache branch February 19, 2021 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants