-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-28467: Add time-based priority caching checks for cacheOnRead code paths. #5905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Optional<Boolean> cacheFileBlock = Optional.of(true); | ||
// Additionally perform the time-based priority checks to see | ||
// whether, or not to cache the block. | ||
if (cacheConf.getBlockCache().isPresent()) { | ||
cacheFileBlock = cacheConf.getBlockCache().get().shouldCacheFile(getHFileInfo(), conf); | ||
} | ||
final boolean shouldCacheFileBlock = cacheFileBlock.get(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this inside cacheConf.shouldCacheBlockOnRead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API shouldCacheBlockOnRead, is used at multiple places and takes only one argument (category). If I move the logic to this function, I will have to additionally pass these two arguments (getHFileInfo(), conf) to all usages. Hence, I will work on creating a new API within CacheConfig which internally checks for the data hotness and returns whether or not, to cache the file block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with have multiple shouldCache
methods, or even overloaded versions of shouldCacheBlockOnRead
, but can't we have one shouldCache
like method in CacheConfig that combines both checks? That way HFileReaderImpl code doesn't need to reference multiple checks and add different boolean variables each time it needs to check it.
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@@ -275,6 +275,18 @@ public boolean shouldCacheBlockOnRead(BlockCategory category) { | |||
|| (prefetchOnOpen && (category != BlockCategory.META && category != BlockCategory.UNKNOWN)); | |||
} | |||
|
|||
public boolean shouldCacheFileBlock(HFileInfo hFileInfo, Configuration conf) { | |||
Optional<Boolean> cacheFileBlock = Optional.of(true); | |||
// Additionally perform the time-based priority checks to see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we avoid mentioning that we are performing time-based priority checks here? At this point, we don't know that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
if (getBlockCache().isPresent()) { | ||
|
||
cacheFileBlock = getBlockCache().get().shouldCacheFile(hFileInfo, conf); | ||
LOG.info("BlockCache Present, cacheFileBlock: {}", cacheFileBlock.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we change the log level to debug? This information might only be needed during debugging, and since it is logged for each block, it could generate a lot of logs.
Additionally, is the log information sufficient? It only shows true or false but doesn't indicate which file or block this information corresponds to. It will be a bunch of lines with just 'BlockCache Present, cacheFileBlock: true/false,' not revealing any detailed information.
Also, I am not sure this log line is needed because shouldCacheFile(hFileInfo, conf)
already logs this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added to to debug something. Needs to be cleaned up. thanks for pointing this out.
|
||
@Test | ||
public void testCacheOnReadHotFile() throws Exception { | ||
// hStoreFiles[0] is a hot file. the blocks should not get loaded after a readBlock call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it should say "the blocks should get loaded after a readBlock call."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right! copy paste error from the above test.
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
final boolean cacheOnRead = cacheConf.shouldCacheBlockOnRead(category); | ||
final boolean shouldCacheFileBlock = cacheConf.shouldCacheFileBlock(getHFileInfo(), conf); | ||
|
||
// Don't need the unpacked block back and we're storing the block in the cache compressed | ||
if (cacheOnly && cacheCompressed && cacheOnRead) { | ||
if (cacheOnly && cacheCompressed && cacheOnRead && shouldCacheFileBlock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in the same line of my previous comments: why do we need these two separate booleans (cacheOnRead && shouldCacheFileBlock) here? We should merge the data tiering logic inside shouldCacheBlockOnRead and deal with a single boolean variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack. I have added an overloaded function to be used in this case and kept the original function to be used at other places. Please take a look.
Thanks,
Janardhan
1778877
to
1f88a38
Compare
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code wise, seems good to me, but the UT failures are apparently due to these changes and these tests need to be fixed.
Also, please rebase the PR with the feature branch, as it's showing conflicts after I had rebased HBASE-28463 branch with master.
…ode paths. Whenever the blocks of the file are read from the file system, they are cached into the block-cache if cache-on-read configuration is enabled. This configuration is enabled by default. However, with the time-based priority caching, we may not want to cache the blocks of the cold file. In this PR, we add checks to avoid the caching of cold data files and cache the blocks of hot data files. Change-Id: Ia07d791a841640e468c3ba33acebba81dbecbbc2
Change-Id: Ife4f3a50395877027df717191befc2962deb692d
Change-Id: Ia55e2e6ee415fa68d1c2714d127670f6df9c404d
Change-Id: I924b06e94c83502b39418097813da06d000dc336
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
…ode paths. (apache#5905) Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org> Change-Id: Id2f7dff9059ac70495f162a454e4002ce51aa4be
Whenever the blocks of the file are read from the file system, they are cached into the block-cache if cache-on-read configuration is enabled. This configuration is enabled by default. However, with the time-based priority caching, we may not want to cache the blocks of the cold file.
In this PR, we add checks to avoid the caching of cold data files and cache the blocks of hot data files.
Change-Id: Ia07d791a841640e468c3ba33acebba81dbecbbc2