-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-28527: Adjust BlockCacheKey to use the file path instead of file name. #5832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…t-cache (apache#5793) Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
…or Time Range Data Tiering (apache#5809) Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
…e name. The time-based priority eviction policy relies on the presence of path in the BlockCacheKey to fetch the required metadata to check data hotness and decide whether or not to retain the block in the bucket cache. Hence, the constructor of BlockCacheKey is adjusted to take the file path as the input parameter. The code paths that create the blockCacheKey and also the unit tests need to be adjusted to pass the path instead of file name. Change-Id: I2d9b194756797b5ec53aaf6cfb8135cc515d49a8
key = new BlockCacheKey(allFilePaths.get(protoKey.getHfilename()), protoKey.getOffset(), | ||
protoKey.getPrimaryReplicaBlock(), fromPb(protoKey.getBlockType())); | ||
} else { | ||
key = new BlockCacheKey(new Path(protoKey.getHfilename()), protoKey.getOffset(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call will set the file name appropriately to protoKey.getHfilename().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path will be incorrect, right? Anyone who accesses it will find the incorrect path.
However, I couldn't think of any solution for that. @wchevreuil, do you have any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way I can think of avoiding incorrect path to be set is by adding the following check in the constructor of BlockCacheKey:
if (hfilePath.getParent() != null) { this.filePath = hfilePath; }
But with this, path will remain null and during evictions (freespace()), either we should skip the data tiering logic for such blocks or take an expensive route of going over each file of the region server to determine the metadata. If the later case happens even for a single key, then, we are better off without the path and use the file names itself. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which cases will allFilesPath
be non null? If it's whenever data tiering isn't TIME_RANGE, I think it's ok to don't bother about file path. In those cases though, I think we should use the fileName version of the cache key constructor.
And when it isn't null, what do we do with keys that weren't found in allFilesPath
?
DataTieringManager.instantiate(onlineRegions); | ||
blockCache = BlockCacheFactory.createBlockCache(conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instantiation of Data tiering manager is required to read the cache from persistence. Hence, change the order here.
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Can you explain why do you need that? This is not a simple/small change. |
Hi @wchevreuil, Our design originally relied on the presence of path in the BlockCacheKey so that we parse the path to get to the regionID and column family to reach to the file and access its metadata. (The framework that @vinayakphegde has implemented). However, what we found is that path may or may not be always populated by the callers who instantiate BlockCacheKey. An alternative to this is another approach that relies only on the file name which is always present in the BlockCacheKey. With this approach, we do not make any changes in the callers of BlockcacheKey or the unit tests. Please take a look and let me know your idea about the same. |
} | ||
|
||
public BlockCacheKey(Path hfilePath, long offset, boolean isPrimaryReplica, BlockType blockType) { | ||
this.filePath = hfilePath; | ||
this.isPrimaryReplicaBlock = isPrimaryReplica; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unneeded change.
@@ -682,7 +683,7 @@ public void fileCacheCompleted(Path filePath, long size) { | |||
} | |||
|
|||
private void updateRegionCachedSize(Path filePath, long cachedSize) { | |||
if (filePath != null) { | |||
if (filePath != null && filePath.getParent() != null && filePath.getParent().getParent() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really possible?
DataTieringManager dataTieringManager; | ||
try { | ||
dataTieringManager = DataTieringManager.getInstance(); | ||
allFilePaths = dataTieringManager.getAllFilesList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this for non TIME_RANGE tiering type, right? Can we avoid it, then?
// Data-Tiering manager has not been set up. | ||
// Ignore the error and proceed with the normal flow. | ||
LOG.warn("Error while getting DataTieringManager instance: {}", e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to be thrown whenever data tiering type isn't set to TIME_RANGE?
this.offset = offset; | ||
this.blockType = blockType; | ||
public BlockCacheKey(Path hfilePath, long offset) { | ||
this(hfilePath, offset, true, BlockType.DATA); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just leave these and avoid having to touch every single existing test? As having a complete path is only relevant for the TIME_RANGE data tiering, I think it's fine to allow non complete paths when TIME_RANGE data tiering is not in use.
|
||
BlockCacheKey key = null; | ||
if (allFilePaths != null) { | ||
key = new BlockCacheKey(allFilePaths.get(protoKey.getHfilename()), protoKey.getOffset(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allFilePaths.get(protoKey.getHfilename())
may yield a null value, if we are recovering from a crash where we didn't have a chance to save the persistent index after a given file may have been removed.
key = new BlockCacheKey(allFilePaths.get(protoKey.getHfilename()), protoKey.getOffset(), | ||
protoKey.getPrimaryReplicaBlock(), fromPb(protoKey.getBlockType())); | ||
} else { | ||
key = new BlockCacheKey(new Path(protoKey.getHfilename()), protoKey.getOffset(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which cases will allFilesPath
be non null? If it's whenever data tiering isn't TIME_RANGE, I think it's ok to don't bother about file path. In those cases though, I think we should use the fileName version of the cache key constructor.
And when it isn't null, what do we do with keys that weren't found in allFilesPath
?
Closing this PR in favour of PR #5829 |
The time-based priority eviction policy relies on the presence of path in
the BlockCacheKey to fetch the required metadata to check data hotness and
decide whether or not to retain the block in the bucket cache.
Hence, the constructor of BlockCacheKey is adjusted to take the file path
as the input parameter. The code paths that create the blockCacheKey and
also the unit tests need to be adjusted to pass the path instead of file name.