-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-24530 Introduce a split policy similar with SteppingSplitPolicy… #1883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic makes sense to me, I'm just not sure if we should implement it as a new split policy, or if this more of a bug in IncreasingToUpperBoundRegionSplitPolicy.
* @return true if sum of store's size exceed the sizeToCheck | ||
*/ | ||
@Override | ||
protected boolean isExceedSize(int tableRegionsCount, long sizeToCheck) { | ||
long sumSize = 0; | ||
for (HStore store : region.getStores()) { | ||
sumSize += store.getSize(); | ||
} | ||
if (sumSize > sizeToCheck) { | ||
LOG.debug("ShouldSplit because region size is big enough " + | ||
" size=" + StringUtils.humanSize(sumSize) + | ||
", sizeToCheck=" + StringUtils.humanSize(sizeToCheck) + | ||
", regionsWithCommonTable=" + tableRegionsCount); | ||
return true; | ||
} | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is how IncreasingToUpperBoundRegionSplitPolicy should actually behave. For scenarios where IncreasingToUpperBoundRegionSplitPolicy.getSizeToCheck() returns hbase.hregion.max.filesize , it doesn't make much sense to me to check on individual stores only, as regions with many stores may then grow much larger then hbase.hregion.max.filesize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems make sense when write to each store not evenly, but the parameter name exactly confused.
Maybe we should introduce a new parameter hbase.region.max.size for this new policy and rename the hbase.hregion.max.filesize to hbase.region.store.max.size for the exist policies in follow-on issue. WDYT?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought it agian, split by store size cause the actual region split size uncontrollable and easy to misunderstand, as hbase do more and more better on many columnfamilies, the disadvantages will more and more obvious, so i think we should change the default behavior to split by all reigon size.
@wchevreuil @saintstack WDYT?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wchevreuil Maybe we'd better to introduce this new split policy as optional first, and make it as default in future if it works well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought it agian, split by store size cause the actual region split size uncontrollable and easy to misunderstand, as hbase do more and more better on many columnfamilies, the disadvantages will more and more obvious, so i think we should change the default behavior to split by all reigon size.
That's my concern. It may cause regions to grow much beyond the desired hbase.hregion.max.filesize.
Maybe we'd better to introduce this new split policy as optional first, and make it as default in future if it works well?
Yes, and maybe we should raise a discussion on dev list about making _ IncreasingToUpperBoundRegionSplitPolicy_ deprecated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and maybe we should raise a discussion on dev list about making _ IncreasingToUpperBoundRegionSplitPolicy_ deprecated?
Agree. Could you raise it? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will send it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wchevreuil You are fine with the PR getting merged before raising the discussion over dev group? Or good to wait for the discussion?
@wchevreuil Could you help to merge it if you have time, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested minor change, else looks good
LOG.debug("ShouldSplit because region size is big enough " + | ||
" size=" + StringUtils.humanSize(sumSize) + | ||
", sizeToCheck=" + StringUtils.humanSize(sizeToCheck) + | ||
", regionsWithCommonTable=" + tableRegionsCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use placeholders with {}
for all arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks.
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Ping @wchevreuil |
I have raised a discussion here: https://lists.apache.org/thread.html/r08a8103e2532eb667a0fcb4efa8a4117b3f82e6251bc4bd0bc157c26%40%3Cdev.hbase.apache.org%3E Let's wait for folks comments before we decide next steps for this PR. |
Ok, thanks. |
@wchevreuil For now the default split policy is SteppingSplitPolicy, FYI. |
@wchevreuil @virajjasani Skimed the comments of the thread, seems the #3 is mostly supported, should we continue discussion or do as #3? |
BTW, ConstantSizeRegionSplitPolicy has the same problem, FYI. |
@bsglz , sorry for the delay. Let's give one more day for the DISCUSS thread. |
Ok, thanks. |
IMO if we change split policy to consider region's total size rather than HStore size, we should also correct the config name. We can follow deprecation cycle for the config. But with current hregion.max.filesize its more confusing. |
I don't think changing the config name is strictly required. Official docs are clear about the expected behaviour for hbase.hregion.max.filesize, and this policy current violates that:
This policy also breaks MAX_FILESIZE table descriptor expected behaviour, as per ableDescriptorBuilder javadoc:
|
@wchevreuil Seems there are different opinions, how about merge this first, and do more improvement in seperate jira when we reach consensus. |
@bsglz , sorry for the delay. So majority of people on the DISCUSS thread opined towards #3. I think @busbey suggestion to fix IncreasingToUpperBoundRegionSplitPolicy for minor updates only is reasonable, so maybe we can merge this current one to branch-2.2 and branch-2.3, then for master and branch-2 we fix IncreasingToUpperBoundRegionSplitPolicy. |
Ok, will do it in a follow-on issue. Thanks. |
@wchevreuil IMO, maybe we can do as below: What do you think? Thanks. |
… but count all store size
rebase |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Checked the log of TestHeapSize.testSizes, seems there are 56*references, but the FIXED_OVERHEAD in HRegion only calc 55? |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Yeah, we can't simply remove it between minor releases.
Agreed!
Just to confirm my understanding of your proposal: adding the new policy as default, deprecation of the overridden ones and fix of ConstantSizeRegionSplitPolicy should go on master and branch-2 branches, whilst branch-2.2 and branch-2.3 would only have the new policy added? If so, I'm +1 for that. |
That's true. |
@wchevreuil Filed a new issue for follow-on tasks, FYI. https://issues.apache.org/jira/browse/HBASE-24664 |
We had decided to go with approach from HBASE-24664. |
… but count all store size