Increase the probability of balancing new regions #2524
Comments
The first proposal: put new regions with a TTL into a "fresh" set and pick from that set with higher priority. Its disadvantage is that it is difficult to decide the probability split between the fresh set and the other set.
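A minimal Python sketch of the first proposal, assuming a fixed probability `fresh_prob` of drawing from the fresh set and an injectable clock so it can be tested. `FreshSetPicker`, `ttl`, and `fresh_prob` are hypothetical names, not PD's scheduler API, and choosing `fresh_prob` is exactly the open question raised above.

```python
import random
import time


class FreshSetPicker:
    """Hypothetical picker for the "fresh set" proposal (not PD's real API)."""

    def __init__(self, ttl, fresh_prob, rng=None, clock=time.monotonic):
        self.ttl = ttl                # how long a new region stays "fresh" (assumed)
        self.fresh_prob = fresh_prob  # chance of drawing from the fresh set (assumed)
        self.rng = rng if rng is not None else random.Random()
        self.clock = clock
        self.fresh = {}               # region_id -> expiry time
        self.all_regions = []         # every region, old and new

    def add_region(self, region_id):
        # A newly created region joins both the full list and the fresh set.
        self.all_regions.append(region_id)
        self.fresh[region_id] = self.clock() + self.ttl

    def pick(self):
        now = self.clock()
        # Drop regions whose TTL has run out; they are only reachable via the old path.
        self.fresh = {r: exp for r, exp in self.fresh.items() if exp > now}
        if self.fresh and self.rng.random() < self.fresh_prob:
            return self.rng.choice(sorted(self.fresh))
        # Original behaviour: uniform choice over all regions.
        return self.rng.choice(self.all_regions)
```

With `fresh_prob=1.0` every pick goes to a fresh region while one exists; smaller values trade off between new and old regions.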
The second proposal: suppose we expect all regions to have had the same probability of being scheduled in the past … When placing a new region with a TTL, which will live for a period x, the probability of it being scheduled … Its disadvantage is that when the difference between … Of course, we can choose a non-linear reduction for the TTL weight, such as …
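The formulas of the second proposal did not survive, so the weight function below is an assumption: a new region starts with weight `boost`, the bonus decays to 1.0 as its TTL runs out, and the exponent `shape` stands in for the "non-linear reduction" option. `ttl_weight` and `pick_weighted` are hypothetical helpers; the pick itself is ordinary weighted random selection.

```python
import random


def ttl_weight(age, ttl, boost, shape=1.0):
    """Hypothetical TTL weight: `boost` at creation, decaying to 1.0 once the TTL expires.

    shape=1.0 gives a linear decay; shape>1.0 is one non-linear option that
    shrinks the bonus faster right after creation.
    """
    remaining = max(0.0, 1.0 - age / ttl)  # fraction of the TTL still left
    return 1.0 + (boost - 1.0) * remaining ** shape


def pick_weighted(created_at, now, ttl, boost, rng, shape=1.0):
    """Pick one region id with probability proportional to its TTL weight.

    created_at maps region_id -> creation time; old regions end up with weight 1.0,
    so they keep a uniform chance relative to each other.
    """
    ids = sorted(created_at)
    weights = [ttl_weight(now - created_at[r], ttl, boost, shape) for r in ids]
    return rng.choices(ids, weights=weights, k=1)[0]
```

For example, with `ttl=10` and `boost=5`, a region halfway through its TTL has weight 3.0 under linear decay but only 2.0 with `shape=2.0`.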
Consider the following:
I think that for a newly created region, we can increase its probability of being selected through a TTL-like mechanism. For regions that already exist in the system, we can keep the original method. There is no need to use a time-based weight for all regions.
I am not quite clear about what you mean here. Could you explain more?
Maybe one possible way is to reset the TTL? @lhy1024 BTW, would you like to join our Slack channel?
Looks good; this may be easier to implement. Calculating probabilities based on time is really hard to control, because there are many uncertain factors, such as scheduling failures and the irregular creation of new regions.
This is the question we raised in our discussion. If the probability of a new region is increased too much, the new region may be selected multiple times, leading to frequent scheduling. But if you don't use the "calculating probabilities" approach, there should be some mechanism to avoid this.
Sure.
This problem matters because we want to avoid hot spots. As @nolouch said, "Import a new table into a big cluster, the region of this table will be unevenly distributed, which will bring some hot store issues." Once we have selected a new region to balance, maybe that region can be removed from the new-region set.
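A minimal sketch of the "remove once selected" idea, which also addresses the concern that a boosted region could be picked over and over. It assumes new regions are always served before existing ones; `OneShotNewRegionSet`, `add`, and `pick` are hypothetical names, not part of PD.

```python
import random


class OneShotNewRegionSet:
    """Hypothetical set: new regions are picked first, and each leaves the set on selection."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.pending = set()  # new regions not yet selected

    def add(self, region_id):
        self.pending.add(region_id)

    def pick(self, existing):
        if self.pending:
            rid = self.rng.choice(sorted(self.pending))
            # Remove immediately so the same new region cannot be chosen twice.
            self.pending.discard(rid)
            return rid
        # No new regions left: fall back to the original uniform choice.
        return self.rng.choice(existing)
```

Because each new region is handed out at most once, its extra chance is bounded no matter how often the scheduler runs.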
@rleungx What does that mean? Can you explain it?
I think it may be a good idea!
If we select a region and create an operator for it, but the operator times out for some reason, do we need to reset/refresh the TTL?
Suppose we put new regions with a TTL into a new-region set and select from it first, while regions that already exist are still selected by the original method. When a new region is successfully dispatched, it is removed from the set. Depending on the generation rate of new regions and on the probability of preferentially selecting them, the following situations may occur:
The disadvantage is that it is very difficult to choose the probability split between the new set and the other set. Maybe we can find the right value by experiment, or maybe it should not be a fixed value.
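A minimal sketch of the lifecycle discussed in the last few comments. It assumes a region leaves the set only after a successful dispatch and, as one possible answer to the timeout question, gets a refreshed TTL when its operator times out. `NewRegionSet`, `pick`, and `finish` are hypothetical names; `pick` returning `None` stands for falling back to the original selection for existing regions.

```python
import random


class NewRegionSet:
    """Hypothetical new-region set with TTLs and in-flight tracking (not PD's API)."""

    def __init__(self, ttl, rng=None):
        self.ttl = ttl
        self.rng = rng if rng is not None else random.Random()
        self.waiting = {}       # region_id -> expiry time
        self.in_flight = set()  # regions with a pending operator

    def add(self, region_id, now):
        self.waiting[region_id] = now + self.ttl

    def pick(self, now):
        # Drop regions whose TTL ran out; they go back to the normal selection path.
        self.waiting = {r: exp for r, exp in self.waiting.items() if exp > now}
        if not self.waiting:
            return None  # caller falls back to the original method
        rid = self.rng.choice(sorted(self.waiting))
        del self.waiting[rid]
        self.in_flight.add(rid)
        return rid

    def finish(self, region_id, now, succeeded):
        self.in_flight.discard(region_id)
        if not succeeded:
            # Operator timed out: refresh the TTL so the region is eligible again.
            self.waiting[region_id] = now + self.ttl
```

Parking a picked region as in-flight keeps it from being selected again while its operator is still running.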
We may need to consider adding the strategy's configuration and effect, the cluster size, and some other information to the telemetry.
Feature Request
Describe your feature request related problem
There is an issue where a new region may have a low probability of being balanced, for example:
In the drawing above, the y-axis represents the regions that exist at each moment. Region 8 (R8) only appears at time t8, so the probability of R8 being selected over the entire sample space is very small.
This issue leads to the following scenario: when a new table is imported into a big cluster, the regions of this table will be unevenly distributed, which causes hot store issues.
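A small worked example of why R8 is disadvantaged, under a simplified model that is an assumption rather than PD's actual scheduler: region k is created at tick k, and at every tick the scheduler picks one existing region uniformly. Region k then expects $\sum_{t=k}^{T} 1/t$ picks by tick T. `expected_picks` is a hypothetical helper.

```python
from fractions import Fraction


def expected_picks(k, T):
    """Expected number of times region k is selected by tick T.

    Assumes region t is created at tick t and each tick picks one of the
    t existing regions uniformly at random.
    """
    return sum(Fraction(1, t) for t in range(k, T + 1))
```

With T = 8, R1 expects 761/280 (about 2.72) picks while R8 expects only 1/8, even though the eight regions together receive exactly 8 picks.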
Describe alternatives you've considered
Empirically, we hope that the distribution of data is relatively random.
Teachability, Documentation, Adoption, Migration Strategy