-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store-tikv: drop invalid cached region #4506
Conversation
@atmzhou |
@atmzhou |
@coocood |
@disksing @AndreMouche @tiancaiamao |
@atmzhou |
@coocood |
@atmzhou |
@coocood |
@atmzhou |
store/tikv/region_cache.go
Outdated
sync.RWMutex | ||
regions map[RegionVerID]*Region | ||
sorted *llrb.LLRB | ||
ttls map[RegionVerID]time.Time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about move the ttl
field to Region
, so no extra map need to be maintained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Region should not have a field ttl.
If we want to save a map structure, we can merge the two maps as
regions map[RegionVerID]<*Region, time.Time>.
However, this may incur updating a lot of codes.
So, I prefer keeping the two maps separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atmzhou
Then how about accessTime
, Region
can have an accessTime field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coocood @tiancaiamao
I have make a structure named CachedRegion as follows:
type CachedRegion struct {
region Region
ttl int64
}
Accordingly, each RegionCache will have a map map[RegionVerID]*CachedRegion.
In this way, we save a map structure.
I think this should conform to your thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atmzhou
ttl
is changing over time, we can't store a ttl
value in field.
d01a2d0
to
f7305aa
Compare
store/tikv/region_cache.go
Outdated
// CachedRegion encapsulates {Region, TTL} | ||
type CachedRegion struct { | ||
region *Region | ||
ttl int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ttl
is a duration, not a timestamp, why not name it lastAccess
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will change it. Thanks~
store/tikv/region_cache.go
Outdated
|
||
// CachedRegion encapsulates {Region, TTL} | ||
type CachedRegion struct { | ||
region *Region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not rename Region
to CachedRegion
directly and add the new field there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regions may be used to partition requests. For this kind of tasks, ttl is of no use.
In fact, ttl is only used for cache.
So, I prefer defining a new structure CachedRegion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Region
is a cached region with or without access time.
Use different type just to add the access time makes confusion.
store/tikv/region_cache.go
Outdated
// For example, T1 find ttl valid but before T1 freshes the ttl, T2 find ttl invalid | ||
// In this case, T2 will invoke this function but T1 may have freshed the ttl | ||
// Then, T2 should not drop the cached region | ||
ttl := atomic.LoadInt64(&cachedregion.ttl) // we may not need an atomic load, I think a simple load is enough? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to check the access time again, just delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, both rechecking and directly deleting are correct.
Directly deleting may simplify the code.
However, I think it leads to strange cases. Suppose that there are two visits on the same cached region. If the first visit determines to refresh the ttl of the region, when it conducts this action, it may find that the region was removed because of out-of-date. This is strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first reader found the region is valid, then he will just refersh the region, he won't lookup the map for the region again, so he never find that region was removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,You are right.
Either deleting or rechecking is correct.
The choice depends on whether we want the execution is equivalent to the actual happening order of the two operations.
By rechecking, I choose that the execution should be equivalent to the actual happening order of the two operations.
This may be more strict than directly deleting.
I think the two possible choices are OK.
store/tikv/region_cache.go
Outdated
func (c *RegionCache) GetCachedRegion(id RegionVerID) *Region { | ||
for { | ||
c.mu.RLock() | ||
cachedregion, ok := c.mu.regions[id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just RUnlock
here.
The lock only protects the regions
map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will change the code. Thanks~
store/tikv/region_cache.go
Outdated
lastAccessTime := time.Unix(lastAccess, 0) | ||
if time.Since(lastAccessTime) < rcDefaultRegionCacheTTL { | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we can extract L94-L98 into a method of CachedRegion
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will revise this.
5ed707b
to
7f1a6bf
Compare
store/tikv/region_cache.go
Outdated
@@ -26,12 +28,23 @@ import ( | |||
goctx "golang.org/x/net/context" | |||
) | |||
|
|||
const ( | |||
rcDefaultRegionCacheTTL = time.Minute * 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a longer duration is more suitable. 10 minute for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, Thanks~
store/tikv/region_cache.go
Outdated
if time.Since(lastAccessTime) < rcDefaultRegionCacheTTL { | ||
return true | ||
} | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just return time.Since(lastAccessTime) < rcDefaultRegionCacheTTL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, Thanks~
store/tikv/region_cache.go
Outdated
@@ -66,17 +79,61 @@ func (c *RPCContext) GetStoreID() uint64 { | |||
return 0 | |||
} | |||
|
|||
func (c *CachedRegion) isValid() bool { | |||
lastAccess := atomic.LoadInt64(&c.lastAccess) // we may not need an atomic load, I think a simple load is enough? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use atomic, or the race detector will complain. Please remove the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, Thanks~
By the way, in fact, we do not need to use atomic.
|
||
// CachedRegion encapsulates {Region, TTL} | ||
type CachedRegion struct { | ||
region *Region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about region Region
?
// CachedRegion encapsulates {Region, TTL} | ||
type CachedRegion struct { | ||
region *Region | ||
lastAccess int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastAccess time.Time
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use int64 to enable atomic operation
store/tikv/region_cache.go
Outdated
// For example, T1 find ttl valid but before T1 freshes the ttl, T2 find ttl invalid | ||
// In this case, T2 will invoke this function but T1 may have freshed the ttl | ||
// Then, T2 should not drop the cached region | ||
if cachedregion.isValid() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's under lock, why there are concurrent threads may fresh it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because concurrent threads use atomic operations to fresh it.
In this way, we save write lock.
LGTM. |
@@ -138,26 +168,24 @@ func (c *RegionCache) LocateKey(bo *Backoffer, key []byte) (*KeyLocation, error) | |||
|
|||
// LocateRegionByID searches for the region with ID | |||
func (c *RegionCache) LocateRegionByID(bo *Backoffer, regionID uint64) (*KeyLocation, error) { | |||
c.mu.RLock() | |||
if r := c.getRegionByIDFromCache(regionID); r != nil { | |||
r := c.getRegionByIDFromCache(regionID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@disksing Should we use GetCachedRegion
here?
This method is only used by HTTP status server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have different parameters. BTW, I think we need lock here. @atmzhou
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@disksing I remember that you said this function is only used for test. So I did not add lock in getRegionByIDFromCache.
Do you think we should add lock? If yes, I will add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We don't want debug API have data race too, especially it may affect online service.
cf8a5b8
to
cf482f0
Compare
PTAL @coocood |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Some cached region may be out-of-date for they are long not used.
To avoid use these regions, we allocate each cached region a TTL, which records when the region may be out-of-date.
When an out-of-dated region is identified, we remove the region out of the region cache.
Fix #2511
Fix #4498