Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plan: add cache for statistics table #2398

Merged
merged 10 commits into from
Jan 7, 2017
Merged

Conversation

alivxxx
Copy link
Contributor

@alivxxx alivxxx commented Jan 5, 2017

if txn == nil {
return statistics.PseudoTable(table)
}
m := meta.NewMeta(txn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think move cache refill logic to statistics package is better.
And If the the stats for a table is always nil, we don't want to get it from kv every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, but it will introduce circle dependency, because meta/meta.go imports statistics package. And if stats table is nil, I will save a pseduo one in cache, thus we will not get it from kv every time.

Copy link
Member

@coocood coocood Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache can be implemented in another package.
like statscache

Copy link
Contributor Author

@alivxxx alivxxx Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plan/statscache or plan/statistics/statscache?

var expireDuration int64 = 60 * 60 * 1000

func tableCacheExpired(tbl *Table) bool {
duration := oracle.GetPhysical(time.Now()) - oracle.ExtractPhysical(uint64(tbl.TS))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbl.TS is the stats build time, table stats may not updated frequently, we should use the last get time to determine if the stats is expired.

statistics.SetStatisticsTableCache(table.ID, tbl)
return statistics.PseudoTable(table)
}
tbl.TS = int64(txn.StartTS())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not use the build time field to record lookup time.

if ok && tblInfo == si.tbl.Info && !tableCacheExpired(si) {
return si.tbl
}
txn := ctx.Txn()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During plan building, the ctx.Txn is null most of the time.
If it is nil, call ctx.ActivePendingTxn to get the transaction from transaction channel.

}

// SetStatisticsTableCache sets the statistics table cache.
func SetStatisticsTableCache(id int64, statsTbl *statistics.Table, ts uint64) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always compare ts to time.Now, so why not just use time.Time, then this parameter can be removed.

@coocood
Copy link
Member

coocood commented Jan 6, 2017

LGTM

statsTblCache.m.RUnlock()
// Here we check the TableInfo because there may be some ddl changes in the duration period.
// Also, we rely on the fact that TableInfo will not be same if and only if there are ddl changes.
if ok && tblInfo == si.tbl.Info && !tableCacheExpired(si) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test when the table cache is expired

@hanfei1991
Copy link
Member

Add a test, compare the plan result before or after we use analysis.

tk.MustExec("analyze table t1")
result = tk.MustQuery("explain select * from t1 where t1.a = 1")
rowStr = fmt.Sprintf("%s", result.Rows())
c.Check(strings.Split(rowStr, "{")[0], Equals, "[[TableScan_4 ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean you can add the cases like following:
index(a) is (1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2) and filter is a =1
index(b) is (1,2,3,4,5,6......15) and filter is b = 1.
Then optimizer will choose index b.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can add these tests in plan test, init the store can execute sql stmt in test.

Copy link
Contributor Author

@alivxxx alivxxx Jan 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good test case, but I don't think you will get the desired result, yet. Because currently, the logic for index columns is, somewhat, designed for pseudo cases, which in this case will always return table.Count / 3. After the two PR is merged, this test case will be testable. I will add it in the next PR. Seems next PR has lots of things to do...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I ignored it...

Copy link
Member

@hanfei1991 hanfei1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hanfei1991 hanfei1991 merged commit 4731649 into master Jan 7, 2017
@hanfei1991 hanfei1991 deleted the xhb/cache_statistics branch January 7, 2017 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants