Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

infoschema: fill data length fields for tables #7657

Merged
merged 9 commits into from
Sep 13, 2018

Conversation

alivxxx
Copy link
Contributor

@alivxxx alivxxx commented Sep 10, 2018

What problem does this PR solve?

TiDB should be able to use sql to get the disk usage.
Fix #7468

What is changed and how it works?

Use the total column size that maintained by stats to fill in the data length fields, which includes avg_row_length, data_length and index_length.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

  • None

Related changes

  • None

PTAL @coocood @zz-jason @winoros

@alivxxx alivxxx added type/enhancement The issue or PR belongs to an enhancement. component/tools labels Sep 10, 2018
// varElemLen indicates this column is a variable length column.
const varElemLen = -1

func getFixedLen(colType *types.FieldType) int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the length in Chunk, but the length in the storage is different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we do not maintain the length for fixed length column.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean we should use a different function to estimate the average column size in the storage.

const VarElemLen = -1

// Length is the length of value for the type.
func (ft *FieldType) Length() int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's misleading because we already have a Flen field in FieldType.

@@ -720,6 +776,10 @@ func dataForTables(ctx sessionctx.Context, schemas []*model.DBInfo) ([][]types.D
if err != nil {
return nil, errors.Trace(err)
}
colLengthMap, err := getColLengthAllTables(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation doesn't do any RPC.
After this change, it may take too much time to execute if we have many tables.
And this may run frequently in some application.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. You can take a look at getRowCountAllTable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see.

We can cache the result in case it is called too frequently, just like the way GlobalVariableCache does.

var TableStatsCacheExpiry = 3 * time.Second

func (c *statsCache) get(ctx sessionctx.Context) (map[int64]uint64, map[tableHistID]uint64, error) {
c.mu.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The global lock may be a problem, GlobalVariableCache doesn't hold the lock while doing internal SQL query.

Copy link
Contributor Author

@alivxxx alivxxx Sep 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better. If the cache is valid, then we do not wait long. If the cache is not valid, there must be one and only one doing the query, and during the query, no one would get the expired cache.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a problem that we do unnecessary request.
We know this issue and didn't have a chance to further optimize this when we implement GlobalVariableCache.

Since it doesn't matter the value is a little old, we can remove the global lock, set a flag indicates there is an ongoing request trying to update the cache, then other readers can just use the expired one.

@coocood
Copy link
Member

coocood commented Sep 12, 2018

LGTM

@alivxxx alivxxx added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 13, 2018
Copy link
Member

@winoros winoros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@winoros winoros added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 13, 2018
@alivxxx
Copy link
Contributor Author

alivxxx commented Sep 13, 2018

/run-all-tests

@alivxxx
Copy link
Contributor Author

alivxxx commented Sep 13, 2018

/run-common-test tidb-test=pr/625
/run-integration-common-test tidb-test=pr/625

@alivxxx alivxxx merged commit 580e857 into pingcap:master Sep 13, 2018
@alivxxx alivxxx deleted the infoschema branch September 13, 2018 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/tools status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants