This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

optimize memory in hundreds-of-tables scenario #52

Merged
merged 16 commits into master from chendahui/optimize-memory on Aug 6, 2018

Conversation

@holys (Contributor) commented Jul 31, 2018

The excessive memory usage is mainly caused by TiDB's kvencoder, and of course by Lightning using it incorrectly.

Reference: pingcap/tidb#7094

Test data: 512 tables, each with 10,000 rows, tested on my laptop.

After the optimization, heap memory usage averages about 200-400 MB.
[heap memory profile screenshot]

About the modification:

  1. Remove the old kvencoder pool, since we no longer need it.
  2. Since there is now a global domain and store inside kvencoder, we create the table schema for each table when the table job begins, so we need a dedicated kvencoder to initialize the table schema. For each region (chunk) we create a new kvencoder for the encoding (it doesn't cost much, because the domain and store are global and shared), and close that encoder when the region finishes. See the sketch below.
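
A minimal sketch of this life cycle; the helper names (newTableKVEncoder, encodeRegion, restoreTable) and the stub types are illustrative assumptions, not the actual tidb-lightning or kvencoder API:

```go
package main

import "fmt"

// kvEncoder stands in for the kvencoder interface; only Close matters here.
type kvEncoder interface {
	Close() error
}

type stubEncoder struct{}

func (stubEncoder) Close() error { return nil }

type region struct{ id int }

// newTableKVEncoder is a placeholder for creating an encoder bound to one
// table; in the optimized design this is cheap, because the domain and
// store behind it are global and shared.
func newTableKVEncoder(table string) (kvEncoder, error) { return stubEncoder{}, nil }

// encodeRegion is a placeholder for encoding one region (chunk) of the table.
func encodeRegion(enc kvEncoder, r region) error { return nil }

// restoreTable illustrates the life cycle: one encoder per table, kept open
// until every region has finished, plus one short-lived encoder per region.
func restoreTable(table string, regions []region) error {
	tableEnc, err := newTableKVEncoder(table) // initializes the table schema
	if err != nil {
		return err
	}
	defer tableEnc.Close() // closed only after all regions are done

	for _, r := range regions {
		regionEnc, err := newTableKVEncoder(table)
		if err != nil {
			return err
		}
		encErr := encodeRegion(regionEnc, r)
		closeErr := regionEnc.Close() // close as soon as the region finishes
		if encErr != nil {
			return encErr
		}
		if closeErr != nil {
			return closeErr
		}
	}
	return nil
}

func main() {
	fmt.Println(restoreTable("t1", []region{{1}, {2}}))
}
```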

Waiting for #51 to be merged.

@holys holys added the status/WIP Work in progress label Jul 31, 2018
@holys holys removed the status/WIP Work in progress label Jul 31, 2018
@holys (Contributor, Author) commented Jul 31, 2018

@GregoryIan @csuzhangxc PTAL

return nextRowID, affectedRows, checksum, errors.Trace(err)
}

////////////////////////////////////////////////////////////////

type kvEncoderPool struct {
Contributor (Author):

The kvencoder pool is no longer used, so it is removed.

dbInfo.Name, tableInfo.Name, tableInfo.ID,
tableInfo.Columns, cfg.TiDB.SQLMode)
if err != nil {
common.AppLogger.Errorf("failed to new kv encoder (%s) : %s", dbInfo.Name, err.Error())
Member:

errors.ErrorStack(err)?

Contributor (Author):

I have removed that line, since we already trace the error.

@@ -484,118 +504,19 @@ func (t *regionRestoreTask) Run(ctx context.Context) error {
return errors.Trace(err)
}
t.status = statFinished

return nil
err = t.encoder.Close()
Member:

Should we call t.encoder.Close() in a defer? Or maybe you can minimize the encoder's life cycle.


tr := &TableRestore{
ctx: ctx,
cfg: cfg,
dbInfo: dbInfo,
tableInfo: tableInfo,
tableMeta: tableMeta,
encoders: encoders,
encoder: encoder,
Member:

It seems there's no need to save it, because we don't use it anymore?

Contributor (Author):

We need to save it because we have to close this encoder last, after all the regions of the table have finished.

Why can't we close the encoder early? Because doing so may close the global domain.
ref: https://github.com/pingcap/tidb/blob/master/util/kvencoder/kv_encoder.go#L118
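
A toy illustration of why, under the assumption (an assumption about the linked code, not a copy of it) that the shared domain is reference-counted and torn down when the last encoder closes:

```go
package main

import "fmt"

// Assumed model: encoders share one global domain; each Close decrements a
// reference count, and the domain is shut down when the count hits zero.
// Holding the table-level encoder open until every region has finished
// keeps the count above zero for the whole import of that table.

var (
	domainRefs  int
	domainAlive bool
)

type encoder struct{}

func newEncoder() *encoder {
	if domainRefs == 0 {
		domainAlive = true // first encoder brings the shared domain up
	}
	domainRefs++
	return &encoder{}
}

func (e *encoder) Close() {
	domainRefs--
	if domainRefs == 0 {
		domainAlive = false // last Close tears the shared domain down
	}
}

func main() {
	tableEnc := newEncoder()  // held for the whole table
	regionEnc := newEncoder() // per-region encoder
	regionEnc.Close()         // safe: tableEnc still holds a reference
	fmt.Println("domain alive after region close:", domainAlive)
	tableEnc.Close() // only now may the shared domain really go away
	fmt.Println("domain alive after table close:", domainAlive)
}
```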

Member:

😳 Fine.

@@ -487,6 +486,11 @@ func newRegionRestoreTask(
}

func (t *regionRestoreTask) Run(ctx context.Context) error {
defer func() {
if err := t.encoder.Close(); err != nil {
common.AppLogger.Errorf("close encoder err %v", errors.ErrorStack(err))
Member:

Do we need to modify the returned error here?

Contributor (Author):

Done.

@holys holys requested a review from WangXiangUSTC August 6, 2018 05:24
func (t *regionRestoreTask) Run(ctx context.Context) error {
func (t *regionRestoreTask) Run(ctx context.Context) (err error) {
defer func() {
if err != nil {
Collaborator:

Return it, and output the error from encoder.Close().
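
The usual Go pattern for that, sketched with a stand-in fakeEncoder type rather than the real encoder:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// fakeEncoder stands in for the real encoder; only Close matters here.
type fakeEncoder struct{}

func (fakeEncoder) Close() error { return errors.New("close failed") }

// run uses a named return value so the deferred Close can both log its
// error and surface it to the caller when the body itself succeeded.
func run(enc fakeEncoder) (err error) {
	defer func() {
		if closeErr := enc.Close(); closeErr != nil {
			log.Printf("close encoder err %v", closeErr)
			if err == nil {
				err = closeErr
			}
		}
	}()
	// ... restore the region here ...
	return nil
}

func main() {
	fmt.Println("run returned:", run(fakeEncoder{}))
}
```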

Contributor (Author):

@GregoryIan PTAL

@IANTHEREAL (Collaborator):

Rest LGTM

@IANTHEREAL (Collaborator):

I'd actually like to know whether there are any other memory hotspots.

@holys (Contributor, Author) commented Aug 6, 2018

@GregoryIan There is another memory hotspot. tikv-importer allocates 1 GB of memory for each open table (the 1 GB is configurable). If many tables are open at the same time, that allocates a lot of memory and can even get tikv-importer OOM-killed. On the one hand, tidb-lightning should limit the maximum number of open engines (one engine corresponds to one open table in tikv-importer); on the other hand, tikv-importer should enforce such a limit itself, but currently it doesn't.
@DorianZheng will add a limit to tikv-importer to control the maximum number of open engines.
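
For illustration only (hypothetical helper names, not the actual tidb-lightning or tikv-importer code), the lightning-side limit could be as simple as a semaphore around engine open/close:

```go
package main

import (
	"fmt"
	"sync"
)

// openEngine, closeEngine and importTable are placeholders: in reality each
// open engine reserves a fixed memory budget (about 1 GB by default,
// configurable) inside tikv-importer.
func openEngine(table string) string  { return "engine-" + table }
func closeEngine(engine string)       {}
func importTable(engine string) error { return nil }

func main() {
	tables := []string{"t1", "t2", "t3", "t4", "t5"}

	const maxOpenEngines = 2 // cap on concurrently open engines
	sem := make(chan struct{}, maxOpenEngines)

	var wg sync.WaitGroup
	for _, table := range tables {
		wg.Add(1)
		go func(table string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot before opening
			defer func() { <-sem }() // release it after the engine is closed

			engine := openEngine(table)
			defer closeEngine(engine)

			if err := importTable(engine); err != nil {
				fmt.Println("import", table, "failed:", err)
			}
		}(table)
	}
	wg.Wait()
	fmt.Println("imported", len(tables), "tables with at most", maxOpenEngines, "engines open at a time")
}
```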

@csuzhangxc (Member):

LGTM

@holys holys removed the request for review from WangXiangUSTC August 6, 2018 05:47
@IANTHEREAL (Collaborator):

I know. Has the program been specifically tested for memory usage?

@IANTHEREAL (Collaborator):

LGTM

@holys (Contributor, Author) commented Aug 6, 2018

@GregoryIan I've only tested it with many tables (512 tables); I'm going to test with a large table next.
What's your suggestion for memory-usage testing?

@holys holys merged commit eb377a1 into master Aug 6, 2018
@holys holys deleted the chendahui/optimize-memory branch August 6, 2018 06:48
@IANTHEREAL (Collaborator):

We'll talk more about it later.
