Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: fix concurrent access to do.infoHandle has a data race error #8287

Merged
merged 14 commits into from
Nov 27, 2018

Conversation

ciscoxll
Copy link
Contributor

@ciscoxll ciscoxll commented Nov 12, 2018

What problem does this PR solve?

WARNING: DATA RACE
Write at 0x00c0001d4c70 by goroutine 51:
  github.com/pingcap/tidb/ddl_test.(*testFailDBSuite).TestHalfwayCancelOperations()
      /home/jenkins/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/domain/domain.go:538 +0x1c80
  github.com/pingcap/tidb/ddl_test.(*testFailDBSuite).TestHalfwayCancelOperations()
      /home/jenkins/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/ddl/fail_db_test.go:125 +0x1adb
  github.com/pingcap/tidb/ddl_test.(*testFailDBSuite).TestHalfwayCancelOperations()

 github.com/pingcap/tidb/ddl_test.(*testFailDBSuite).TestHalfwayCancelOperations()
      /home/jenkins/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/ddl/fail_db_test.go:69 +0x70
  runtime.call32()
      /usr/local/go/src/runtime/asm_amd64.s:522 +0x3a
  reflect.Value.Call()
      /usr/local/go/src/reflect/value.go:308 +0xc0
  github.com/pingcap/check.(*suiteRunner).forkTest.func1()
      /home/jenkins/workspace/tidb_ghpr_unit_test/go/pkg/mod/github.com/pingcap/check@v0.0.0-20171206051426-1c287c953996/check.go:798 +0xa04
  github.com/pingcap/check.(*suiteRunner).forkCall.func1()
      /home/jenkins/workspace/tidb_ghpr_unit_test/go/pkg/mod/github.com/pingcap/check@v0.0.0-20171206051426-1c287c953996/check.go:692 +0x89

Previous read at 0x00c0001d4c70 by goroutine 179:
  github.com/pingcap/tidb/domain.(*Domain).Reload()
      /home/jenkins/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/domain/domain.go:304 +0x151
  github.com/pingcap/tidb/domain.(*Domain).loadSchemaInLoop()
      /home/jenkins/workspace/tidb_ghpr_unit_test/go/src/github.com/pingcap/tidb/domain/domain.go:401 +0x7f1

What is changed and how it works?

  • Data race error when concurrently accessing do.infoHandle in this PR test.
  • So here is the atomic pointer operation on do.infoHandle.

Check List

Tests

  • Unit test

This change is Reviewable

@ciscoxll ciscoxll changed the title Fix concurrent access to do.infoHandle has a data race error ddl: fix concurrent access to do.infoHandle has a data race error Nov 12, 2018
@ciscoxll
Copy link
Contributor Author

@tiancaiamao @winkyao @zimulala PTAL

@ciscoxll
Copy link
Contributor Author

/run-all-tests

1 similar comment
@ciscoxll
Copy link
Contributor Author

/run-all-tests

domain/domain.go Outdated
@@ -535,7 +535,8 @@ func NewDomain(store kv.Storage, ddlLease time.Duration, statsLease time.Duratio

// ResetHandle resets the domain's infoschema handle. It is used for testing.
func (do *Domain) ResetHandle(store kv.Storage) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to remove this function, and call do.Reload() instead
@zimulala @ciscoxll

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call do.Reload still has data race.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used to reset the handle. Reload can't do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why it's needed to reset the handle? @zimulala
do.Reload uses mutex.Lock @winkyao

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used for testing. You can see it in "TestHalfwayCancelOperations". @tiancaiamao

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zimulala What do you mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zimulala @tiancaiamao Are we going to continue using s.dom.ResetHandle?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@winkyao @ciscoxll
I mean we can change test. Maybe we can remove ResetHandle.
At most, the test results are not so good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zimulala modified test failed to pass.

domain/domain.go Outdated Show resolved Hide resolved
@ciscoxll
Copy link
Contributor Author

/run-all-tests

domain/domain.go Outdated
// InfoSchema gets information schema from domain.
func (do *Domain) InfoSchema() infoschema.InfoSchema {
return do.infoHandle.Get()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the read at do.infoHandle need to be protected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@winkyao Done.

@ciscoxll
Copy link
Contributor Author

/run-all-tests

@winkyao
Copy link
Contributor

winkyao commented Nov 14, 2018

@XuHuaiyu Will fix the union test failed, @ciscoxll you can just use the command /run-integration-common-test, to rerun the common-test

@winkyao
Copy link
Contributor

winkyao commented Nov 14, 2018

/run-integration-common-test

Copy link
Contributor

@winkyao winkyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ciscoxll ciscoxll added the status/LGT1 Indicates that a PR has LGTM 1. label Nov 14, 2018
@XuHuaiyu
Copy link
Contributor

/run-common-test tidb-test=pr/649
/run-integration-common-test tidb-test=pr/649

@winkyao
Copy link
Contributor

winkyao commented Nov 14, 2018

This PR need to be cherry-pick to 2.1 and 2.0

@tiancaiamao
Copy link
Contributor

@zimulala @crazycs520

@ciscoxll
Copy link
Contributor Author

@zimulala @tiancaiamao @crazycs520 PTAL

@ciscoxll
Copy link
Contributor Author

/run-all-tests

1 similar comment
@ciscoxll
Copy link
Contributor Author

/run-all-tests

ddl/fail_db_test.go Outdated Show resolved Hide resolved
@ciscoxll
Copy link
Contributor Author

/run-all-tests

Copy link
Contributor

@winkyao winkyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@winkyao
Copy link
Contributor

winkyao commented Nov 27, 2018

/run-common-test tidb-test=pr/668

@winkyao
Copy link
Contributor

winkyao commented Nov 27, 2018

/run-integration-common-test tidb-test=pr/668

@zz-jason zz-jason added the priority/P1 The issue has P1 priority. label Nov 27, 2018
ddl/fail_db_test.go Outdated Show resolved Hide resolved
ddl/fail_db_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@crazycs520 crazycs520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@crazycs520 crazycs520 added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Nov 27, 2018
@tiancaiamao
Copy link
Contributor

LGTM

@tiancaiamao tiancaiamao added require-LGT3 Indicates that the PR requires three LGTM. and removed status/LGT2 Indicates that a PR has LGTM 2. labels Nov 27, 2018
@ciscoxll ciscoxll merged commit e5dc251 into pingcap:master Nov 27, 2018
@ciscoxll ciscoxll deleted the fix-data-race branch November 27, 2018 14:14
@sre-bot sre-bot added the contribution This PR is from a community contributor. label Dec 18, 2019
@you06 you06 added the sig/sql-infra SIG: SQL Infra label Mar 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution This PR is from a community contributor. priority/P1 The issue has P1 priority. require-LGT3 Indicates that the PR requires three LGTM. sig/sql-infra SIG: SQL Infra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants