Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: calculate GC safe point based on global min start timestamp #12223

Merged
merged 20 commits into from
Sep 30, 2019

Conversation

jackysp
Copy link
Member

@jackysp jackysp commented Sep 17, 2019

What problem does this PR solve?

We do not need to increase GC life time or GC run interval to avoid the transaction aborted when it took too much time.

What is changed and how it works?

  1. calculate the new safe point using the minimum of (now - GC life time) and the global min transaction start timestamp.
  2. remove the configurable max-txn-time-use and use a fixed 24 hours instead, which is the longest transaction life time.
  3. remove the relation of GC life time and the max-txn-time-use.

Check List

Tests

  • Manual test (add detailed scripts or steps below)
  1. start 2 TiDB's cluster
  2. connect to one TiDB, then begin; and get its start ts by information_schema.processlist
  3. checkout /tidb/store/gcworker/saved_safe_point, it will pin to the min start ts
  4. commit the transaction
  5. checkout /tidb/store/gcworker/saved_safe_point again, and the safe point will update again.

Code changes

  • Has exported function/method change

Related changes

  • Need to update the documentation

@jackysp jackysp added the type/enhancement The issue or PR belongs to an enhancement. label Sep 17, 2019
@jackysp
Copy link
Member Author

jackysp commented Sep 17, 2019

/run-all-tests

@jackysp jackysp added the sig/transaction SIG:Transaction label Sep 17, 2019
@ngaut
Copy link
Member

ngaut commented Sep 17, 2019

Great job.

globalMinStartTime := time.Unix(sec, nsec)

diff := safePoint.Sub(globalMinStartTime)
maxTxnTimeUse := time.Duration(tikv.MaxTxnTimeUse)*time.Millisecond + 10*time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think + 10secs is not needed.

}
physical := oracle.ExtractPhysical(globalMinStartTS)
sec, nsec := physical/1e3, (physical%1e3)*1e6
globalMinStartTime := time.Unix(sec, nsec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can construct time with ns only,
time.Unix(0, physical * 1e6)

return globalMinStartTime
}

return safePoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's is possible that the diff > maxTxnTimeUse but there is another transaction startTS before the safepoint.
So we need to check maxTxnTimeUse during kvs iteration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we needn't, since for these transactions which run longer than maxTxnTimeUse should be failed finally

@codecov
Copy link

codecov bot commented Sep 17, 2019

Codecov Report

Merging #12223 into master will increase coverage by 1.2613%.
The diff coverage is 63.2653%.

@@               Coverage Diff                @@
##             master     #12223        +/-   ##
================================================
+ Coverage   79.8138%   81.0751%   +1.2613%     
================================================
  Files           460        454         -6     
  Lines        102268      99150      -3118     
================================================
- Hits          81624      80386      -1238     
+ Misses        14746      12952      -1794     
+ Partials       5898       5812        -86

Copy link
Contributor

@AndreMouche AndreMouche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the minStartTs is not updated or not exist, we needn't start GC

globalMinStartTime := time.Unix(0, oracle.ExtractPhysical(math.MaxUint64)*1e6)
maxTxnTimeUse := time.Duration(tikv.MaxTxnTimeUse)*time.Millisecond + 10*time.Second

for _, v := range kvs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible the kvs is empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

return globalMinStartTime
}

return safePoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we needn't, since for these transactions which run longer than maxTxnTimeUse should be failed finally

@coocood
Copy link
Member

coocood commented Sep 17, 2019

LGTM

AndreMouche
AndreMouche previously approved these changes Sep 18, 2019
Copy link
Contributor

@AndreMouche AndreMouche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AndreMouche
Copy link
Contributor

@MyonKeminta PTAL

Copy link
Contributor

@crazycs520 crazycs520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MyonKeminta
Copy link
Contributor

Will it work for clusters without TiDB?

return safePoint
}

maxTxnTimeUse := time.Duration(tikv.MaxTxnTimeUse)*time.Millisecond + 10*time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why plus 10s?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is your previous suggestion. Leave 10s for commit, otherwise, the transaction may not be committed yet, and GC has already started.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't get it. I think it's not necessary.

@jackysp
Copy link
Member Author

jackysp commented Sep 19, 2019

Will it work for clusters without TiDB?

No, I don't think the original way will not work either.

Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
@jackysp
Copy link
Member Author

jackysp commented Sep 24, 2019

Fixed a bug that could cause other long transactions of the same TiDB to fail. And added a test case. PTAL @coocood @AndreMouche @MyonKeminta

@coocood
Copy link
Member

coocood commented Sep 24, 2019

LGTM

Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

return safePoint
}

safePointTS := variable.GoTimeToTS(safePoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use oracle.ComposeTS? The same to other usages of variable.GoTimeToTS

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not need to GetPhysical().

err = spkv.Put(fmt.Sprintf("%s/%s", domain.ServerMinStartTSPath, "b"), "1")
c.Assert(err, IsNil)
sp = s.gcWorker.calSafePointByMinStartTS(now)
c.Assert(sp, Equals, zeroTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may at least need to add a case that saves a normal time to etcd, to check if calSafePointByMinStartTs parses the time correctly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some integration tests.

for _, info := range pl {
if info.CurTxnStartTS < minStartTS {
if info.CurTxnStartTS > startTSLowerLimit && info.CurTxnStartTS < minStartTS {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any test that covers these changes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noop... Any suggestions? Maybe we need some intagration test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I'm not familiar with that part of code. Just thinking it will be nice if this is covered by tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test case added, PTAL @MyonKeminta .

Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@coocood coocood added the status/can-merge Indicates a PR has been approved by a committer. label Sep 30, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Sep 30, 2019

Your auto merge job has been accepted, waiting for 12397

@sre-bot
Copy link
Contributor

sre-bot commented Sep 30, 2019

/run-all-tests

@sre-bot sre-bot merged commit f146185 into pingcap:master Sep 30, 2019
@jackysp jackysp deleted the update_gc_now branch February 27, 2020 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/transaction SIG:Transaction status/can-merge Indicates a PR has been approved by a committer. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants