executor: fix a bug of update with outer join #7177

Merged
merged 17 commits into from
Aug 2, 2018
Contributor

@lysu lysu commented Jul 27, 2018

What have you changed? (mandatory)

fixes #7176

question

Fix UPDATE on tables combined with an outer join. The fix covers three cases:

  • a left join b, updating a's columns: b should not change and no error should be raised
  • a left join b, updating b's columns, where b has matched rows: the matched rows are updated
  • a left join b, updating b's columns, where b has no matched rows: no row is inserted into b and no other action is taken

More cases can be found in the new unit tests.

change

  • when building the join physical plan, mark the right table's columns
  • when updating a record, check that mark and check whether all of the data is NULL because it was filled in by the join
  • skip the update and the checks for these NULL-filled columns
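
The three cases above can be illustrated with a small self-contained sketch. It does not use TiDB's types: `joinedRow`, `applyUpdate`, and the in-memory table are hypothetical stand-ins, and a nil pointer stands in for a handle that the outer join filled with NULL.

```go
package main

import "fmt"

// joinedRow is a hypothetical stand-in for one row produced by
// "a LEFT JOIN b". bHandle is nil when no row of b matched, which
// models the NULL values an outer join fills in for the inner side.
type joinedRow struct {
	aHandle int
	bHandle *int
}

// applyUpdate sets v = newV in table b for every joined row whose b side
// actually matched. Rows with a nil bHandle are skipped, so nothing is
// inserted into b and no checks run for them.
// It returns the number of rows updated.
func applyUpdate(b map[int]int, rows []joinedRow, newV int) int {
	updated := 0
	for _, r := range rows {
		if r.bHandle == nil {
			continue // NULL-filled by the join: not a real row of b.
		}
		b[*r.bHandle] = newV
		updated++
	}
	return updated
}

func main() {
	b := map[int]int{1: 10} // handle -> v
	h := 1
	rows := []joinedRow{
		{aHandle: 1, bHandle: &h},  // matched row of b
		{aHandle: 2, bHandle: nil}, // no match in b
	}
	n := applyUpdate(b, rows, 11)
	fmt.Println(n, len(b), b[1])
}
```

Only the matched row is touched; the unmatched row neither changes b nor creates a new entry in it.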

What is the type of the changes? (mandatory)

  • Bug fix (non-breaking change which fixes an issue)

How has this PR been tested? (mandatory)

  • unit tests
  • integration tests
  • manual tests?

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

no

Does this PR need to be added to the release notes? (mandatory)

Yes

fix a bug of update with outer join

Refer to a related PR or issue link (optional)

#7176

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)

in testcase



@lysu lysu added type/bugfix This PR fixes a bug. sig/execution SIG execution labels Jul 27, 2018
tk.MustQuery("select id, k, v from t3").Check(testkit.Rows())

// test left join and right no records but update no records part.
tk.MustExec("update t1 left join t2 on t1.k = t2.k set t1.v = t2.v, t2.v = 3") // exchange to t2.v = 3, t1.v = t2.v..will got a bug.
Contributor Author

Reordering the assignments from t1.v = t2.v, t2.v = 3 to t2.v = 3, t1.v = t2.v still hits a bug; it will be fixed later.

Contributor Author

fixed

@@ -140,6 +140,12 @@ func (col *CorrelatedColumn) ResolveIndices(_ *Schema) Expression {
func (col *CorrelatedColumn) resolveIndices(_ *Schema) {
}

// ColumnIdentifier represents a identifier of column.
type ColumnIdentifier struct {
Member

Can you wait for this to be merged?
#7157

Contributor Author

@winoros should #7157 be cherry-picked to 2.0.6?

Member

The answer might be no for now.

@@ -37,7 +37,8 @@ type Schema struct {
Columns []*Column
Keys []KeyInfo
// TblID2Handle stores the tables' handle column information if we need handle in execution phase.
- TblID2Handle map[int64][]*Column
+ TblID2Handle map[int64][]*Column
+ ReplenishCols map[ColumnIdentifier]struct{}
Member

Can this set be maintained by INSERT or UPDATE instead of in the schema?

// Rebase auto increment id if the field is changed.
- if mysql.HasAutoIncrementFlag(col.Flag) {
+ if mysql.HasAutoIncrementFlag(col.Flag) && !(checkReplenish && oldDataAllNull) {
Member

I think it is acceptable, but this is not the root cause. It seems MySQL sets the auto-id and checks it for NULL just before writing the row, which is much later than TiDB does. Maybe someday we should refactor TiDB's implementation.

@XuHuaiyu
Contributor

If this PR is ready to be reviewed,
DNM can be removed. ^_^

@lysu lysu removed the status/DNM label Jul 30, 2018
@lysu
Contributor Author

lysu commented Jul 31, 2018

PTAL @XuHuaiyu @winoros thx~

@lysu lysu added the priority/release-blocker This issue blocks a release. Please solve it ASAP. label Jul 31, 2018
@@ -173,6 +179,15 @@ func (e *UpdateExec) composeNewRow(rowIdx int, oldRow []types.Datum) ([]types.Da
return newRowData, nil
}

func allNullDatums(datums []types.Datum, colRange ColumnIndexRange) bool {
Contributor

s/ allNullDatums/ isAllNull

switch p := selectPlan.(type) {
case *plan.PhysicalHashJoin:
joinType = p.JoinType
break
Contributor

We do not need break here: a Go switch case does not fall through.
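
As a quick illustration of that point, here is a minimal sketch of a Go switch with no break statements. The `joinKind` type and its constants are hypothetical and are not TiDB's plan types.

```go
package main

import "fmt"

// joinKind is a hypothetical enum used only for this illustration.
type joinKind int

const (
	innerJoin joinKind = iota
	leftOuterJoin
	rightOuterJoin
)

// describe returns a label for k. No case ends with break: Go leaves the
// switch as soon as the matching case body finishes, and only an explicit
// fallthrough statement would continue into the next case.
func describe(k joinKind) string {
	label := "unknown"
	switch k {
	case innerJoin:
		label = "inner"
	case leftOuterJoin:
		label = "left outer"
	case rightOuterJoin:
		label = "right outer"
	}
	return label
}

func main() {
	fmt.Println(describe(leftOuterJoin))
}
```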

tk.MustExec("insert into t1 values (1, 0)")
tk.MustExec("insert into t4 values (3, 3)")

// test auto_increment & none-null column in right table of update left join.
Contributor

We may need @lilin90 's help here...
For all the comments in this function.

Member

-> test the case that a table with auto_increment or non-null columns is the right table of a left join

newRowsData [][]types.Datum // The new values to be set.
fetched bool
cursor int
replenishTbl map[string]ColumnIndexRange
Contributor

comment for this attribute

start, end int
}

// resolveReplenishTbl resolve sub-join's replenish column info.
Contributor

We need a more detailed comment for this function.

@@ -163,6 +165,10 @@ func (e *UpdateExec) handleErr(colName model.CIStr, rowIdx int, err error) error
func (e *UpdateExec) composeNewRow(rowIdx int, oldRow []types.Datum) ([]types.Datum, error) {
newRowData := types.CopyRow(oldRow)
for _, assign := range e.OrderedList {
colRange, isReplenish := e.replenishTbl[assign.Col.DBName.O+"-"+assign.Col.TblName.O]
Member

Is it an optimization?
Can we remove this?

Contributor Author

Hmm, no. It's a bug fix, not an optimization; removing this causes https://github.com/pingcap/tidb/pull/7177/files#diff-6d9a596d0d155b4342a598ba359efc8bR2985 to fail.

@@ -53,14 +53,15 @@ const (
// 4. lastInsertID (uint64) : the lastInsertID should be set by the newData.
// 5. err (error) : error in the update.
func updateRecord(ctx sessionctx.Context, h int64, oldData, newData []types.Datum, modified []bool, t table.Table,
- onDup bool) (bool, bool, int64, uint64, error) {
+ onDup, checkReplenish bool) (bool, bool, int64, uint64, error) {
Member

Can we move the check out of updateRecord?
This method is too complex already.

Contributor Author

Done~

newRowsData [][]types.Datum // The new values to be set.
fetched bool
cursor int
replenishTbl map[string]ColumnIndexRange
Member

We can use *expression.Column as the key of the map.

Member

@lilin90 lilin90 left a comment

Pls update comments description.

}

// Len implements sort.Interface#Less.
// let ranges first sort by `start` increasing order, and sort by `end` decreasing order if `start` are equal.
Member

  • first sort by -> first sorted by
  • and sort by -> and then sorted by


// Len implements sort.Interface#Less.
// let ranges first sort by `start` increasing order, and sort by `end` decreasing order if `start` are equal.
// so in foldDuplicate can fold duplicate range by this order.
Member

Please delete the extra "in".

}

// foldDuplicate removes the duplicate ranges.
// c must be sorted use sort.Sort.
Member

use -> using

return ranges
}

// findColRange find the range hit by given column index.
Member

find -> finds

}

// findColRange find the range hit by given column index.
// c must be sorted use sort.Sort.
Member

use -> using

tk.MustQuery("select k, v from t1").Check(testkit.Rows("1 <nil>"))
tk.MustQuery("select k, v from t2").Check(testkit.Rows())

// test right join and left no records but update no records part.
Member

-> test right join and the case that the left table has no matching record but has updated the left table columns

tk.MustQuery("select k, v from t1").Check(testkit.Rows("1 0"))
tk.MustQuery("select k, v from t2").Check(testkit.Rows())

// test left + right join update
Member

-> test the case of right join and left join at the same time

tk.MustQuery("select k, v from t2").Check(testkit.Rows())
tk.MustQuery("select k, v from t4").Check(testkit.Rows("3 4"))

// test left join and update right data.
Member

-> test normal left join and the case that the right table has matching rows

tk.MustExec("update t1 left join t2 on t1.k = t2.k set t2.v = 11")
tk.MustQuery("select k, v from t2").Check(testkit.Rows("1 11"))

// test join same table multiple times, update right record and not insert no exists record.
Member

-> test the case of continuously joining the same table and updating the unmatching records

tk.MustQuery("select k, v from t1").Check(testkit.Rows("1 111"))
tk.MustQuery("select k, v from t2").Check(testkit.Rows("1 11"))

// test left join and left all null record's update.
Member

-> test the left join case that the left table has records but all records are null

Contributor Author

thanks very much, ^ ^ I will address them.

@@ -72,7 +72,7 @@ func (e *AnalyzeExec) Next(ctx context.Context, chk *chunk.Chunk) error {
result := <-resultCh
if result.Err != nil {
err = result.Err
if errors.Trace(err) == analyzeWorkerPanic {
Contributor Author

CGO_ENABLED=0 revive -formatter friendly -config revive.toml $(go list ./...| grep -vE "vendor")
  ✘  error-naming  error var analyzeWorkerPanic should have name of the form errFoo  
  /home/robi/Code/go/src/github.com/pingcap/tidb/executor/analyze.go:119:5

✘ 1 problem (1 error, 0 warnings)

Errors:
  1  error-naming  

@lysu
Contributor Author

lysu commented Aug 1, 2018

/run-all-tests

Member

@winoros winoros left a comment

rest lgtm

cols2Handle = append(cols2Handle, Columns2HandleEntry{offset, end, handleCol.Index})
}
}
sort.Sort(cols2Handle)
Member

Do we need to sort them?

Contributor Author

The later lookup at https://github.com/pingcap/tidb/pull/7177/files/d67a05e0fcf141855ede1651256e10a85f41cf0f#diff-badfcd30d7596a08cd207b8e6ae778e6R1284 needs the slice to be sorted, and a for loop over TblID2Handle (a map) does not guarantee any order.
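
The ordering concern can be shown with a short sketch. Go does not specify map iteration order, so entries collected from a map have to be sorted before any lookup that relies on order. The `entry` type and `sortedEntries` helper below are hypothetical and only mirror the shape of the discussion, not the PR's code.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical [start, end) column range.
type entry struct{ start, end int }

// sortedEntries collects the values of m and orders them by start.
// Ranging over a map yields its entries in an unspecified order, so the
// explicit sort is what makes the result deterministic.
func sortedEntries(m map[int64]entry) []entry {
	out := make([]entry, 0, len(m))
	for _, e := range m {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].start < out[j].start })
	return out
}

func main() {
	m := map[int64]entry{
		30: {start: 5, end: 8},
		10: {start: 0, end: 2},
		20: {start: 2, end: 5},
	}
	for _, e := range sortedEntries(m) {
		fmt.Println(e.start, e.end)
	}
}
```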

winoros previously approved these changes Aug 1, 2018
Member

@winoros winoros left a comment

lgtm

// Columns2HandleEntry represents an mapper from column index to handle index.
type Columns2HandleEntry struct {
start, end int
handleIdx int
Member

How about:

  • s/handleIndex/handleOrdinal/
  • s/Columns2HandleEntry/cols2Handle/
  • s/Columns2Handle/cols2HandleSlice/

We can use the word "ordinal" to replace "index" to avoid misunderstanding.

}
return updateExec
}

// Columns2HandleEntry represents an mapper from column index to handle index.
Member

how about:

// Columns2HandleEntry maps a consecutive columns to a handle column.
type Columns2HandleEntry struct {
    start, end    int32 // Represent the ordinal range [start, end) of the consecutive columns.
    handleOrdinal int32 // Represents the ordinal of the handle column.
}

And it seems that start is always equal to handleIdx, do we need to store handleIdx?

Contributor Author

handleOrdinal may equal start or end, so it is not always the same as start.

If the table has no primary key, the handle is at end.
If the table has a primary key, the handle is at the primary key column's position in the table definition; e.g. create table t5(v int, k int, primary key(k)) gives start=1, end=3, handle=2.

Member

If so, I think using getTableOffset to decide the beginning offset of a table is wrong:

func getTableOffset(schema *expression.Schema, handleCol *expression.Column) int {
    for i, col := range schema.Columns {
        if col.DBName.L == handleCol.DBName.L && col.TblName.L == handleCol.TblName.L {
            return i
        }
    }
    panic("Couldn't get column information when do update/delete")
}

Contributor Author

@lysu lysu Aug 1, 2018

if biggerOne == 0 {
return 0, false
}
if c[biggerOne-1].start <= colIndex && colIndex < c[biggerOne-1].end {
Member

I think here c[biggerOne-1].start <= colIndex is guaranteed to be true, we only need to check colIndex < c[biggerOne-1].end


// findHandle finds the range hit by given column index.
// c must be sorted using sort.Sort.
func (c Columns2Handle) findHandle(colIndex int) (int, bool) {
Member

how about s/colIndex/colOrdinal/ or s/colIndex/ordinal/ ?

}

// findHandle finds the range hit by given column index.
// c must be sorted using sort.Sort.
Member

How about simplifying the comment to "findHandle finds the ordinal of the corresponding handle column."? Since the slice is sorted when it is created, there is no need to state the sorted precondition.

@@ -62,6 +63,11 @@ func (e *UpdateExec) exec(schema *expression.Schema) ([]types.Datum, error) {
for _, col := range cols {
offset := getTableOffset(schema, col)
end := offset + len(tbl.WritableCols())
handleDatum := row[col.Index]
// handleDatum is nil only when outer join fill no matched columns.
Member

How about wrapping a function to check whether we can update the current record:

// canNotUpdate checks the handle of a record to decide whether that record
// can not be updated. The handle is NULL only when it is the inner side of an
// outer join: the outer row can not match any inner rows, and in this scenario
// the inner handle field is filled with a NULL value.
//
// This fixes: https://github.com/pingcap/tidb/issues/7176.
func (e *UpdateExec) canNotUpdate(handle types.Datum) bool {
    return handle.IsNull()
}

// Len implements sort.Interface#Less.
// let ranges first sorted by `start` increasing order, and then sorted by `end` increasing order if `start` are equal.
func (c Columns2Handle) Less(i, j int) bool {
if c[i].start == c[j].start {
Member

This situation should never happen?

Contributor Author

I'm not entirely sure about this, so I added this check as a protection.

@lysu
Contributor Author

lysu commented Aug 1, 2018

/run-all-tests

}

// Len implements sort.Interface#Less.
// let ranges first sorted by `start` increasing order, and then sorted by `end` increasing order if `start` are equal.
Member

the comments need to be updated 😄

}
return updateExec
}

// cols2Handle represents an mapper from column index to handle index.
type cols2Handle struct {
start, end int // Represent the ordinal range [start, end) of the consecutive columns.
Member

how about int32?

Contributor Author

@lysu lysu Aug 1, 2018

Why? Do you mean combining the two fields into one int32?

Contributor Author

OK, I see. Fixed.

zz-jason previously approved these changes Aug 2, 2018
Member

@zz-jason zz-jason left a comment

LGTM

@lysu lysu added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 2, 2018
@zz-jason
Member

zz-jason commented Aug 2, 2018

@XuHuaiyu PTAL

c[i], c[j] = c[j], c[i]
}

// Len implements sort.Interface#Less.
Contributor

s/ Len/ Less

return len(c)
}

// Len implements sort.Interface#Swap.
Contributor

s/Len/Swap
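
For reference, a minimal sort.Interface implementation in which each doc comment names the method it documents might look like the sketch below. The `colRange` and `colRanges` types and the `sortedCopy` helper are hypothetical and are not the PR's final code.

```go
package main

import (
	"fmt"
	"sort"
)

// colRange is a hypothetical [start, end) ordinal range.
type colRange struct{ start, end int32 }

// colRanges is a hypothetical slice type that implements sort.Interface.
type colRanges []colRange

// Len implements sort.Interface#Len.
func (c colRanges) Len() int { return len(c) }

// Swap implements sort.Interface#Swap.
func (c colRanges) Swap(i, j int) { c[i], c[j] = c[j], c[i] }

// Less implements sort.Interface#Less.
// Ranges are ordered by start in increasing order.
func (c colRanges) Less(i, j int) bool { return c[i].start < c[j].start }

// sortedCopy returns a sorted copy of c and leaves c itself unchanged.
func sortedCopy(c colRanges) colRanges {
	out := append(colRanges(nil), c...)
	sort.Sort(out)
	return out
}

func main() {
	fmt.Println(sortedCopy(colRanges{{3, 5}, {0, 3}}))
}
```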


// findHandle finds the ordinal of the corresponding handle column.
func (c cols2HandleSlice) findHandle(ordinal int32) (int32, bool) {
if c == nil || len(c) == 0 {
Contributor

if c == nil,
c.findHandle will panic?

Contributor Author

The receiver is a slice, and calling a method on a nil slice is allowed, so it's OK.
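
A small sketch of why the nil receiver is safe: a method with a slice receiver can be called on a nil slice, and len of a nil slice is 0, so a single length check covers both the nil and the empty case. The `ranges` type and `first` method are hypothetical.

```go
package main

import "fmt"

// ranges is a hypothetical slice type with a value-receiver method.
type ranges []int

// first returns the first element, or false when the slice is nil or empty.
// len of a nil slice is 0, so no separate nil check is needed.
func (r ranges) first() (int, bool) {
	if len(r) == 0 {
		return 0, false
	}
	return r[0], true
}

func main() {
	var r ranges // nil slice
	v, ok := r.first()
	fmt.Println(v, ok)
}
```

This also means the `c == nil ||` part of a condition such as `c == nil || len(c) == 0` is redundant, although it is harmless.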


// buildColumns2Handle build columns to handle mapping.
func buildColumns2Handle(schema *expression.Schema, tblID2Table map[int64]table.Table) cols2HandleSlice {
if len(schema.TblID2Handle) < 2 {
Contributor

  1. We need a comment for this check.
  2. Any test case covers this?

Contributor Author

An existing case already covers this: a single-table update such as update t set x = 1 where y = 1.

I have also added one to TestUpdateJoin in executor_test.

if c == nil || len(c) == 0 {
return 0, false
}
biggerOne := sort.Search(len(c), func(i int) bool { return c[i].start > ordinal })
Contributor

We need a comment for biggerOne.
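
One way to document `biggerOne` is sketched below. The type and method names follow the snippets quoted in this review, but the body is a reconstruction from those fragments and the sample ranges in `main` are made up; it is not the merged implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// cols2Handle maps a run of consecutive columns to a handle column.
type cols2Handle struct {
	start, end    int32 // ordinal range [start, end) of the columns
	handleOrdinal int32 // ordinal of the handle column
}

// cols2HandleSlice is assumed to be sorted by start with non-overlapping ranges.
type cols2HandleSlice []cols2Handle

// findHandle finds the ordinal of the corresponding handle column.
func (c cols2HandleSlice) findHandle(ordinal int32) (int32, bool) {
	if len(c) == 0 {
		return 0, false
	}
	// biggerOne is the index of the first range whose start is greater than
	// ordinal (len(c) if there is none). The only range that can contain
	// ordinal is therefore the one just before it, at biggerOne-1.
	biggerOne := sort.Search(len(c), func(i int) bool { return c[i].start > ordinal })
	if biggerOne == 0 {
		return 0, false // ordinal lies before every range
	}
	if candidate := c[biggerOne-1]; ordinal < candidate.end {
		return candidate.handleOrdinal, true
	}
	return 0, false // ordinal falls in a gap after the candidate range
}

func main() {
	c := cols2HandleSlice{
		{start: 0, end: 2, handleOrdinal: 2},
		{start: 3, end: 5, handleOrdinal: 4},
	}
	fmt.Println(c.findHandle(1))
	fmt.Println(c.findHandle(2))
	fmt.Println(c.findHandle(4))
}
```

Because sort.Search already guarantees that the candidate's start is not greater than the ordinal, only the upper bound has to be checked.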

if biggerOne == 0 {
return 0, false
}
if ordinal < c[biggerOne-1].end {
Contributor

Will this always be true if we reach here?

Contributor Author

For now it's always true; in an earlier version it may not have been. Fixed.

return 0, false
}

// buildColumns2Handle build columns to handle mapping.
Contributor

s/ build/ builds

newRowsData [][]types.Datum // The new values to be set.
fetched bool
cursor int
columns2Handle cols2HandleSlice
Contributor

we need a comment for this attribute.

Contributor

@XuHuaiyu XuHuaiyu left a comment

rest LGTM

}
return updateExec
}

// cols2Handle represents an mapper from column index to handle index.
type cols2Handle struct {
start, end int32 // Represent the ordinal range [start, end) of the consecutive columns.
Contributor

Put the comment on the top line of the attribute, since it's a complete sentence

Contributor Author

done

@XuHuaiyu
Contributor

XuHuaiyu commented Aug 2, 2018

/run-all-tests

Contributor

@XuHuaiyu XuHuaiyu left a comment

LGTM

Member

@zz-jason zz-jason left a comment

LGTM

@zz-jason zz-jason merged commit 67e724a into pingcap:master Aug 2, 2018
@lysu lysu deleted the dev/fix_outer_join_update branch September 27, 2018 04:14
Labels
priority/release-blocker This issue blocks a release. Please solve it ASAP. sig/execution SIG execution status/LGT1 Indicates that a PR has LGTM 1. type/bugfix This PR fixes a bug.
Development

Successfully merging this pull request may close these issues.

update table left join with an auto_increment column table failure
7 participants