
TPCH q20 fails on master branch #5338

Closed
birdstorm opened this issue Dec 7, 2017 · 12 comments · Fixed by #5383
Labels
priority/P1 The issue has P1 priority. type/bug The issue is confirmed as a bug.

Comments

birdstorm (Contributor) commented Dec 7, 2017

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.

run TPCH q20:

mysql> select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'forest%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= '1994-01-01' and l_shipdate < '1995-01-01'  ) ) and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    72
Current database: tpch_test

ERROR 2013 (HY000): Lost connection to MySQL server during query

The following is TiDB's log:

2017/12/07 17:23:22.722 metrics.go:363: [warning] [EXPENSIVE_QUERY] select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'forest%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= '1994-01-01' and l_shipdate < '1995-01-01'  ) ) and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name
2017/12/07 17:23:22.844 conn.go:391: [error] lastCmd select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'forest%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= '1994-01-01' and l_shipdate < '1995-01-01'  ) ) and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name, runtime error: index out of range, goroutine 47018 [running]:
github.com/pingcap/tidb/server.(*clientConn).Run.func1(0xc4224ca000)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/server/conn.go:389 +0xfd
panic(0x1c4ba00, 0x2746310)
	/usr/local/Cellar/go/1.9.1/libexec/src/runtime/panic.go:491 +0x283
github.com/pingcap/tidb/types.DatumRow.GetDatum(...)
	<autogenerated>:1
github.com/pingcap/tidb/types.(*DatumRow).GetDatum(0xc420bc6bc0, 0x5, 0xc4208ccaf0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	<autogenerated>:1 +0xed
github.com/pingcap/tidb/expression.(*Column).Eval(0xc4214212b0, 0x2557260, 0xc420bc6bc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/expression/column.go:203 +0x97
github.com/pingcap/tidb/expression/aggregation.(*aggFunction).updateSum(0xc422cae600, 0xc4206b1680, 0xc4234f6060, 0x2557260, 0xc420bc6bc0, 0xc420bc6bc0, 0x0)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/expression/aggregation/aggregation.go:258 +0xa4
github.com/pingcap/tidb/expression/aggregation.(*sumFunction).Update(0xc422cae600, 0xc4206b1680, 0xc4234f6060, 0x2557260, 0xc420bc6bc0, 0x0, 0x0)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/expression/aggregation/sum.go:42 +0x55
github.com/pingcap/tidb/executor.(*HashAggExec).innerNext(0xc421cad6b0, 0x2b47160, 0xc4209b07b0, 0x102a5a8, 0x1af40d8, 0xc420591368)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/executor/aggregate.go:131 +0x2ae
github.com/pingcap/tidb/executor.(*HashAggExec).Next(0xc421cad6b0, 0x2b47160, 0xc4209b07b0, 0x252e500, 0xc4205914c0, 0x1010877, 0xc422d5e8d0, 0x30)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/executor/aggregate.go:66 +0x3a3
github.com/pingcap/tidb/executor.(*SelectionExec).Next(0xc420c3ce60, 0x2b47160, 0xc4209b07b0, 0x8, 0x18, 0xc4207d3aa0, 0x2b99888, 0x0)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/executor/executor.go:584 +0x73
github.com/pingcap/tidb/executor.(*ProjectionExec).Next(0xc423df3d00, 0x2b47160, 0xc4209b07b0, 0x2556c50, 0xc42079c000, 0xc420d830a8, 0xc420591650, 0x3)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/executor/executor.go:496 +0x76
github.com/pingcap/tidb/executor.(*HashJoinExec).prepare(0xc4209fd040, 0x2b47160, 0xc4209b07b0, 0xc421fd5df0, 0x0)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/executor/join.go:219 +0xb1
github.com/pingcap/tidb/executor.(*HashJoinExec).Next(0xc4209fd040, 0x2b47160, 0xc4209b07b0, 0x404a69eef8d9f99e, 0xc4205918b0, 0x19ebc31, 0x253a740, 0xc420c3c1e0)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/executor/join.go:414 +0x25d
github.com/pingcap/tidb/executor.(*ProjectionExec).Next(0xc423df3d80, 0x2b47160, 0xc4209b07b0, 0x0, 0x1, 0x1, 0xc421fd5df0, 0x1)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/executor/executor.go:496 +0x76
github.com/pingcap/tidb/executor.(*SortExec).Next(0xc42030e3c0, 0x2b47160, 0xc4209b07b0, 0x0, 0xc420591a90, 0x19940c6, 0xc4209fd040, 0xc420d83000)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/executor/sort.go:114 +0x4b3
github.com/pingcap/tidb/executor.(*recordSet).Next(0xc42069bb60, 0x2b47160, 0xc4209b07b0, 0xc420591b08, 0x1b16b73, 0xc42069bb60, 0xc420591b00)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/executor/adapter.go:77 +0x63
github.com/pingcap/tidb/server.(*tidbResultSet).Next(0xc422d5e8a0, 0x2b47160, 0xc4209b07b0, 0xc420591be0, 0x1b15dd0, 0x1cb0720, 0xc422d5e8a0)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/server/driver_tidb.go:294 +0x47
github.com/pingcap/tidb/server.(*clientConn).writeResultset(0xc4224ca000, 0x2b47160, 0xc4209b07b0, 0x253da40, 0xc422d5e8a0, 0xc421fd0000, 0x0, 0x0)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/server/conn.go:807 +0x100
github.com/pingcap/tidb/server.(*clientConn).handleQuery(0xc4224ca000, 0x2b47160, 0xc4209b07b0, 0xc420fdc001, 0x1af, 0x0, 0x0)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/server/conn.go:748 +0x115
github.com/pingcap/tidb/server.(*clientConn).dispatch(0xc4224ca000, 0xc420fdc001, 0x1b0, 0x1b0, 0x0, 0x0)
	/Users/birdstorm/go/src/github.com/pingcap/tidb/server/conn.go:530 +0x511
github.com/pingcap/tidb/server.(*clientConn).Run(0xc4224ca000)
	/Users/birdstorm/go/src/github.com/pingcap/tid
2017/12/07 17:23:22.845 server.go:297: [info] [72] close connection
  2. What version of TiDB are you using (tidb-server -V)?
mysql> select tidb_version();
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v1.1.0-alpha-233-gd742d9285
Git Commit Hash: d742d9285aaa7a6489398402e22ba9a922eb893c
Git Branch: master
UTC Build Time: 2017-12-07 05:01:30 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
zz-jason self-assigned this Dec 7, 2017
zz-jason (Member) commented Dec 7, 2017

@birdstorm Could you give me the source table schemas and some data to reproduce this issue?

birdstorm (Contributor, Author) commented Dec 8, 2017

@zz-jason Just using the original TPC-H data is enough.

zz-jason (Member) commented Dec 8, 2017

OK, I'll try.

zz-jason (Member) commented Dec 8, 2017

The plan is:

+-----------------+-----------------+-------------------------------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| id              | parents         | children                      | task | operator info                                                                                                                                                                                                 | count |
+-----------------+-----------------+-------------------------------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| TableScan_32    | Selection_33    |                               | cop  | table:nation, range:(-inf,+inf), keep order:false                                                                                                                                                             |     0 |
| Selection_33    |                 | TableScan_32                  | cop  | eq(tpch.nation.n_name, ALGERIA)                                                                                                                                                                               |     0 |
| TableReader_34  | HashLeftJoin_30 |                               | root | data:Selection_33                                                                                                                                                                                             |     0 |
| TableScan_35    |                 |                               | cop  | table:supplier, range:(-inf,+inf), keep order:false                                                                                                                                                           |     0 |
| TableReader_36  | HashLeftJoin_30 |                               | root | data:TableScan_35                                                                                                                                                                                             |     0 |
| HashLeftJoin_30 | HashLeftJoin_25 | TableReader_34,TableReader_36 | root | inner join, small:TableReader_36, equal:[eq(tpch.nation.n_nationkey, tpch.supplier.s_nationkey)]                                                                                                              |     0 |
| IndexScan_49    |                 |                               | cop  | table:partsupp, index:PS_PARTKEY, PS_SUPPKEY, range:[<nil>,+inf], out of order:false                                                                                                                          |     0 |
| TableScan_50    |                 |                               | cop  | table:partsupp, keep order:false                                                                                                                                                                              |     0 |
| IndexLookUp_51  | MergeJoin_42    |                               | root | index:IndexScan_49, table:TableScan_50                                                                                                                                                                        |     0 |
| TableScan_52    | Selection_53    |                               | cop  | table:part, range:(-inf,+inf), keep order:true                                                                                                                                                                |     0 |
| Selection_53    |                 | TableScan_52                  | cop  | like(tpch.part.p_name, green%, 92)                                                                                                                                                                            |     0 |
| TableReader_54  | MergeJoin_42    |                               | root | data:Selection_53                                                                                                                                                                                             |     0 |
| MergeJoin_42    | HashLeftJoin_41 | IndexLookUp_51,TableReader_54 | root | semi join, equal:[eq(tpch.partsupp.ps_partkey, tpch.part.p_partkey)], left key:tpch.partsupp.ps_partkey, right key:tpch.part.p_partkey                                                                        |     0 |
| TableScan_60    | Selection_61    |                               | cop  | table:lineitem, range:(-inf,+inf), keep order:false                                                                                                                                                           |     0 |
| Selection_61    |                 | TableScan_60                  | cop  | ge(tpch.lineitem.l_shipdate, 1993-01-01 00:00:00.000000), lt(tpch.lineitem.l_shipdate, 1994-01-01)                                                                                                            |     0 |
| TableReader_62  | HashLeftJoin_41 |                               | root | data:Selection_61                                                                                                                                                                                             |     0 |
| HashLeftJoin_41 | HashAgg_39      | MergeJoin_42,TableReader_62   | root | left outer join, small:TableReader_62, equal:[eq(tpch.partsupp.ps_partkey, tpch.lineitem.l_partkey) eq(tpch.partsupp.ps_suppkey, tpch.lineitem.l_suppkey)]                                                    |     0 |
| HashAgg_39      | Selection_38    | HashLeftJoin_41               | root | group by:tpch.partsupp.ps_partkey, tpch.partsupp.ps_suppkey, funcs:firstrow(tpch.partsupp.ps_partkey), firstrow(tpch.partsupp.ps_suppkey), firstrow(tpch.partsupp.ps_availqty), sum(tpch.lineitem.l_quantity) |     1 |
| Selection_38    | Projection_37   | HashAgg_39                    | root | gt(cast(tpch.partsupp.ps_availqty), mul(0.5, 13_col_0))                                                                                                                                                       |   0.8 |
| Projection_37   | HashLeftJoin_25 | Selection_38                  | root | tpch.partsupp.ps_partkey, tpch.partsupp.ps_suppkey, tpch.partsupp.ps_availqty, mul(0.5, 13_col_0)                                                                                                             |   0.8 |
| HashLeftJoin_25 | Projection_24   | HashLeftJoin_30,Projection_37 | root | semi join, small:Projection_37, equal:[eq(tpch.supplier.s_suppkey, tpch.partsupp.ps_suppkey)]                                                                                                                 |     0 |
| Projection_24   | Sort_23         | HashLeftJoin_25               | root | tpch.supplier.s_name, tpch.supplier.s_address                                                                                                                                                                 |     0 |
| Sort_23         |                 | Projection_24                 | root | tpch.supplier.s_name:asc                                                                                                                                                                                      |     0 |
+-----------------+-----------------+-------------------------------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+

The panic happened during the execution of sum(tpch.lineitem.l_quantity) in the operator HashAgg_39.

zz-jason (Member) commented Dec 8, 2017

From the log I printed, PhysicalHashJoin.DefaultValues may be wrong:

10382 2017/12/08 19:10:08.637 join_result_generators.go:317: [error] *executor.leftOuterJoinResultGenerator.emitMatchedInners: len(outer)=4, len(inner)=4, len(buffer)=8
10383
10384 2017/12/08 19:10:08.637 join_result_generators.go:325: [error] *executor.leftOuterJoinResultGenerator.emitUnMatchedOuter: len(outer)=4, len(outputer.defaultInner)=1
10385
10386 2017/12/08 19:10:08.637 aggregation.go:259: [error] *aggregation.aggFunction(0xc420a16300).updateSum: a=&expression.Column{FromID:11, ColName:model.CIStr{O:"L_QUANTITY", L:"l_quantity"}, DBName:model.CIStr{O:"tpch", L:"tpch"}, OrigTblName:model.CIStr{O:"lineitem", L:"lineitem"}, TblName:model.CIStr{O:"lineitem", L:"lineitem"}, RetType:(*types.FieldType)(0xc4202a2f50), ID:5, Position:4, IsAggOrSubq:false, Index:5, hashcode:[]uint8{0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xb, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4}}, row.Len()=5
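The mismatch in these log lines can be modelled with a small self-contained sketch (the slice-based row model and the function name are illustrative only, not TiDB's actual types): when an unmatched outer row of 4 columns is padded with a 1-element defaultInner instead of 4 NULLs, the joined row has length 5, so the l_quantity column with Index 5 is out of range.

```go
package main

import "fmt"

// Hypothetical, simplified model of the bug: plain string slices stand in
// for TiDB's row and datum types. A left outer join pads an unmatched
// outer row with the join's DefaultValues in place of the inner side. If
// len(defaultInner) is shorter than the inner schema, the joined row is
// too short, and a later column access by Index panics with
// "index out of range".
func joinUnmatchedOuter(outer, defaultInner []string) []string {
	row := make([]string, 0, len(outer)+len(defaultInner))
	row = append(row, outer...)
	return append(row, defaultInner...)
}

func main() {
	// Matches the log above: len(outer)=4, len(defaultInner)=1 instead of 4.
	outer := []string{"ps_partkey", "ps_suppkey", "ps_availqty", "agg_col"}
	badDefault := []string{"NULL"} // should hold one NULL per inner column
	row := joinUnmatchedOuter(outer, badDefault)
	// row length is 5, so the l_quantity column with Index 5 is out of range.
	fmt.Println(len(row)) // 5
}
```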

zz-jason added the type/bug label Dec 10, 2017
zz-jason added the priority/P1 label Dec 11, 2017
zz-jason (Member) commented:

This issue is caused by decorrelateSolver: after replacing the inner plan of a LogicalApply, we forget to maintain the constraint that innerPlan.Schema().Len() == len(p.DefaultValues).

zz-jason (Member) commented:

        } else if agg, ok := innerPlan.(*LogicalAggregation); ok {
            if apply.canPullUpAgg() && agg.canPullUp() {
                innerPlan = agg.children[0].(LogicalPlan)
                apply.JoinType = LeftOuterJoin
                setParentAndChildren(apply, outerPlan, innerPlan)
                agg.SetSchema(apply.Schema())
                agg.GroupByItems = expression.Column2Exprs(outerPlan.Schema().Keys[0])
                newAggFuncs := make([]aggregation.Aggregation, 0, apply.Schema().Len())
                for _, col := range outerPlan.Schema().Columns {
                    first := aggregation.NewAggFunction(ast.AggFuncFirstRow, []expression.Expression{col}, false)
                    newAggFuncs = append(newAggFuncs, first)
                }
                newAggFuncs = append(newAggFuncs, agg.AggFuncs...)
                agg.AggFuncs = newAggFuncs
                apply.SetSchema(expression.MergeSchema(outerPlan.Schema(), innerPlan.Schema()))
                agg.SetParents(apply.Parents()...)
                np, err := s.optimize(p, nil)
                if err != nil {
                    return nil, errors.Trace(err)
                }
                setParentAndChildren(agg, np)
                agg.collectGroupByColumns()
                return agg, nil
            }
        }

Here agg.Schema().Len() is 1 and len(apply.DefaultValues) is 1, but agg.children[0].Schema().Len() is 4. We cannot simply replace the inner plan of the LogicalApply; we have to maintain the constraint that innerPlan.Schema().Len() == len(p.DefaultValues).

There may still be other scenarios that break this constraint; I'll fix this issue first anyway.
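One way to restore the constraint at the replacement site can be sketched as follows (the helper name and the interface{}-based datum model are hypothetical, not the actual fix in #5383): resize DefaultValues to the new inner schema length, padding with NULLs.

```go
package main

import "fmt"

// Illustrative sketch only: resize DefaultValues whenever the inner plan of
// a LogicalApply is replaced, padding with NULLs (nil stands in for a NULL
// datum) up to the new inner schema length, so that
// innerPlan.Schema().Len() == len(DefaultValues) holds again.
func resizeDefaultValues(defaults []interface{}, innerSchemaLen int) []interface{} {
	resized := make([]interface{}, innerSchemaLen) // zero value nil == NULL
	copy(resized, defaults)
	return resized
}

func main() {
	// As in this issue: DefaultValues has 1 entry, but after the rewrite the
	// new inner plan (agg.children[0]) has a 4-column schema.
	defaults := []interface{}{nil}
	fixed := resizeDefaultValues(defaults, 4)
	fmt.Println(len(fixed)) // 4
}
```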

zz-jason (Member) commented Dec 11, 2017

A better way may be to calculate a join's DefaultValues after all of the logical and physical optimizations, so that we don't need to maintain the constraint innerPlan.Schema().Len() == len(p.DefaultValues) everywhere.

winoros (Member) commented Dec 11, 2017

#5278 seems to be the same problem.

winoros (Member) commented Dec 11, 2017

Currently, unless we push an aggregation down across the join, its default values will always be all NULL. And we cannot tell afterwards whether an aggregation was pushed across the join, so it is not easy to decide the default values once optimization is done.

zz-jason (Member) commented:

@winoros Aggregation push-down is currently disabled:

// NewSessionVars creates a session vars object.
func NewSessionVars() *SessionVars {
    return &SessionVars{
        Users:                      make(map[string]string),
        Systems:                    make(map[string]string),
        PreparedStmts:              make(map[uint32]interface{}),
        PreparedStmtNameToID:       make(map[string]uint32),
        PreparedParams:             make([]interface{}, 10),
        TxnCtx:                     &TransactionContext{},
        RetryInfo:                  &RetryInfo{},
        StrictSQLMode:              true,
        Status:                     mysql.ServerStatusAutocommit,
        StmtCtx:                    new(stmtctx.StatementContext),
        AllowAggPushDown:           false,
        BuildStatsConcurrencyVar:   DefBuildStatsConcurrency,
        IndexJoinBatchSize:         DefIndexJoinBatchSize,
        IndexLookupSize:            DefIndexLookupSize,
        IndexLookupConcurrency:     DefIndexLookupConcurrency,
        IndexSerialScanConcurrency: DefIndexSerialScanConcurrency,
        DistSQLScanConcurrency:     DefDistSQLScanConcurrency,
        MaxRowCountForINLJ:         DefMaxRowCountForINLJ,
        MaxChunkSize:               DefMaxChunkSize,
    }
}

winoros (Member) commented Dec 11, 2017

Yes, but it can in fact be enabled by the client. The default values cannot be decided later if we don't decide them when pushing the aggregation down.
