Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: regression index merge UNION case from v7.3.0 to v7.4.0 and master #48588

Closed
AilinKid opened this issue Nov 14, 2023 · 1 comment · Fixed by #48590
Closed

planner: regression index merge UNION case from v7.3.0 to v7.4.0 and master #48588

AilinKid opened this issue Nov 14, 2023 · 1 comment · Fixed by #48590
Assignees
Labels

Comments

@AilinKid
Copy link
Contributor

AilinKid commented Nov 14, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `aid` bigint(20) DEFAULT NULL,
  `c1` varchar(255) DEFAULT NULL,
  `c2` varchar(255) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
  KEY `aid_c1` (`aid`,`c1`),
  KEY `aid_c2` (`aid`,`c2`)
) 

2. What did you expect to see? (Required)

MySQL [test]> explain select /*+ USE_INDEX_MERGE(t, aid_c1, aid_c2) */ * from t where (aid = 1 and c1='aaa') or (aid = 1 and c2='bbb') limit 1;
+--------------------------------+---------+-----------+--------------------------------+---------------------------------------------------------+
| id                             | estRows | task      | access object                  | operator info                                           |
+--------------------------------+---------+-----------+--------------------------------+---------------------------------------------------------+
| IndexMerge_20                  | 1.00    | root      |                                | type: union, limit embedded(offset:0, count:1)          |
| ├─Limit_18(Build)              | 0.01    | cop[tikv] |                                | offset:0, count:1                                       |
| │ └─IndexRangeScan_11          | 0.01    | cop[tikv] | table:t, index:aid_c1(aid, c1) | range:[1 "aaa",1 "aaa"], keep order:false, stats:pseudo |
| ├─Limit_19(Build)              | 0.01    | cop[tikv] |                                | offset:0, count:1                                       |
| │ └─IndexRangeScan_12          | 0.01    | cop[tikv] | table:t, index:aid_c2(aid, c2) | range:[1 "bbb",1 "bbb"], keep order:false, stats:pseudo |
| └─TableRowIDScan_13(Probe)     | 1.00    | cop[tikv] | table:t                        | keep order:false, stats:pseudo                          |
+--------------------------------+---------+-----------+--------------------------------+---------------------------------------------------------+
6 rows in set (0.001 sec)

3. What did you see instead (Required)

MySQL [test]> explain select /*+ USE_INDEX_MERGE(t, aid_c1, aid_c2) */ * from t where (aid = 1 and c1='aaa') or (aid = 1 and c2='bbb') limit 1;
+----------------------------------+---------+-----------+--------------------------------+---------------------------------------------------------+
| id                               | estRows | task      | access object                  | operator info                                           |
+----------------------------------+---------+-----------+--------------------------------+---------------------------------------------------------+
| Limit_10                         | 1.00    | root      |                                | offset:0, count:1                                       |
| └─IndexMerge_23                  | 1.00    | root      |                                | type: union                                             |
|   ├─IndexRangeScan_20(Build)     | 0.01    | cop[tikv] | table:t, index:aid_c1(aid, c1) | range:[1 "aaa",1 "aaa"], keep order:false, stats:pseudo |
|   ├─IndexRangeScan_21(Build)     | 0.01    | cop[tikv] | table:t, index:aid_c2(aid, c2) | range:[1 "bbb",1 "bbb"], keep order:false, stats:pseudo |
|   └─TableRowIDScan_22(Probe)     | 1.00    | cop[tikv] | table:t                        | keep order:false, stats:pseudo                          |
+----------------------------------+---------+-----------+--------------------------------+---------------------------------------------------------+
5 rows in set (0.001 sec)

4. What is your TiDB version? (Required)

@AilinKid AilinKid added the type/bug The issue is confirmed as a bug. label Nov 14, 2023
@AilinKid AilinKid self-assigned this Nov 14, 2023
@AilinKid
Copy link
Contributor Author

AilinKid commented Nov 14, 2023

#46619 in this pr we lift the limitation of generate valid cop task for intersection index merge case.

Explaination:

First:

condition (aid = 1 and c1='aaa') or (aid = 1 and c2='bbb') can be seen as AND condition combination after we fillIndexPath is done:
index(aid, c2) with access condition: (aid=1) or (aid=1 and c2='bbb') , with entire OR tree as its table filter
index(aid, c1) with access condition: (aid=1 and c1='aaa') or (aid), with entire OR tree as its table filter

this kind of condition matching and converging into different index path is done automatically by fillIndexPath, not from manually cnf and dnf convertion

Note that, the final Index Merge is AND/Intersection mode. The alternative path for OR/UNION is also generated.
Let's identify them and COP-Intersection and COP-Union for short here.

Second:

the cost compare problem, in the function convertToIndexMergeScan in previous pr #46619, we lift the limitation of generating COP-TYPE index merge, its is usefully for we to sink limit into it before encapsulating it as a root task. While in the cost compare function getTaskPlanCost of compareTaskCost, seems we lost the logic of handling COP-TYPE index merge operator's cost collection. for root index merge reader, we actually did.

func getTaskPlanCost(t task, op *physicalOptimizeOp) (float64, bool, error) {
	if t.invalid() {
		return math.MaxFloat64, true, nil
	}

	// use the new cost interface
	var taskType property.TaskType
	switch t.(type) {
	case *rootTask:
		taskType = property.RootTaskType
	case *copTask: // no need to know whether the task is single-read or double-read, so both CopSingleReadTaskType and CopDoubleReadTaskType are OK
		cop := t.(*copTask)
		if cop.indexPlan != nil && cop.tablePlan != nil { // handle IndexLookup specially
			taskType = property.CopMultiReadTaskType
			// keep compatible with the old cost interface, for CopMultiReadTask, the cost is idxCost + tblCost.
			if !cop.indexPlanFinished { // only consider index cost in this case
				idxCost, err := getPlanCost(cop.indexPlan, taskType, NewDefaultPlanCostOption().WithOptimizeTracer(op))
				return idxCost, false, err
			}
			// consider both sides
			idxCost, err := getPlanCost(cop.indexPlan, taskType, NewDefaultPlanCostOption().WithOptimizeTracer(op))
			if err != nil {
				return 0, false, err
			}
			tblCost, err := getPlanCost(cop.tablePlan, taskType, NewDefaultPlanCostOption().WithOptimizeTracer(op))
			if err != nil {
				return 0, false, err
			}
			return idxCost + tblCost, false, nil
		}

		taskType = property.CopSingleReadTaskType

		// TiFlash can run cop task as well, check whether this cop task will run on TiKV or TiFlash.
		if cop.tablePlan != nil {
			leafNode := cop.tablePlan
			for len(leafNode.Children()) > 0 {
				leafNode = leafNode.Children()[0]
			}
			if tblScan, isScan := leafNode.(*PhysicalTableScan); isScan && tblScan.StoreType == kv.TiFlash {
				taskType = property.MppTaskType
			}
		}
	case *mppTask:
		taskType = property.MppTaskType
	default:
		return 0, false, errors.New("unknown task type")
	}
///////////////////////////// BUG1 ////////////////////////////////////////////
	if t.plan() == nil {
		// It's a very special case for index merge case.
		cost := 0.0
		copTsk := t.(*copTask)
		for _, partialScan := range copTsk.idxMergePartPlans {
			partialCost, err := getPlanCost(partialScan, taskType, NewDefaultPlanCostOption().WithOptimizeTracer(op))
			if err != nil {
				return 0, false, err
			}
			cost += partialCost
		}
		return cost, false, nil
	}
///////////////////////////// BUG2 ////////////////////////////////////////////
	cost, err := getPlanCost(t.plan(), taskType, NewDefaultPlanCostOption().WithOptimizeTracer(op))
	return cost, false, err
}

COP-Intersection's table side is not pure table scan, it has selection of entire OR tree, so we mark indexPlanFinished as true in convertToIndexMergeScan, kind of tricky code here. And in the comment the second bug line above, it only collected the cost from t.plan(). When the indexPlanFinished is true in current cop task, t.plan() only return table plan, leading missing the collection of index merge partial plan.

On the contrary, COP-Union's table side is pure table scan, so we mark indexPlanFinished as false in convertToIndexMergeScan, And in the comment the first bug line above, it only collected the cost from t.plan(). When the indexPlanFinished is true in current cop task, t.plan() only return index plan which is nil in index merge, consequently walking through the branch code, and only collecting the partial index plans cost.

The direct result will cause COP-Intersection win the COP-UNION in current ds.convertToIndexMergeScan()

Third:

so for COP type, the datasource only memorize the COP-Intersection as it's best, for ROOT type, the datasource give the plan which has limit append on the index merge reader which is seen in the non-expected plan shown in the pr description above.

@AilinKid AilinKid changed the title planner: downgrade index merge UNION case from v7.3.0 to v7.4.0 and master planner: regression index merge UNION case from v7.3.0 to v7.4.0 and master Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants