Skip to content

[Performance]: Secondary index scan does not leverage sort order for ORDER BY ... LIMIT early termination #23965

@XuPeng-SH

Description

@XuPeng-SH

Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

When a query uses ORDER BY on a column present in a secondary index followed by LIMIT N, MatrixOne scans all matching rows in the index before sorting, instead of leveraging the index order to return the top-N rows directly with early termination.

This means a query like SELECT id FROM t WHERE user_id = ? ORDER BY id DESC LIMIT 20 on a table with 5000 matching rows will scan all 5000 index entries, sort them, then return 20 — instead of scanning only ~20 entries from the index in reverse order.

Steps to Reproduce

CREATE TABLE test_tbl (
  id VARCHAR(64) PRIMARY KEY,
  user_id VARCHAR(64) NOT NULL,
  is_active TINYINT DEFAULT 1,
  content TEXT,
  INDEX idx_user (user_id, is_active)
);

-- Insert 5000 rows for one user
-- (use cross join with a small numbers table)

EXPLAIN ANALYZE
SELECT id, content FROM test_tbl WHERE id IN (
  SELECT id FROM test_tbl
  WHERE user_id = 'user1' AND is_active = 1
  ORDER BY id DESC LIMIT 20
) ORDER BY id DESC;

Observed: Inner Index Table Scan shows inputRows=5000 outputRows=5000 — full scan of all matching rows, then Sort with Limit 20.

Expected: Index scan should be able to iterate in PK (id) descending order and stop after finding 20 matching rows, avoiding the full scan + sort.

Describe the feature you would like

Support index-ordered scan with early termination for queries that combine:

  1. Equality predicates on secondary index prefix columns
  2. ORDER BY on the implicit PK column (appended to secondary index in storage)
  3. LIMIT N

The optimizer should recognize that the secondary index entries are physically ordered by PK within each prefix group, and scan in reverse order stopping after N rows — eliminating the need for a full scan + explicit Sort node.

Use Cases

Paginated listing queries are very common in applications:

  • SELECT ... WHERE user_id = ? ORDER BY id DESC LIMIT 20 (first page)
  • SELECT ... WHERE user_id = ? AND id < ? ORDER BY id DESC LIMIT 20 (cursor pagination)

These are the most frequent queries in our application and currently scan the entire per-user dataset regardless of the LIMIT value.

Additional information

  • MatrixOne version: 3.0.x (Docker image)
  • The workaround of using a subquery reduces the data volume passed to the Sort node (only PK values instead of full rows), but the index scan count remains the same.

Metadata

Metadata

Assignees

Labels

area/performanceseverity/s0Extreme impact: Cause the application to break down and seriously affect the use

Type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions