-
Notifications
You must be signed in to change notification settings - Fork 288
Description
Is there an existing issue for the same feature request?
- I have checked the existing issues.
Is your feature request related to a problem?
When a query uses ORDER BY on a column present in a secondary index followed by LIMIT N, MatrixOne scans all matching rows in the index before sorting, instead of leveraging the index order to return the top-N rows directly with early termination.
This means a query like SELECT id FROM t WHERE user_id = ? ORDER BY id DESC LIMIT 20 on a table with 5000 matching rows will scan all 5000 index entries, sort them, then return 20 — instead of scanning only ~20 entries from the index in reverse order.
Steps to Reproduce
CREATE TABLE test_tbl (
id VARCHAR(64) PRIMARY KEY,
user_id VARCHAR(64) NOT NULL,
is_active TINYINT DEFAULT 1,
content TEXT,
INDEX idx_user (user_id, is_active)
);
-- Insert 5000 rows for one user
-- (use cross join with a small numbers table)
EXPLAIN ANALYZE
SELECT id, content FROM test_tbl WHERE id IN (
SELECT id FROM test_tbl
WHERE user_id = 'user1' AND is_active = 1
ORDER BY id DESC LIMIT 20
) ORDER BY id DESC;Observed: Inner Index Table Scan shows inputRows=5000 outputRows=5000 — full scan of all matching rows, then Sort with Limit 20.
Expected: Index scan should be able to iterate in PK (id) descending order and stop after finding 20 matching rows, avoiding the full scan + sort.
Describe the feature you would like
Support index-ordered scan with early termination for queries that combine:
- Equality predicates on secondary index prefix columns
- ORDER BY on the implicit PK column (appended to secondary index in storage)
- LIMIT N
The optimizer should recognize that the secondary index entries are physically ordered by PK within each prefix group, and scan in reverse order stopping after N rows — eliminating the need for a full scan + explicit Sort node.
Use Cases
Paginated listing queries are very common in applications:
SELECT ... WHERE user_id = ? ORDER BY id DESC LIMIT 20(first page)SELECT ... WHERE user_id = ? AND id < ? ORDER BY id DESC LIMIT 20(cursor pagination)
These are the most frequent queries in our application and currently scan the entire per-user dataset regardless of the LIMIT value.
Additional information
- MatrixOne version: 3.0.x (Docker image)
- The workaround of using a subquery reduces the data volume passed to the Sort node (only PK values instead of full rows), but the index scan count remains the same.