
Bug: incorrectly ignores OFFSET clause (Spark 3.4+) #1739

@yew1eb


Since Apache Spark 3.4 (SPARK-28330), the physical plan fully represents OFFSET:

  • GlobalLimitExec and TakeOrderedAndProjectExec carry an explicit offset field
  • When both LIMIT and OFFSET are present, Spark stores limit + offset as the raw limit value (e.g. LIMIT 10 OFFSET 5 is planned with limit = 15, offset = 5)
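A minimal Python sketch of the semantics described above, assuming the raw limit already includes the offset and that a negative limit means "no limit" (an assumption for OFFSET-only plans). The function name `global_limit` is hypothetical and is not auron's or Spark's actual code:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def global_limit(rows: Iterable[T], limit: int, offset: int = 0) -> Iterator[T]:
    """Hypothetical GlobalLimitExec-style operator.

    `limit` is the raw plan value (LIMIT + OFFSET when both are present).
    Assumption: a negative `limit` is treated as "no limit".
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    stop = None if limit < 0 else limit
    # Keep the first `limit` rows, then skip the first `offset` of them.
    return islice(rows, offset, stop)
```

For example, `list(global_limit(range(1, 21), 15, 5))` yields rows 6 through 15.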

Auron currently ignores the offset field entirely.
As a result, the query:

SELECT * FROM t LIMIT 10 OFFSET 5

returns the first 15 rows (rows 1–15) instead of rows 6–15, producing incorrect results.
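The difference can be reproduced with a small Python sketch, assuming a hypothetical table `t` holding the values 1 to 20 in scan order. `buggy_limit` mirrors the reported behavior (offset ignored) and `correct_limit` honors it; both names are illustrative only:

```python
def buggy_limit(rows, limit, offset):
    # Mirrors the reported bug: the offset field is ignored entirely.
    return list(rows)[:limit]


def correct_limit(rows, limit, offset):
    # Honors the offset: skip `offset` rows out of the first `limit` rows.
    return list(rows)[offset:limit]


t = list(range(1, 21))  # hypothetical table t with rows 1..20
raw_limit = 10 + 5      # LIMIT 10 OFFSET 5 is planned as raw limit 15
print(buggy_limit(t, raw_limit, 5))    # [1, 2, ..., 15]
print(correct_limit(t, raw_limit, 5))  # [6, 7, ..., 15]
```

This sketch assumes a non-negative limit; the "no limit" case is not handled here.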

We should fully support LIMIT … OFFSET … semantics by honoring the offset field in both operators.
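For TakeOrderedAndProjectExec, one way to honor the offset in a distributed setting is sketched below in Python. It assumes a two-phase top-K (per-partition, then a final merge) and omits the projection; the function name is hypothetical and this is not auron's implementation. The key point is that each partition must keep `limit` rows (which already include the offset), and the offset is applied exactly once, after the global merge:

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def take_ordered_with_offset(
    partitions: Iterable[Iterable[T]],
    limit: int,
    offset: int,
    key: Callable[[T], object],
) -> List[T]:
    # Local phase: each partition keeps at most `limit` rows; `limit`
    # already includes the offset, so no row that may survive is lost.
    local = [heapq.nsmallest(limit, part, key=key) for part in partitions]
    # Final phase: merge the local results into a global top-`limit`...
    merged = heapq.nsmallest(limit, (r for part in local for r in part), key=key)
    # ...and drop the first `offset` rows exactly once, after the merge.
    return merged[offset:]
```

Applying the offset per partition instead would drop rows from every partition and return too few results.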
