
Support (order by / sort) for DataFrameWriteOptions #13873

@zhuqi-lucas


Is your feature request related to a problem or challenge?

DataFrameWriteOptions is missing an ORDER BY / sort option like the one available in SQL.

In SQL we can declare a sort order. For example, you can use the WITH ORDER clause of CREATE EXTERNAL TABLE if your data is already ordered:

https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table

CREATE EXTERNAL TABLE test (
    c1  VARCHAR NOT NULL,
    c2  INT NOT NULL,
    c3  SMALLINT NOT NULL,
    c4  SMALLINT NOT NULL,
    c5  INT NOT NULL,
    c6  BIGINT NOT NULL,
    c7  SMALLINT NOT NULL,
    c8  INT NOT NULL,
    c9  BIGINT NOT NULL,
    c10 VARCHAR NOT NULL,
    c11 FLOAT NOT NULL,
    c12 DOUBLE NOT NULL,
    c13 VARCHAR NOT NULL
)
STORED AS CSV
-- this line tells DataFusion the data in the file is already ordered by (c2 ASC)
WITH ORDER (c2 ASC)
LOCATION '/path/to/aggregate_test_100.csv'
OPTIONS ('has_header' 'true');

But when writing Parquet or other file formats through DataFrameWriteOptions, there is no way to specify a sort order.

Describe the solution you'd like

Add a sort (ORDER BY) option to DataFrameWriteOptions.
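To illustrate the requested behavior, here is a minimal, self-contained sketch in Rust (the language DataFusion is written in). It does not use DataFusion at all: `WriteOptions`, `SortKey`, `with_sort_by`, and `write_csv` are hypothetical stand-ins, assuming a write option that carries a list of sort keys and is applied to the rows just before they are serialized, so the written output ends up in the requested order.

```rust
use std::cmp::Ordering;

/// Hypothetical sort key: a column index plus a direction (not DataFusion's API).
#[derive(Clone, Copy, Debug)]
struct SortKey {
    column: usize,
    ascending: bool,
}

/// Hypothetical stand-in for DataFrameWriteOptions with a proposed `sort_by` field.
#[derive(Default, Debug)]
struct WriteOptions {
    sort_by: Vec<SortKey>,
}

impl WriteOptions {
    fn new() -> Self {
        Self::default()
    }

    /// Builder method modeled on the proposal; the name `with_sort_by` is an assumption.
    fn with_sort_by(mut self, keys: Vec<SortKey>) -> Self {
        self.sort_by = keys;
        self
    }
}

/// Compare two rows key by key (lexicographic multi-key ordering).
fn compare_rows(a: &[i64], b: &[i64], keys: &[SortKey]) -> Ordering {
    for key in keys {
        let ord = a[key.column].cmp(&b[key.column]);
        let ord = if key.ascending { ord } else { ord.reverse() };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// "Write" rows as CSV lines, sorting them first if the options request it.
fn write_csv(mut rows: Vec<Vec<i64>>, options: &WriteOptions) -> Vec<String> {
    if !options.sort_by.is_empty() {
        // `sort_by` is stable, so rows with equal keys keep their input order.
        rows.sort_by(|a, b| compare_rows(a, b, &options.sort_by));
    }
    rows.iter()
        .map(|row| row.iter().map(|v| v.to_string()).collect::<Vec<_>>().join(","))
        .collect()
}

fn main() {
    let rows = vec![vec![1, 30], vec![2, 10], vec![3, 20]];
    // Sort by the second column ascending, analogous to WITH ORDER (c2 ASC).
    let options = WriteOptions::new().with_sort_by(vec![SortKey { column: 1, ascending: true }]);
    for line in write_csv(rows, &options) {
        println!("{line}");
    }
}
```

Applying the sort as part of the write, rather than leaving it to the caller, is what would let a file written this way be registered later with a matching WITH ORDER clause.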

Labels: enhancement (New feature or request)