feat: support pivot and unpivot #17946
base: main
Conversation
DataFrame API
Pull Request Overview
This PR adds PIVOT and UNPIVOT functionality to DataFusion, enabling users to transform rows into columns (PIVOT) and columns into rows (UNPIVOT) through both SQL syntax and DataFrame API methods.
Key Changes
- Implements SQL parsing and logical plan building for PIVOT and UNPIVOT operations
- Adds DataFrame methods `pivot()` and `unpivot()` for programmatic access (a usage sketch follows below)
- Includes comprehensive test coverage through SQL logic tests and DataFrame integration tests
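For orientation, here is a minimal usage sketch of the two new methods, assembled from the signatures and corrected examples quoted in the review comments below. The `pivot()`/`unpivot()` signatures come from this PR and may still change; the `sales` table, its columns, the literal pivot values, and the exact argument order are assumptions for illustration only.

```rust
use datafusion::common::Column;
use datafusion::functions_aggregate::expr_fn::sum;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();
    // Hypothetical input with columns: id, category, value, jan, feb, mar
    ctx.register_csv("sales", "sales.csv", CsvReadOptions::new())
        .await?;
    let df = ctx.table("sales").await?;

    // PIVOT: the literal values "A" and "B" of `category` become output columns,
    // each filled with sum(value) over the matching rows.
    let pivoted = df.clone().pivot(
        vec![sum(col("value"))],        // aggregate functions
        vec![Column::from("category")], // pivot column(s)
        vec![lit("A"), lit("B")],       // literal values that become columns
        None,                           // optional defaults for NULL cells
    )?;
    pivoted.show().await?;

    // UNPIVOT: fold the jan/feb/mar columns into (month, value) rows.
    let unpivoted = df.unpivot(
        vec!["value".to_string()], // name of the output value column
        "month".to_string(),       // name of the output variable column
        vec![
            (vec!["jan".to_string()], Some("jan".to_string())),
            (vec!["feb".to_string()], Some("feb".to_string())),
            (vec!["mar".to_string()], Some("mar".to_string())),
        ],
    )?;
    unpivoted.show().await?;
    Ok(())
}
```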
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| datafusion/sqllogictest/test_files/unpivot.slt | New SQL logic tests for UNPIVOT functionality with NULL handling variations |
| datafusion/sqllogictest/test_files/pivot.slt | New SQL logic tests for PIVOT functionality with single and multiple columns |
| datafusion/sql/src/relation/mod.rs | Implements SQL-to-logical plan conversion for PIVOT and UNPIVOT table factors |
| datafusion/sql/src/expr/identifier.rs | Changes visibility of identifier helper methods to pub(crate) for use in relation module |
| datafusion/expr/src/logical_plan/builder.rs | Adds pivot() and unpivot() builder methods implementing core transformation logic |
| datafusion/core/tests/dataframe/mod.rs | Comprehensive integration tests for DataFrame pivot/unpivot APIs |
| datafusion/core/src/dataframe/mod.rs | Adds public pivot() and unpivot() methods to DataFrame API |
```rust
    DataFusionError::Plan("named_struct function not found".to_string())
})?;
```
Copilot AI (Nov 9, 2025)
Inconsistent indentation in the error handling closure. The opening brace and closure body on lines 2465-2467 should be indented consistently with the similar error handlers below on lines 2473-2475 and 2481-2483.
| /// vec!["jan".to_string(), "feb".to_string(), "mar".to_string()], | ||
| /// "month".to_string(), | ||
| /// vec![(vec!["jan".to_string(), "feb".to_string(), "mar".to_string()], None)], |
Copilot AI (Nov 9, 2025)
The documentation example is incorrect. The first parameter, `value_column_names`, should be a single output value column name (e.g., `vec!["value".to_string()]`), not the list of columns being unpivoted. According to the implementation and the test cases, the columns being unpivoted belong only in the `unpivot_columns` parameter.

This should be:

```rust
vec!["value".to_string()], // Name for the value column in output
"month".to_string(),
vec![
    (vec!["jan".to_string()], Some("jan".to_string())),
    (vec!["feb".to_string()], Some("feb".to_string())),
    (vec!["mar".to_string()], Some("mar".to_string())),
],
```

Suggested change: replace the three quoted doc-comment lines with this corrected example (keeping the `///` prefix).
```rust
        ))
    })
    .collect::<Result<Vec<_>>>(),
_ => plan_err!("Pivot default value cannot be NULL"),
```
Copilot AI (Nov 9, 2025)
The error message is misleading. The code actually checks if the expression is NOT a List literal, but the error says "Pivot default value cannot be NULL". This should be more accurate, such as "Pivot default value must be a list literal" or "Expected list literal for pivot default values".
Suggested change:

```rust
_ => plan_err!("Pivot default value must be a list literal"),
```
```rust
};
let pivot_col_name = format!(
    "{}_{}",
    value_alias.replace("\"", "").replace("'", ""),
```
Copilot AI (Nov 9, 2025)
Using string replacements to sanitize column names is fragile and may not handle all edge cases. Consider using a more robust approach or documenting why only quotes need to be removed. For example, other special characters like backticks, brackets, or control characters might also appear in aliases.
Suggested change:

```rust
sanitize_column_name(value_alias),
```
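No `sanitize_column_name` helper exists in the PR as reviewed, so the suggestion assumes one would be added. A minimal sketch of what such a helper could look like (the name and exact behavior are assumptions, not DataFusion API):

```rust
/// Hypothetical helper in the spirit of the suggestion above: instead of stripping
/// only single and double quotes, map every character that is not alphanumeric or
/// an underscore to an underscore.
fn sanitize_column_name(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect()
}
```

This keeps generated pivot column names predictable regardless of how the value alias was quoted.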
```rust
match condition {
    None => {
        return plan_err!("Pivot value condition cannot be empty")
    }
    Some(cond) => cond,
}
```
Copilot AI (Nov 9, 2025)
This check is unreachable and unnecessary. Since we're in the value_column.len() > 1 branch (the _ case matches 2 or more), and we iterate through value_column.iter().enumerate(), the condition variable will always be Some after the loop completes. The first iteration sets it to Some, and subsequent iterations keep it as Some. This check can never fail and should be removed.
Suggested change: replace the whole `match` with:

```rust
condition.unwrap()
```
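To make the reasoning concrete without the full diff at hand, here is a hypothetical reconstruction of the loop shape being described; the function name, parameters, and expression construction are assumptions, and only the `Option<Expr>` accumulation pattern is taken from the comment:

```rust
use datafusion_common::Column;
use datafusion_expr::{col, Expr};

// Hypothetical sketch: this branch only runs when there are two or more value
// columns, so the loop body executes at least once, the first iteration sets
// `condition` to `Some`, and the post-loop `None` check can never fire.
fn build_match_condition(value_columns: &[Column], values: &[Expr]) -> Expr {
    let mut condition: Option<Expr> = None;
    for (i, value_column) in value_columns.iter().enumerate() {
        let eq = col(value_column.clone()).eq(values[i].clone());
        condition = Some(match condition {
            None => eq,
            Some(prev) => prev.and(eq),
        });
    }
    condition.expect("at least two value columns, so the loop ran")
}
```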
```rust
    named_struct_fn: &Arc<ScalarUDF>,
    make_array_fn: &Arc<ScalarUDF>,
    get_field_fn: &Arc<ScalarUDF>,
) -> Result<Self> {
```
Copilot AI (Nov 9, 2025)
Missing validation for empty input vectors. The function should validate that `value_column_names` and `unpivot_columns` are not empty, since an unpivot operation without them does not make semantic sense. Consider adding early validation like:

```rust
if value_column_names.is_empty() {
    return plan_err!("Unpivot requires at least one value column name");
}
if unpivot_columns.is_empty() {
    return plan_err!("Unpivot requires at least one column to unpivot");
}
```

Suggested change: insert these two checks immediately after `) -> Result<Self> {`.
```rust
        datafusion_common::DataFusionError::Plan(
            "get_field function not found".to_string(),
        )
    })?;
```
Copilot AI (Nov 9, 2025)
Inconsistent indentation in the error handling closure. The opening brace on line 370 should be indented to align with the previous error handlers on lines 352 and 361 for consistency.
```rust
/// let pivoted = df.pivot(
///     vec![sum(col("value"))],
///     vec![Column::from("category")],
///     vec![col("value")],
```
Copilot AI (Nov 9, 2025)
The documentation example appears to be incorrect. The `value_source` parameter should contain literal values that will become column names (e.g., `lit("A")`, `lit("B")`), not `col("value")`. Based on the test cases and implementation, this should be:

```rust
vec![lit("A"), lit("B")]
```

Suggested change: replace `/// vec![col("value")],` with `/// vec![lit("A"), lit("B")],`.
```rust
    value_column: Vec<Column>,
    value_source: Vec<Expr>,
    default_on_null: Option<Vec<Expr>>,
) -> Result<Self> {
```
Copilot AI (Nov 9, 2025)
Missing validation for empty input vectors. The function should validate that `aggregate_functions` and `value_source` are not empty, since a pivot operation without aggregate functions or values does not make semantic sense. Consider adding early validation like:

```rust
if aggregate_functions.is_empty() {
    return plan_err!("Pivot requires at least one aggregate function");
}
if value_source.is_empty() {
    return plan_err!("Pivot requires at least one value in value_source");
}
```

Suggested change: insert these two checks immediately after `) -> Result<Self> {`.
```rust
    struct_exprs.push(struct_expr);
}

let unpivot_array_column = "__unpivot_array";
```
Copilot AI (Nov 9, 2025)
The hardcoded column name __unpivot_array could potentially conflict with existing columns in the schema. If a user's table already has a column named __unpivot_array, this would cause a collision and unexpected behavior. Consider either:
- Checking if this name exists in the schema and generating a unique name if needed
- Using a UUID or other guaranteed-unique identifier
- Using a more obscure prefix that's less likely to conflict (though this doesn't eliminate the risk)
Suggested change:

```rust
// Generate a unique column name for the unpivot array to avoid collisions
let base_name = "__unpivot_array".to_string();
let mut unpivot_array_column = base_name.clone();
let mut i = 1;
let existing_names: HashSet<&str> =
    schema.iter().map(|(_, f)| f.name().as_str()).collect();
while existing_names.contains(unpivot_array_column.as_str()) {
    unpivot_array_column = format!("{}{}", base_name, i);
    i += 1;
}
```
Which issue does this PR close?
pivot & unpivot (melt) to DataFrame #12907

Rationale for this change

Currently, PIVOT and UNPIVOT are not supported. For reference, Spark's syntax:
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-pivot.html
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-unpivot.html
What changes are included in this PR?
Support for PIVOT and UNPIVOT, via both SQL syntax and the DataFrame API.
Are these changes tested?
Yes: unit tests (SQL logic tests and DataFrame integration tests).
Are there any user-facing changes?
No