Skip to content

Conversation

@zedtang
Copy link
Contributor

@zedtang zedtang commented Jul 22, 2024

What changes were proposed in this pull request?

Support listColumns API for clustering columns.

Why are the changes needed?

Clustering columns should be supported, just like partition and bucket columns, for listColumns API.

Does this PR introduce any user-facing change?

Yes, listColumns will now show an additional field isCluster to indicate whether the column is a clustering column.
Old output for spark.catalog.listColumns:

+----+-----------+--------+--------+-----------+--------+
|name|description|dataType|nullable|isPartition|isBucket|
+----+-----------+--------+--------+-----------+--------+
|   a|       null|     int|    true|      false|   false|
|   b|       null|  string|    true|      false|   false|
|   c|       null|     int|    true|      false|   false|
|   d|       null|  string|    true|      false|   false|
+----+-----------+--------+--------+-----------+--------+

New output:

+----+-----------+--------+--------+-----------+--------+---------+
|name|description|dataType|nullable|isPartition|isBucket|isCluster|
+----+-----------+--------+--------+-----------+--------+---------+
|   a|       null|     int|    true|      false|   false|    false|
|   b|       null|  string|    true|      false|   false|    false|
|   c|       null|     int|    true|      false|   false|    false|
|   d|       null|  string|    true|      false|   false|    false|
+----+-----------+--------+--------+-----------+--------+---------+

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@zedtang
Copy link
Contributor Author

zedtang commented Jul 22, 2024

This PR depends on #47301

@zedtang zedtang force-pushed the list-clustering-columns branch from d583e11 to 13a661b Compare July 25, 2024 05:00
@github-actions github-actions bot removed the BUILD label Jul 25, 2024
@zedtang
Copy link
Contributor Author

zedtang commented Jul 25, 2024

Hi @cloud-fan , this PR is ready for review, thanks

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in e73ede7 Jul 26, 2024
@zedtang zedtang deleted the list-clustering-columns branch July 26, 2024 02:00
ilicmarkodb pushed a commit to ilicmarkodb/spark that referenced this pull request Jul 29, 2024
### What changes were proposed in this pull request?

Support listColumns API for clustering columns.
### Why are the changes needed?

Clustering columns should be supported, just like partition and bucket columns, for listColumns API.
### Does this PR introduce _any_ user-facing change?

Yes, listColumns will now show an additional field `isCluster` to indicate whether the column is a clustering column.
Old output for `spark.catalog.listColumns`:
```
+----+-----------+--------+--------+-----------+--------+
|name|description|dataType|nullable|isPartition|isBucket|
+----+-----------+--------+--------+-----------+--------+
|   a|       null|     int|    true|      false|   false|
|   b|       null|  string|    true|      false|   false|
|   c|       null|     int|    true|      false|   false|
|   d|       null|  string|    true|      false|   false|
+----+-----------+--------+--------+-----------+--------+
```

New output:
```
+----+-----------+--------+--------+-----------+--------+---------+
|name|description|dataType|nullable|isPartition|isBucket|isCluster|
+----+-----------+--------+--------+-----------+--------+---------+
|   a|       null|     int|    true|      false|   false|    false|
|   b|       null|  string|    true|      false|   false|    false|
|   c|       null|     int|    true|      false|   false|    false|
|   d|       null|  string|    true|      false|   false|    false|
+----+-----------+--------+--------+-----------+--------+---------+
```

### How was this patch tested?

New unit tests.
### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47451 from zedtang/list-clustering-columns.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
lwz9103 pushed a commit to Kyligence/spark that referenced this pull request Mar 27, 2025
Support listColumns API for clustering columns.

Clustering columns should be supported, just like partition and bucket columns, for listColumns API.

Yes, listColumns will now show an additional field `isCluster` to indicate whether the column is a clustering column.
Old output for `spark.catalog.listColumns`:
```
+----+-----------+--------+--------+-----------+--------+
|name|description|dataType|nullable|isPartition|isBucket|
+----+-----------+--------+--------+-----------+--------+
|   a|       null|     int|    true|      false|   false|
|   b|       null|  string|    true|      false|   false|
|   c|       null|     int|    true|      false|   false|
|   d|       null|  string|    true|      false|   false|
+----+-----------+--------+--------+-----------+--------+
```

New output:
```
+----+-----------+--------+--------+-----------+--------+---------+
|name|description|dataType|nullable|isPartition|isBucket|isCluster|
+----+-----------+--------+--------+-----------+--------+---------+
|   a|       null|     int|    true|      false|   false|    false|
|   b|       null|  string|    true|      false|   false|    false|
|   c|       null|     int|    true|      false|   false|    false|
|   d|       null|  string|    true|      false|   false|    false|
+----+-----------+--------+--------+-----------+--------+---------+
```

New unit tests.

No.

Closes apache#47451 from zedtang/list-clustering-columns.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
lwz9103 pushed a commit to Kyligence/spark that referenced this pull request Apr 22, 2025
Support listColumns API for clustering columns.

Clustering columns should be supported, just like partition and bucket columns, for listColumns API.

Yes, listColumns will now show an additional field `isCluster` to indicate whether the column is a clustering column.
Old output for `spark.catalog.listColumns`:
```
+----+-----------+--------+--------+-----------+--------+
|name|description|dataType|nullable|isPartition|isBucket|
+----+-----------+--------+--------+-----------+--------+
|   a|       null|     int|    true|      false|   false|
|   b|       null|  string|    true|      false|   false|
|   c|       null|     int|    true|      false|   false|
|   d|       null|  string|    true|      false|   false|
+----+-----------+--------+--------+-----------+--------+
```

New output:
```
+----+-----------+--------+--------+-----------+--------+---------+
|name|description|dataType|nullable|isPartition|isBucket|isCluster|
+----+-----------+--------+--------+-----------+--------+---------+
|   a|       null|     int|    true|      false|   false|    false|
|   b|       null|  string|    true|      false|   false|    false|
|   c|       null|     int|    true|      false|   false|    false|
|   d|       null|  string|    true|      false|   false|    false|
+----+-----------+--------+--------+-----------+--------+---------+
```

New unit tests.

No.

Closes apache#47451 from zedtang/list-clustering-columns.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants