`KeyGroupedPartitioning` is a [Partitioning](Partitioning.md) where rows are split across partitions based on the [partition transform expressions](#keys).

`KeyGroupedPartitioning` is a key part of [Storage-Partitioned Joins](../storage-partitioned-joins/index.md).

!!! note
    Not used in any of the [built-in Spark SQL connectors](../connectors/index.md) yet.
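
The idea of splitting rows by key can be illustrated with a minimal, Spark-free sketch. Everything in it is hypothetical and for illustration only: the `Order` row type, the `groupByKey` helper, and the use of a plain `Function` in place of a partition transform expression are assumptions, not part of the Spark API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class KeyGroupedSketch {

    // Hypothetical row type for illustration only (not a Spark class).
    public static final class Order {
        private final int customerId;
        private final String item;

        public Order(int customerId, String item) {
            this.customerId = customerId;
            this.item = item;
        }

        public int customerId() { return customerId; }

        public String item() { return item; }
    }

    // Groups rows so that all rows with the same key value land in one partition.
    // keyTransform stands in for a partition transform expression (an assumption).
    public static <R, K extends Comparable<K>> Map<K, List<R>> groupByKey(
            List<R> rows, Function<R, K> keyTransform) {
        Map<K, List<R>> partitions = new TreeMap<>();
        for (R row : rows) {
            K key = keyTransform.apply(row);
            partitions.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order(1, "book"), new Order(2, "pen"), new Order(1, "lamp"));
        Map<Integer, List<Order>> partitions = groupByKey(orders, Order::customerId);
        // In this sketch there is one partition per distinct key value.
        System.out.println("number of partitions: " + partitions.size());
        partitions.forEach((key, rows) ->
            System.out.println("key " + key + " has " + rows.size() + " row(s)"));
    }
}
```

The sketch only shows the grouping invariant (equal keys share a partition); how Spark actually represents keys and partitions is not covered here.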

## Creating Instance

`KeyGroupedPartitioning` takes the following to be created:
docs/connector/Partitioning.md (+4 -4)
@@ -4,14 +4,14 @@ title: Partitioning

# Partitioning

-`Partitioning` is an [abstraction](#contract) of [output data partitioning requirements](#implementations) (_data distribution_) of a Spark SQL connector.
+`Partitioning` is an [abstraction](#contract) of [output data partitioning requirements](#implementations) (_data distribution_) of a [Spark SQL connector](index.md).

!!! note
    This `Partitioning` interface for Spark SQL developers mimics the internal Catalyst [Partitioning](../physical-operators/Partitioning.md), which it is converted into with the help of [DataSourcePartitioning](../physical-operators/Partitioning.md#DataSourcePartitioning).

## Contract

-### <span id="numPartitions"> Number of Partitions
+### Number of Partitions { #numPartitions }

```java
int numPartitions()
@@ -21,7 +21,7 @@ Used when:

* [DataSourcePartitioning](../physical-operators/Partitioning.md#DataSourcePartitioning) is requested for the [number of partitions](../physical-operators/Partitioning.md#numPartitions)

**Storage-Partitioned Joins** (_SPJ_) are a new type of [join](../joins.md) in Spark SQL that use the existing storage layout for a partitioned join to avoid expensive shuffles (similar to [Bucketing](../bucketing/index.md)).

!!! note
    The Storage-Partitioned Joins feature was added in Apache Spark 3.3.0 ([\[SPARK-37375\] Umbrella: Storage Partitioned Join (SPJ)]({{ spark.jira }}/SPARK-37375)).

Storage-Partitioned Join is meant mainly, if not exclusively, for [Spark SQL connectors](../connector/index.md) (_v2 data sources_).

Storage-Partitioned Join was proposed in this [SPIP](https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE).

Storage-Partitioned Join uses [KeyGroupedPartitioning](../connector/KeyGroupedPartitioning.md) to determine partitions.
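
To see why matching key-grouped partitions make a shuffle unnecessary, consider the following minimal sketch. It does not use Spark: the `joinPartitionWise` helper, the string rows, and the map-based partitions are all assumptions made for illustration. Each map entry models one storage partition that holds every row for a single key value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StoragePartitionedJoinSketch {

    // Inner-joins two inputs that are already grouped by the same key.
    // Each map entry models one partition: key value -> rows (plain strings here).
    public static List<String> joinPartitionWise(
            Map<Integer, List<String>> left, Map<Integer, List<String>> right) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> leftPartition : left.entrySet()) {
            List<String> rightRows = right.get(leftPartition.getKey());
            if (rightRows == null) {
                continue; // no partition with this key on the other side
            }
            // Only rows from partitions with the same key are paired;
            // no row is moved to another partition first.
            for (String l : leftPartition.getValue()) {
                for (String r : rightRows) {
                    joined.add(leftPartition.getKey() + ":" + l + "+" + r);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> customers = new TreeMap<>();
        customers.put(1, List.of("alice"));
        customers.put(2, List.of("bob"));

        Map<Integer, List<String>> orders = new TreeMap<>();
        orders.put(1, List.of("book", "lamp"));
        orders.put(3, List.of("pen"));

        // prints [1:alice+book, 1:alice+lamp]
        System.out.println(joinPartitionWise(customers, orders));
    }
}
```

Because both sides are grouped by the same key, each partition can be joined independently of the others, which is the property the storage layout provides up front in place of a shuffle.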