[SPARK-36384][CORE][DOC] Add doc for shuffle checksum
### What changes were proposed in this pull request?

Add doc for the shuffle checksum configs in `configuration.md`.

### Why are the changes needed?

Documentation for the new shuffle checksum configs.

### Does this PR introduce _any_ user-facing change?

No, since Spark 3.2 hasn't been released.

### How was this patch tested?

Passes existing tests.

Closes apache#33637 from Ngone51/SPARK-36384.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Ngone51 authored and HyukjinKwon committed Aug 5, 2021
1 parent 0f5c3a4 commit 3b92c72
Showing 2 changed files with 26 additions and 5 deletions.
@@ -1370,21 +1370,24 @@ package object config {

   private[spark] val SHUFFLE_CHECKSUM_ENABLED =
     ConfigBuilder("spark.shuffle.checksum.enabled")
-      .doc("Whether to calculate the checksum of shuffle output. If enabled, Spark will try " +
-        "its best to tell if shuffle data corruption is caused by network or disk or others.")
+      .doc("Whether to calculate the checksum of shuffle data. If enabled, Spark will calculate " +
+        "the checksum values for each partition data within the map output file and store the " +
+        "values in a checksum file on the disk. When there's shuffle data corruption detected, " +
+        "Spark will try to diagnose the cause (e.g., network issue, disk issue, etc.) of the " +
+        "corruption by using the checksum file.")
       .version("3.2.0")
       .booleanConf
       .createWithDefault(true)

   private[spark] val SHUFFLE_CHECKSUM_ALGORITHM =
     ConfigBuilder("spark.shuffle.checksum.algorithm")
-      .doc("The algorithm used to calculate the checksum. Currently, it only supports" +
-        " built-in algorithms of JDK.")
+      .doc("The algorithm is used to calculate the shuffle checksum. Currently, it only supports " +
+        "built-in algorithms of JDK.")
       .version("3.2.0")
       .stringConf
       .transform(_.toUpperCase(Locale.ROOT))
       .checkValue(Set("ADLER32", "CRC32").contains, "Shuffle checksum algorithm " +
-        "should be either Adler32 or CRC32.")
+        "should be either ADLER32 or CRC32.")
       .createWithDefault("ADLER32")

   private[spark] val SHUFFLE_COMPRESS =
18 changes: 18 additions & 0 deletions docs/configuration.md
@@ -1032,6 +1032,24 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>1.6.0</td>
</tr>
<tr>
<td><code>spark.shuffle.checksum.enabled</code></td>
<td>true</td>
<td>
Whether to calculate the checksum of shuffle data. If enabled, Spark will calculate the checksum values for each partition
data within the map output file and store the values in a checksum file on the disk. When there's shuffle data corruption
detected, Spark will try to diagnose the cause (e.g., network issue, disk issue, etc.) of the corruption by using the checksum file.
</td>
<td>3.2.0</td>
</tr>
<tr>
<td><code>spark.shuffle.checksum.algorithm</code></td>
<td>ADLER32</td>
<td>
The algorithm is used to calculate the shuffle checksum. Currently, it only supports built-in algorithms of JDK, e.g., ADLER32, CRC32.
</td>
<td>3.2.0</td>
</tr>
</table>
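The "built-in algorithms of JDK" that `spark.shuffle.checksum.algorithm` accepts correspond to the `Adler32` and `CRC32` classes in `java.util.zip`. A minimal standalone sketch of computing both checksums over the same bytes (the class name and sample data are illustrative, not from the Spark sources):

```java
// Demonstrates the two JDK built-in checksum algorithms Spark accepts
// for spark.shuffle.checksum.algorithm: ADLER32 and CRC32.
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

public class ShuffleChecksumDemo {
    public static void main(String[] args) {
        // Stand-in for the bytes of one shuffle partition (illustrative data).
        byte[] partition = "shuffle partition bytes".getBytes(StandardCharsets.UTF_8);

        Adler32 adler = new Adler32();
        adler.update(partition, 0, partition.length);

        CRC32 crc = new CRC32();
        crc.update(partition, 0, partition.length);

        // Both return the checksum as an unsigned 32-bit value in a long.
        System.out.println("ADLER32=" + adler.getValue());
        System.out.println("CRC32=" + crc.getValue());
    }
}
```

Adler-32 is generally cheaper to compute than CRC-32, which is presumably why it is the default.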

### Spark UI
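Like other Spark properties, the two new configs can be set per application at submit time; a hypothetical invocation (the application class and jar names are placeholders):

```shell
spark-submit \
  --class com.example.MyApp \
  --conf spark.shuffle.checksum.enabled=true \
  --conf spark.shuffle.checksum.algorithm=CRC32 \
  my-app.jar
```

Note that the algorithm value is case-insensitive: the config applies `toUpperCase(Locale.ROOT)` before validation, so `crc32` is accepted as well.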
