[SPARK-23029] [DOCS] Specifying default units of configuration entries #20269

Closed
ferdonline wants to merge 4 commits into apache:master from ferdonline:docs_units

Conversation

ferdonline
Contributor

What changes were proposed in this pull request?

This PR completes the docs by specifying the default units assumed for size-type configuration entries.
This is crucial because unit-less values are accepted and users may assume the base unit is bytes, which in most cases it is not, leading to hard-to-debug problems.

How was this patch tested?

This patch updates documentation only.
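
Not part of the patch itself, but a minimal Scala sketch of the ambiguity described above (Spark on the classpath is assumed, and the getter variant used here stands in for the per-entry base unit that Spark applies internally when no suffix is given):

```scala
import org.apache.spark.SparkConf

object SizeUnitAmbiguity {
  def main(args: Array[String]): Unit = {
    // The same unit-less string resolves to very different sizes depending on
    // which base unit the configuration entry assumes.
    val conf = new SparkConf(loadDefaults = false)
      .set("spark.shuffle.file.buffer", "64") // documented base unit: KiB
      .set("spark.driver.memory", "64")       // documented base unit: MiB

    println(conf.getSizeAsKb("spark.shuffle.file.buffer")) // 64, i.e. 64 KiB
    println(conf.getSizeAsMb("spark.driver.memory"))       // 64, i.e. 64 MiB
  }
}
```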

@@ -419,7 +419,7 @@ package object config {

private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
ConfigBuilder("spark.shuffle.file.buffer")
- .doc("Size of the in-memory buffer for each shuffle file output stream. " +
+ .doc("Size (in KiB) of the in-memory buffer for each shuffle file output stream. " +
Member

Really, "in KiB unless otherwise specified"?

Same for the next property below. These two are the only two that aren't in bytes by default, and have a description already. It would be handy to add a blurb about this to all of the "MiB" default properties above this too, for consistency.
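
For reference, a sketch of roughly how such an entry is declared inside Spark's `package object config` (the file shown in the hunk above). The trailing doc text and the default value here are illustrative assumptions rather than quotes from the diff, and ConfigBuilder is private[spark], so this fragment only compiles inside Spark itself:

```scala
// Fragment of org.apache.spark.internal.config's package object.
private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
  ConfigBuilder("spark.shuffle.file.buffer")
    .doc("Size of the in-memory buffer for each shuffle file output stream, " +
      "in KiB unless otherwise specified.")
    .bytesConf(ByteUnit.KiB)        // this is what makes a bare number mean KiB
    .createWithDefaultString("32k") // an explicit suffix keeps the default self-describing
```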

@@ -58,6 +58,8 @@ The following format is accepted:
1t or 1tb (tebibytes = 1024 gibibytes)
1p or 1pb (pebibytes = 1024 tebibytes)

+ Without specification the unit depends on the configuration entry where KiB are typically assumed.
Member

Just looking at the properties that use bytesConf(), there are as many in MiB. And, really the default is just bytes unless otherwise specified. If you say anything here, maybe just

"While numbers without units are generally interpreted as bytes, a few are interpreted as KiB or MiB when no units are specified, for historical reasons. See documentation of individual configuration properties. Specifying units is desirable where possible."

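To make the suffixes above and the caller-supplied fallback unit concrete, a small sketch against org.apache.spark.network.util.JavaUtils, the parser behind these size strings (Spark jars on the classpath are assumed; the numbers in comments are the expected results):

```scala
import org.apache.spark.network.util.JavaUtils

object SizeStringParsing {
  def main(args: Array[String]): Unit = {
    // Suffixed values are absolute, whichever property they are used for.
    println(JavaUtils.byteStringAsBytes("1k")) // 1024
    println(JavaUtils.byteStringAsBytes("1m")) // 1048576
    println(JavaUtils.byteStringAsMb("1g"))    // 1024

    // Unit-less values fall back to whatever unit the caller (in practice, the
    // configuration entry) assumes -- the historical KiB/MiB cases noted above.
    println(JavaUtils.byteStringAsKb("512"))   // 512 (KiB)
    println(JavaUtils.byteStringAsMb("512"))   // 512 (MiB)
  }
}
```
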
@@ -150,6 +152,7 @@ of the most common options to set are:
<td>
Amount of memory to use for the driver process, i.e. where SparkContext is initialized.
(e.g. <code>1g</code>, <code>2g</code>).
+ Default unit: MiB
Member

Everywhere the default isn't bytes, a clause like ", in MiB unless otherwise specified", seems cleanest. There are 9 such properties as far as I can tell.

Although it would be more complete to say "in bytes" for all the other properties, that's probably not necessary.
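
A short sketch of what ", in MiB unless otherwise specified" means in practice for spark.driver.memory (values chosen purely for illustration):

```scala
import org.apache.spark.SparkConf

object DriverMemoryUnits {
  def main(args: Array[String]): Unit = {
    // For this property a bare number is read as MiB, so these two are equivalent.
    val bare     = new SparkConf(loadDefaults = false).set("spark.driver.memory", "2048")
    val explicit = new SparkConf(loadDefaults = false).set("spark.driver.memory", "2g")

    println(bare.getSizeAsMb("spark.driver.memory"))     // 2048
    println(explicit.getSizeAsMb("spark.driver.memory")) // 2048
  }
}
```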

@ferdonline
Contributor Author

Hi. Thanks for your review. Sounds good, I will go around and add a "unit blurb" to them.
I wrote "Default unit: X" to keep it as short and obvious as possible, but I agree nicer English in the HTML docs would be better.

@srowen srowen left a comment
Member

I like this clarification and standardization.

@SparkQA

SparkQA commented Jan 16, 2018

Test build #4054 has finished for PR 20269 at commit 9d92235.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ferdonline
Contributor Author

retest this please

@jiangxb1987 jiangxb1987 left a comment
Contributor

LGTM

@jiangxb1987
Contributor

retest this please

@SparkQA

SparkQA commented Jan 17, 2018

Test build #4057 has started for PR 20269 at commit bf8e55e.

@SparkQA

SparkQA commented Jan 18, 2018

Test build #4058 has finished for PR 20269 at commit bf8e55e.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Can one of the admins verify this patch?

asfgit pushed a commit that referenced this pull request Jan 18, 2018
## What changes were proposed in this pull request?
This PR completes the docs by specifying the default units assumed for size-type configuration entries.
This is crucial because unit-less values are accepted and users may assume the base unit is bytes, which in most cases it is not, leading to hard-to-debug problems.

## How was this patch tested?
This patch updates documentation only.

Author: Fernando Pereira <fernando.pereira@epfl.ch>

Closes #20269 from ferdonline/docs_units.

(cherry picked from commit 9678941)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@srowen
Member

srowen commented Jan 18, 2018

Merged to master/2.3

@asfgit asfgit closed this in 9678941 Jan 18, 2018
.bytesConf(ByteUnit.MiB)
.createWithDefaultString("1g")

private[spark] val DRIVER_MEMORY_OVERHEAD = ConfigBuilder("spark.driver.memoryOverhead")
.doc("The amount of off-heap memory to be allocated per driver in cluster mode, " +
Member

Hi, @ferdonline, can you explain why this is off-heap memory?
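
For context on the question above: spark.driver.memoryOverhead accounts for memory used outside the JVM heap (native allocations, VM overhead and the like), while spark.driver.memory sizes the heap itself, and in cluster mode the resource manager is asked for roughly their sum. A hedged sketch; the max(10% of heap, 384 MiB) fallback mirrors the commonly documented default rather than anything stated in this PR:

```scala
import org.apache.spark.SparkConf

object DriverMemorySizing {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(loadDefaults = false)
      .set("spark.driver.memory", "4g") // JVM heap; a bare number would be MiB

    val heapMiB = conf.getSizeAsMb("spark.driver.memory")
    // Off-heap overhead: explicit setting if present, otherwise an assumed
    // max(10% of heap, 384 MiB) fallback, used here only for illustration.
    val overheadMiB = conf.getSizeAsMb("spark.driver.memoryOverhead",
      s"${math.max((heapMiB * 0.10).toLong, 384L)}m")

    println(s"heap = $heapMiB MiB, overhead = $overheadMiB MiB, " +
      s"container request of roughly ${heapMiB + overheadMiB} MiB")
  }
}
```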
