[SPARK-23029] [DOCS] Specifying default units of configuration entries #20269
@@ -419,7 +419,7 @@ package object config {

   private[spark] val SHUFFLE_FILE_BUFFER_SIZE =
     ConfigBuilder("spark.shuffle.file.buffer")
-      .doc("Size of the in-memory buffer for each shuffle file output stream. " +
+      .doc("Size (in KiB) of the in-memory buffer for each shuffle file output stream. " +
Really, "in KiB unless otherwise specified"?
Same for the next property below. These two are the only two that aren't in bytes by default, and have a description already. It would be handy to add a blurb about this to all of the "MiB" default properties above this too, for consistency.
docs/configuration.md
Outdated
@@ -58,6 +58,8 @@ The following format is accepted:
     1t or 1tb (tebibytes = 1024 gibibytes)
     1p or 1pb (pebibytes = 1024 tebibytes)

+Without specification the unit depends on the configuration entry where KiB are typically assumed.
Just looking at the properties that use bytesConf(), there are as many in MiB. And, really the default is just bytes unless otherwise specified. If you say anything here, maybe just
"While numbers without units are generally interpreted as bytes, a few are interpreted as KiB or MiB when no units are specified, for historical reasons. See documentation of individual configuration properties. Specifying units is desirable where possible."
docs/configuration.md
Outdated
@@ -150,6 +152,7 @@ of the most common options to set are:
   <td>
     Amount of memory to use for the driver process, i.e. where SparkContext is initialized.
     (e.g. <code>1g</code>, <code>2g</code>).
+    Default unit: MiB
Everywhere the default isn't bytes, a clause like ", in MiB unless otherwise specified", seems cleanest. There are 9 such properties as far as I can tell.
Although it would be complete to say "in bytes" for all other properties, probably not necessary.
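To make the default-unit rule under discussion concrete, here is a minimal sketch of how a unit-less value could resolve against a per-property default unit. It is not Spark's actual parser (the diff only shows that Spark declares the default via `bytesConf(ByteUnit.MiB)` and similar); `SizeConfSketch`, `toBytes`, and `defaultSuffix` are hypothetical names used for illustration, and only a subset of the suffixes accepted by `docs/configuration.md` is covered.

```scala
// Illustrative sketch only: NOT Spark's implementation, just the rule discussed above.
object SizeConfSketch {
  // Binary multipliers, in bytes, for a subset of the documented suffixes.
  private val suffixToBytes: Map[String, Long] = Map(
    "b" -> 1L,
    "k" -> 1024L, "kb" -> 1024L,
    "m" -> 1024L * 1024, "mb" -> 1024L * 1024,
    "g" -> 1024L * 1024 * 1024, "gb" -> 1024L * 1024 * 1024
  )

  // Digits followed by an optional alphabetic suffix, e.g. "32", "32k", "1gb".
  private val SizePattern = "([0-9]+)([a-z]+)?".r

  /** Resolve `value` to bytes; `defaultSuffix` (hypothetical parameter) applies
    * only when `value` carries no unit of its own. */
  def toBytes(value: String, defaultSuffix: String): Long =
    value.trim.toLowerCase match {
      case SizePattern(number, suffix) =>
        // `suffix` is null when the optional group did not match.
        val unit = Option(suffix).getOrElse(defaultSuffix)
        number.toLong * suffixToBytes.getOrElse(
          unit, throw new IllegalArgumentException(s"Unknown size unit: $unit"))
      case _ =>
        throw new IllegalArgumentException(s"Invalid size string: $value")
    }
}
```

Under these assumptions, `SizeConfSketch.toBytes("32", "k")` treats a bare `32` as 32 KiB (the `spark.shuffle.file.buffer` case), while `SizeConfSketch.toBytes("1g", "m")` ignores the MiB default because the value carries its own suffix (the `spark.driver.memory` case). This is why the review recommends always specifying units explicitly.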
Hi. Thanks for your review. Sounds good, I will go around and add a "unit blurb" to them.
I like this clarification and standardization.
Test build #4054 has finished for PR 20269 at commit
retest this please
LGTM
retest this please
Test build #4057 has started for PR 20269 at commit
Test build #4058 has finished for PR 20269 at commit
Can one of the admins verify this patch?
## What changes were proposed in this pull request?

This PR completes the docs, specifying the default units assumed in configuration entries of type size. This is crucial since unit-less values are accepted and the user might assume the base unit is bytes, which in most cases it is not, leading to hard-to-debug problems.

## How was this patch tested?

This patch updates documentation only.

Author: Fernando Pereira <fernando.pereira@epfl.ch>

Closes #20269 from ferdonline/docs_units.

(cherry picked from commit 9678941)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Merged to master/2.3
     .bytesConf(ByteUnit.MiB)
     .createWithDefaultString("1g")

   private[spark] val DRIVER_MEMORY_OVERHEAD = ConfigBuilder("spark.driver.memoryOverhead")
     .doc("The amount of off-heap memory to be allocated per driver in cluster mode, " +
Hi @ferdonline, can you explain why this is off-heap memory?
What changes were proposed in this pull request?
This PR completes the docs, specifying the default units assumed in configuration entries of type size.
This is crucial since unit-less values are accepted and the user might assume the base unit is bytes, which in most cases it is not, leading to hard-to-debug problems.
How was this patch tested?
This patch updates documentation only.