Commit f3804c0

yaooqinn authored and jingz-db committed
[SPARK-48854][DOCS] Add missing options in CSV documentation
### What changes were proposed in this pull request?

This PR adds documentation for missing CSV options: `delimiter` as an alternative to `sep`, `charset` as an alternative to `encoding`, `codec` as an alternative to `compression`, and `timeZone`. It excludes `columnPruning`, which falls back to an internal SQL config.

### Why are the changes needed?

Improvement for the user guide.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Doc build.

![image](https://github.com/apache/spark/assets/8326978/d8ff888b-cafa-44e6-ab74-7bf69702a267)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47278 from yaooqinn/SPARK-48854.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
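The alias pairs documented here can be modeled as a lookup from each primary option name to its alternative. The sketch below uses a hypothetical `resolve_option` helper (not a Spark API). It assumes that the primary name takes precedence when both spellings are set, and it ignores the fact that Spark treats option keys case-insensitively.

```python
from typing import Mapping, Optional

# Primary CSV option name -> alternative name documented by this commit.
CSV_OPTION_ALIASES = {
    "sep": "delimiter",
    "encoding": "charset",
    "compression": "codec",
}


def resolve_option(options: Mapping[str, str], primary: str,
                   default: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return the value set under `primary`, else under
    its alias, else `default`. Primary-name precedence is an assumption."""
    if primary in options:
        return options[primary]
    alias = CSV_OPTION_ALIASES.get(primary)
    if alias is not None and alias in options:
        return options[alias]
    return default


opts = {"delimiter": ";", "charset": "ISO-8859-1", "codec": "gzip"}
print(resolve_option(opts, "sep", ","))           # ;
print(resolve_option(opts, "encoding", "UTF-8"))  # ISO-8859-1
print(resolve_option(opts, "compression"))        # gzip
print(resolve_option({}, "sep", ","))             # ,
```

Under this model, `delimiter=";"` behaves exactly like `sep=";"`, and the table defaults (`,`, `UTF-8`, no compression) apply only when neither spelling is given.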
1 parent ceb9dc5 commit f3804c0

2 files changed (+15, -4 lines)

docs/sql-data-sources-csv.md

Lines changed: 15 additions & 3 deletions
@@ -55,13 +55,13 @@ Data source options of CSV can be set via:
 <table>
 <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead>
 <tr>
-<td><code>sep</code></td>
+<td><code>sep</code><br><code>delimiter</code></td>
 <td>,</td>
 <td>Sets a separator for each field and value. This separator can be one or more characters.</td>
 <td>read/write</td>
 </tr>
 <tr>
-<td><code>encoding</code></td>
+<td><code>encoding</code><br><code>charset</code></td>
 <td>UTF-8</td>
 <td>For reading, decodes the CSV files by the given encoding type. For writing, specifies encoding (charset) of saved CSV files. CSV built-in functions ignore this option.</td>
 <td>read/write</td>
@@ -261,10 +261,22 @@ Data source options of CSV can be set via:
 <td>read</td>
 </tr>
 <tr>
-<td><code>compression</code></td>
+<td><code>compression</code><br><code>codec</code></td>
 <td>(none)</td>
 <td>Compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (<code>none</code>, <code>bzip2</code>, <code>gzip</code>, <code>lz4</code>, <code>snappy</code> and <code>deflate</code>). CSV built-in functions ignore this option.</td>
 <td>write</td>
 </tr>
+<tr>
+<td><code>timeZone</code></td>
+<td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
+<td>Sets the string that indicates a time zone ID to be used to format timestamps in the JSON datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>
+<ul>
+<li>Region-based zone ID: It should have the form 'area/city', such as 'America/Los_Angeles'.</li>
+<li>Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+</ul>
+Other short names like 'CST' are not recommended to use because they can be ambiguous.
+</td>
+<td>read/write</td>
+</tr>
 </table>
 Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html">Generic File Source Options</a>.
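The new `timeZone` row recommends two value formats and treats `UTC` and `Z` as aliases of `+00:00`. The sketch below sorts sample values by those documented rules. `classify_time_zone` is a hypothetical helper written for illustration, not a Spark API. It checks only the shape of the string (it does not validate hour or minute ranges), and Spark's own time zone parsing accepts more forms than these.

```python
import re

# Region-based zone ID: 'area/city', possibly with extra segments
# such as 'America/Argentina/Buenos_Aires'.
_REGION_ID = re.compile(r"^[A-Za-z_]+(/[A-Za-z0-9_+\-]+)+$")
# Zone offset: '(+|-)HH:mm' (shape only, ranges are not checked).
_OFFSET = re.compile(r"^[+-]\d{2}:\d{2}$")


def classify_time_zone(tz: str) -> str:
    """Hypothetical helper: map a timeZone string to the documented category."""
    if tz in ("UTC", "Z"):
        return "offset alias (+00:00)"
    if _OFFSET.match(tz):
        return "zone offset"
    if _REGION_ID.match(tz):
        return "region-based zone ID"
    # Short names such as 'CST' fall through here because they are ambiguous.
    return "not recommended"


for value in ["America/Los_Angeles", "-08:00", "Z", "CST"]:
    print(value, "->", classify_time_zone(value))
```

Running the loop prints one line per sample: `America/Los_Angeles` is a region-based zone ID, `-08:00` is a zone offset, `Z` is an alias of `+00:00`, and `CST` falls into the not-recommended bucket.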

docs/sql-data-sources-json.md

Lines changed: 0 additions & 1 deletion
@@ -112,7 +112,6 @@ Data source options of JSON can be set via:
 <table>
 <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead>
 <tr>
-<!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too. -->
 <td><code>timeZone</code></td>
 <td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
 <td>Sets the string that indicates a time zone ID to be used to format timestamps in the JSON datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>
