[SPARK-48854][DOCS] Add missing options in CSV documentation #47278

Closed
wants to merge 1 commit into from
18 changes: 15 additions & 3 deletions docs/sql-data-sources-csv.md
@@ -55,13 +55,13 @@ Data source options of CSV can be set via:
<table>
<thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead>
<tr>
-<td><code>sep</code></td>
+<td><code>sep</code><br><code>delimiter</code></td>
<td>,</td>
<td>Sets a separator for each field and value. This separator can be one or more characters.</td>
<td>read/write</td>
</tr>
<tr>
-<td><code>encoding</code></td>
+<td><code>encoding</code><br><code>charset</code></td>
<td>UTF-8</td>
<td>For reading, decodes the CSV files by the given encoding type. For writing, specifies encoding (charset) of saved CSV files. CSV built-in functions ignore this option.</td>
<td>read/write</td>
@@ -261,10 +261,22 @@ Data source options of CSV can be set via:
<td>read</td>
</tr>
<tr>
-<td><code>compression</code></td>
+<td><code>compression</code><br><code>codec</code></td>
<td>(none)</td>
<td>Compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (<code>none</code>, <code>bzip2</code>, <code>gzip</code>, <code>lz4</code>, <code>snappy</code> and <code>deflate</code>). CSV built-in functions ignore this option.</td>
<td>write</td>
</tr>
+<tr>
+  <td><code>timeZone</code></td>
+  <td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
+  <td>Sets the string that indicates a time zone ID to be used to format timestamps in the CSV datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>
+  <ul>
+    <li>Region-based zone ID: It should have the form 'area/city', such as 'America/Los_Angeles'.</li>
+    <li>Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+  </ul>
+  Other short names like 'CST' are not recommended because they can be ambiguous.
+  </td>
+  <td>read/write</td>
+</tr>
</table>
Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html">Generic File Source Options</a>.
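For readers who want to try the options documented above, here is a minimal sketch in Scala (assuming a local Spark 3.x session and hypothetical input/output paths) that exercises the alias names this change documents: `delimiter` for `sep`, `charset` for `encoding`, and `codec` for `compression`, plus the `timeZone` option:

```scala
import org.apache.spark.sql.SparkSession

object CsvOptionAliases {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvOptionAliases")
      .master("local[*]")
      .getOrCreate()

    // Read: "delimiter" and "charset" are aliases of "sep" and "encoding".
    val df = spark.read
      .option("header", "true")
      .option("delimiter", ";")                    // same effect as .option("sep", ";")
      .option("charset", "UTF-8")                  // same effect as .option("encoding", "UTF-8")
      .option("timeZone", "America/Los_Angeles")   // region-based zone ID
      .csv("path/to/input.csv")                    // hypothetical path

    // Write: "codec" is an alias of "compression".
    df.write
      .option("sep", ",")
      .option("codec", "gzip")                     // same effect as .option("compression", "gzip")
      .csv("path/to/output")                       // hypothetical path

    spark.stop()
  }
}
```

Either spelling can be used; the primary name and its alias refer to the same underlying option.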
1 change: 0 additions & 1 deletion docs/sql-data-sources-json.md
@@ -112,7 +112,6 @@ Data source options of JSON can be set via:
<table>
<thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead>
<tr>
-<!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too. -->
<td><code>timeZone</code></td>
<td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
<td>Sets the string that indicates a time zone ID to be used to format timestamps in the JSON datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>
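For completeness, a minimal sketch (assuming an existing SparkSession named `spark` and a hypothetical path) of the `timeZone` formats described above, applied to a JSON read:

```scala
// Zone offsets and region-based IDs are both accepted; "UTC" and "Z" are
// aliases of "+00:00". Short names such as "CST" are discouraged as ambiguous.
val events = spark.read
  .option("timeZone", "+01:00")     // zone offset form
  .json("path/to/events.json")      // hypothetical path
```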