[SPARK-48854][DOCS] Add missing options in CSV documentation
### What changes were proposed in this pull request?
This PR adds documentation for missing CSV options: `delimiter` as an alternative to `sep`, `charset` as an alternative to `encoding`, `codec` as an alternative to `compression`, and `timeZone`. `columnPruning` is deliberately left out because it falls back to an internal SQL config.
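A minimal Python sketch of what the aliases mean in practice: each alias and its primary name set the same option. The function `normalize_csv_options` and the alias table are hypothetical and are not Spark's API. Two rules are assumptions of this sketch: option keys are compared case-insensitively, and the primary name wins when both spellings are given.

```python
# Hypothetical sketch: fold CSV option aliases into their primary names.
# Assumptions (not taken from Spark's source): keys are compared
# case-insensitively, and the primary name wins if both are set.

# alias -> primary option name, as listed in this PR
CSV_OPTION_ALIASES = {
    "delimiter": "sep",
    "charset": "encoding",
    "codec": "compression",
}


def normalize_csv_options(options: dict[str, str]) -> dict[str, str]:
    """Return a copy of `options` with lowercase keys and aliases renamed to primary names."""
    lowered = {key.lower(): value for key, value in options.items()}
    result: dict[str, str] = {}
    for key, value in lowered.items():
        primary = CSV_OPTION_ALIASES.get(key)
        if primary is None:
            # Not an alias: keep the option under its own name.
            result[key] = value
        elif primary not in lowered:
            # Alias used on its own: store its value under the primary name.
            result[primary] = value
        # Otherwise the primary name is also present and takes precedence.
    return result


opts = normalize_csv_options(
    {"delimiter": ";", "charset": "ISO-8859-1", "compression": "gzip", "codec": "bzip2"}
)
print(opts)
```

In Spark itself you simply pass either spelling to the reader or writer, for example `spark.read.option("delimiter", ";").csv(path)`.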
### Why are the changes needed?
Improvement for the user guide: these CSV options were missing from the documentation.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
doc build

### Was this patch authored or co-authored using generative AI tooling?
no
Closes apache#47278 from yaooqinn/SPARK-48854.
Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
<td>For reading, decodes the CSV files by the given encoding type. For writing, specifies encoding (charset) of saved CSV files. CSV built-in functions ignore this option.</td>
<td>read/write</td>
@@ -261,10 +261,22 @@ Data source options of CSV can be set via:
<td>Compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (<code>none</code>, <code>bzip2</code>, <code>gzip</code>, <code>lz4</code>, <code>snappy</code> and <code>deflate</code>). CSV built-in functions ignore this option.</td>
<td>write</td>
</tr>
+<tr>
+<td><code>timeZone</code></td>
+<td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
+<td>Sets the string that indicates a time zone ID to be used to format timestamps in the JSON datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>
+<ul>
+<li>Region-based zone ID: It should have the form 'area/city', such as 'America/Los_Angeles'.</li>
+<li>Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+</ul>
+Other short names like 'CST' are not recommended to use because they can be ambiguous.
+</td>
+<td>read/write</td>
+</tr>
</table>
Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html">Generic File Source Options</a>.
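The `timeZone` row added above accepts two shapes of value. As a minimal sketch, assuming only the rules stated in that row, the hypothetical helper `classify_time_zone` below sorts a value into one of those shapes. It checks the shape only: it does not consult a time zone database, does not verify that the zone exists, and does not reproduce Spark's own parsing.

```python
import re

# Shapes taken from the timeZone row in the docs; the regexes themselves are
# this sketch's own approximation, not Spark's validation logic.
REGION_ID = re.compile(r"[A-Za-z_]+(?:/[A-Za-z0-9_+-]+)+")  # 'area/city', e.g. 'America/Los_Angeles'
ZONE_OFFSET = re.compile(r"[+-]\d{2}:\d{2}")                 # '(+|-)HH:mm', e.g. '-08:00'
OFFSET_ALIASES = {"UTC", "Z"}                                # documented aliases of '+00:00'


def classify_time_zone(value: str) -> str:
    """Classify a timeZone value by shape only; existence of the zone is not checked."""
    if value in OFFSET_ALIASES or ZONE_OFFSET.fullmatch(value):
        return "zone offset"
    if REGION_ID.fullmatch(value):
        return "region-based zone ID"
    # Anything else, such as the short name 'CST', which the docs call ambiguous.
    return "other"


for tz in ["America/Los_Angeles", "-08:00", "+01:00", "Z", "CST"]:
    print(tz, "->", classify_time_zone(tz))
```

In a real job the value is passed as `.option("timeZone", "America/Los_Angeles")`; when it is not set, the option falls back to `spark.sql.session.timeZone`, as the table states.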
<!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too. -->
<td><code>timeZone</code></td>
<td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
<td>Sets the string that indicates a time zone ID to be used to format timestamps in the JSON datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>