[SPARK-48854][DOCS] Add missing options in CSV documentation

yaooqinn · jingz-db · commit f3804c035c96 · 2024-07-22T10:06:24.000-07:00
### What changes were proposed in this pull request?

This PR adds documentation for missing CSV options: `delimiter` as an alternative to `sep`, `charset` as an alternative to `encoding`, `codec` as an alternative to `compression`, and `timeZone`. It excludes `columnPruning`, which falls back to an internal SQL config.

### Why are the changes needed?

Improvement to the user guide.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Doc build.

![image](https://github.com/apache/spark/assets/8326978/d8ff888b-cafa-44e6-ab74-7bf69702a267)

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47278 from yaooqinn/SPARK-48854.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
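The alias pairs named in the description (`sep`/`delimiter`, `encoding`/`charset`, `compression`/`codec`) can be illustrated with a small standalone sketch. This is not Spark's implementation: `ALIASES`, `DEFAULTS`, and `resolve_option` are hypothetical names, and the precedence rule (primary name wins over its alias when both are set) is an assumption made for the example.

```python
# Hypothetical sketch of option-alias resolution; not Spark's actual code.
# Each documented option maps to (primary name, alternative name).
ALIASES = {
    "sep": ("sep", "delimiter"),
    "encoding": ("encoding", "charset"),
    "compression": ("compression", "codec"),
}

# Defaults as listed in the documentation table ("(none)" is modelled as None).
DEFAULTS = {"sep": ",", "encoding": "UTF-8", "compression": None}


def resolve_option(options, name):
    """Return the value for `name`, checking the primary key before its alias."""
    # Assumption for this sketch: option keys are matched case-insensitively.
    lowered = {key.lower(): value for key, value in options.items()}
    for candidate in ALIASES[name]:
        if candidate.lower() in lowered:
            return lowered[candidate.lower()]
    return DEFAULTS[name]


if __name__ == "__main__":
    print(resolve_option({"delimiter": "|"}, "sep"))
    print(resolve_option({"charset": "ISO-8859-1"}, "encoding"))
```

With the real data source, either spelling is passed as an ordinary reader or writer option, for example `.option("delimiter", "|")` in place of `.option("sep", "|")`.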
diff --git a/docs/sql-data-sources-csv.md b/docs/sql-data-sources-csv.md
@@ -55,13 +55,13 @@ Data source options of CSV can be set via:
 <table>
   <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead>
   <tr>
-    <td><code>sep</code></td>
+    <td><code>sep</code><br><code>delimiter</code></td>
     <td>,</td>
     <td>Sets a separator for each field and value. This separator can be one or more characters.</td>
     <td>read/write</td>
   </tr>
   <tr>
-    <td><code>encoding</code></td>
+    <td><code>encoding</code><br><code>charset</code></td>
     <td>UTF-8</td>
     <td>For reading, decodes the CSV files by the given encoding type. For writing, specifies encoding (charset) of saved CSV files. CSV built-in functions ignore this option.</td>
     <td>read/write</td>
@@ -261,10 +261,22 @@ Data source options of CSV can be set via:
     <td>read</td>
   </tr>
   <tr>
-    <td><code>compression</code></td>
+    <td><code>compression</code><br><code>codec</code></td>
     <td>(none)</td>
     <td>Compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (<code>none</code>, <code>bzip2</code>, <code>gzip</code>, <code>lz4</code>, <code>snappy</code> and <code>deflate</code>). CSV built-in functions ignore this option.</td>
     <td>write</td>
   </tr>
+  <tr>
+    <td><code>timeZone</code></td>
+    <td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
+    <td>Sets the string that indicates a time zone ID to be used to format timestamps in the CSV datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>
+    <ul>
+      <li>Region-based zone ID: It should have the form 'area/city', such as 'America/Los_Angeles'.</li>
+      <li>Zone offset: It should be in the format '(+|-)HH:mm', for example '-08:00' or '+01:00'. Also 'UTC' and 'Z' are supported as aliases of '+00:00'.</li>
+    </ul>
+    Other short names like 'CST' are not recommended to use because they can be ambiguous.
+    </td>
+    <td>read/write</td>
+  </tr>
 </table>
 Other generic options can be found in <a href="https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html">Generic File Source Options</a>.
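The `timeZone` formats listed in the table above (region-based IDs, `(+|-)HH:mm` offsets, and the `UTC`/`Z` aliases) can be told apart with a simple shape check. The sketch below is illustrative only: `classify_time_zone` is a hypothetical helper, not part of Spark, and it checks only the textual form, not whether the zone actually exists or the offset is in range.

```python
import re

# Hypothetical helper for illustration; Spark does its own parsing.
# Zone offset: a sign followed by HH:mm, e.g. '-08:00' or '+01:00'.
_OFFSET = re.compile(r"[+-]\d{2}:\d{2}")
# Region-based ID: 'area/city', e.g. 'America/Los_Angeles'.
_REGION = re.compile(r"[A-Za-z_]+(/[A-Za-z0-9_+-]+)+")


def classify_time_zone(value):
    """Classify a timeZone string by its textual shape only."""
    if value in ("UTC", "Z"):
        return "offset-alias"  # documented aliases of '+00:00'
    if _OFFSET.fullmatch(value):
        return "zone-offset"
    if _REGION.fullmatch(value):
        return "region-based"
    # Short names such as 'CST' land here; the docs advise against them.
    return "other"


if __name__ == "__main__":
    for tz in ("America/Los_Angeles", "-08:00", "UTC", "CST"):
        print(tz, classify_time_zone(tz))
```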
diff --git a/docs/sql-data-sources-json.md b/docs/sql-data-sources-json.md
@@ -112,7 +112,6 @@ Data source options of JSON can be set via:
 <table>
   <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead>
   <tr>
-    <!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too. -->
     <td><code>timeZone</code></td>
     <td>(value of <code>spark.sql.session.timeZone</code> configuration)</td>
     <td>Sets the string that indicates a time zone ID to be used to format timestamps in the JSON datasources or partition values. The following formats of <code>timeZone</code> are supported:<br>