[SPARK-28794][SQL][DOC] Documentation for Create table Command #26759

PavithraRamachandran · 2019-12-04T14:07:37Z

What changes were proposed in this pull request?

Document CREATE TABLE statement in SQL Reference Guide.

Why are the changes needed?

Adding documentation for SQL reference.

Does this PR introduce any user-facing change?

Yes

Before:
There was no documentation for this.

How was this patch tested?

Used jekyll build and serve to verify.

srowen · 2019-12-04T23:57:46Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+
+<dl>
+  <dt><code><em>USING datasource</em></code></dt>
+  <dd>Datasource using which the table is created.Data source can be CSV, TXT, ORC, JDBC,PARQUET, etc.</dd>


This needs some proofreading, here and below. Space needs to follow punctuation.
"Data source" needs to be consistent and refer to the argument above.
"using which the table is created" -> used to create the table.
But can you say any more about what this means? This isn't really adding much documentation.

srowen · 2019-12-09T17:27:40Z

This still has a lot of basic syntax, grammar and formatting problems. Please proofread per above.

srowen

I think the docs need to have a little more, well, documentation here. It's just repeating what the syntax implies already. It doesn't need to be super in depth, but, it's worth asking: what is a reader getting out of this page that they won't already know?

srowen · 2019-12-18T16:12:15Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+### Parameters
+
+<dl>
+  <dt><code><em>USING DATASOURCE</em></code></dt>


This makes it sound like DATASOURCE is a keyword. Don't you want to write something like USING data_source above, and match it below?

srowen · 2019-12-18T20:34:01Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+
+<dl>
+  <dt><code><em>USING DATASOURCE</em></code></dt>
+  <dd>Data Source is the file format used to create the table. Data Source can be CSV, TXT, ORC, JDBC, PARQUET, etc. which is an implementation of DataSourceRegister in spark.</dd>


Let's list all the possible valid values at the moment, or link to them somehow. I don't think the implementation detail in the last clause is important.

srowen · 2019-12-18T20:34:30Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+
+<dl>
+  <dt><code><em>CLUSTERED BY</em></code></dt>
+  <dd>Partitions are created on the table will be bucketed into fixed buckets based on the column specified for bucketing.</dd>


Remove 'are'. Can we link to an explanation of what bucketing means?

srowen · 2019-12-18T20:35:25Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+
+<dl>
+  <dt><code><em>TBLPROPERTIES</em></code></dt>
+  <dd>Table properties that has to be set are specified.</dd>


This is awkwardly worded. "Sets key-value properties on the table, such as ..."
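
For illustration, a minimal sketch of what such an example might look like. The table, columns, and property values are hypothetical; `created.by.user` is the key mentioned in a later revision of this PR.

```sql
-- Hypothetical table; TBLPROPERTIES attaches free-form key-value pairs to it.
CREATE TABLE student (id INT, name STRING, age INT)
  USING PARQUET
  TBLPROPERTIES ('created.by.user' = 'alice', 'purpose' = 'demo');
```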

srowen · 2019-12-18T20:35:58Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+
+<dl>
+  <dt><code><em>LOCATION</em></code></dt>
+  <dd>Specified Location is used to store table data.</dd>


Can we say a little more -- it's a path to a directory, right?
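
A short example could make this concrete. The sketch below assumes the path names a directory and uses a made-up location and table name.

```sql
-- Hypothetical path; the table's data files would live under this directory.
CREATE TABLE student (id INT, name STRING, age INT)
  USING CSV
  LOCATION '/tmp/warehouse/student';
```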

srowen · 2019-12-18T20:36:44Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+---
+### Description
+
+The `CREATE TABLE` statement creates a new table using Hive format.


What does Hive format mean here? (for the reader)

maropu · 2019-12-20T06:46:25Z

docs/sql-ref-syntax-ddl-create-table.md

+### Description
+`CREATE TABLE` statement is used to create a table in an exsisting database. 
+
+The INSERT statements:


Not INSERT but CREATE?

maropu · 2019-12-20T06:47:33Z

docs/sql-ref-syntax-ddl-create-table.md

@@ -19,4 +19,9 @@ license: |
  limitations under the License.
 ---

-**This page is under construction**
+### Description
+`CREATE TABLE` statement is used to create a table in an exsisting database. 


How about create -> define, to avoid saying "create" twice?

maropu · 2019-12-20T06:52:33Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+  USING DATASOURCE
+  [OPTIONS (key1=val1, key2=val2, ...)]
+  [PARTITIONED BY (col_name1, col_name2, ...)]
+  [CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]


Add SORTED BY:

spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Line 289 in 18e8d1d

(SORTED BY orderedIdentifierList)?
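
As a rough sketch of how the optional clause fits into a statement (table and column names are hypothetical, and only ASC ordering is shown):

```sql
-- Rows are hashed on id into 4 buckets; within each bucket they are sorted by name.
CREATE TABLE student (id INT, name STRING, age INT)
  USING PARQUET
  PARTITIONED BY (age)
  CLUSTERED BY (id) SORTED BY (name ASC) INTO 4 BUCKETS;
```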

maropu · 2019-12-20T06:56:58Z

docs/sql-ref-syntax-ddl-create-table.md

+
+The INSERT statements:
+* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)


Do we need to add CREATE TABLE LIKE, too?

spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Line 129 in 18e8d1d

| CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier
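
A minimal sketch of the statement this grammar rule covers, assuming a table named `student` already exists (both table names are hypothetical):

```sql
-- Creates an empty table with the same definition as student; no data is copied.
CREATE TABLE IF NOT EXISTS student_copy LIKE student;
```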

maropu · 2019-12-20T06:59:08Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
+  [(col_name1[:] col_type1 [COMMENT col_comment1], ...)]
+  [COMMENT table_comment]
+  [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]


Add the other PARTITIONED BY case:

spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Line 121 in 18e8d1d

PARTITIONED BY partitionColumnNames=identifierList) |

maropu · 2019-12-20T06:59:49Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+  [(col_name1[:] col_type1 [COMMENT col_comment1], ...)]
+  [COMMENT table_comment]
+  [PARTITIONED BY (col_name2[:] col_type2 [COMMENT col_comment2], ...)]
+  [ROW FORMAT row_format]


SKEWED BY:

spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Line 123 in 18e8d1d

skewSpec |

@maropu I think it is not supported. I checked the SparkSqlParser class and found this.

srowen · 2020-01-04T16:36:23Z

Ping @PavithraRamachandran

maropu · 2020-01-08T00:54:01Z

Kindly pinging again, @PavithraRamachandran

PavithraRamachandran · 2020-01-08T15:24:15Z

I shall update the PR with the necessary corrections as per the review comments.

gatorsmile · 2020-01-12T18:27:11Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+### Examples
+{% highlight sql %}
+
+CREATE TABLE Student (Id INT,name STRING)


Since Spark 3.0, this statement does not create a Hive serde table. See #26736
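
If the example is meant to show a Hive serde table, one option may be to state the storage format explicitly. The sketch below assumes the Hive-format `STORED AS` clause documented on this page; the choice of ORC is arbitrary.

```sql
-- Naming a Hive file format explicitly marks this as a Hive-format table.
CREATE TABLE student (id INT, name STRING) STORED AS ORC;
```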

maropu · 2020-01-14T23:53:59Z

ok to test

SparkQA · 2020-01-15T00:07:28Z

Test build #116731 has finished for PR 26759 at commit 62502a2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2020-01-15T14:48:30Z

@PavithraRamachandran there are still unresolved comments from a month ago. Please address all of them.

maropu · 2020-01-15T15:00:17Z

@PavithraRamachandran If you don't have enough time to keep working on this, I can take it over.

PavithraRamachandran · 2020-01-16T04:31:00Z

I shall complete this today.

SparkQA · 2020-01-16T05:56:28Z

Test build #116818 has finished for PR 26759 at commit 50996f2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2020-01-16T07:33:59Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+{% highlight sql %}
+
+--Using data source
+CREATE TABLE Student (width INT, length INT, height INT) USING CSV


Perhaps change the column names to id, name, age to be more meaningful? Also, can you please put a semicolon at the end of each example, to be consistent with the other docs?
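
Something along these lines, as a sketch (the lowercase table name is only a stylistic assumption):

```sql
--Using data source
CREATE TABLE student (id INT, name STRING, age INT) USING CSV;
```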

cc @huaxingao can you please check on the consistency part if you have some time?

dilipbiswal · 2020-01-16T07:39:16Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+
+<dl>
+  <dt><code><em>USING data_source</em></code></dt>
+  <dd>Data Source is the file format used to create the table. Data source can be CSV, TXT, ORC, JDBC, PARQUET, etc.</dd>


Should we say "input format" instead of "file format"? For example, JDBC is a data source but not a file format, right?

dilipbiswal · 2020-01-16T07:42:00Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+</dl>
+
+<dl>
+  <dt><code><em>STORED</em></code></dt>


STORED AS ?

SparkQA · 2020-01-16T09:02:36Z

Test build #116836 has finished for PR 26759 at commit f835058.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-01-16T10:38:51Z

Test build #116848 has finished for PR 26759 at commit b7dab5d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-01-16T16:00:38Z

Test build #116860 has finished for PR 26759 at commit c545e9a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-01-16T23:59:08Z

docs/sql-ref-syntax-ddl-create-table.md

+The CREATE statements:
+* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)
+* [CREATE TABLE LIKE](sql-ref-syntax-ddl-create-table-hiveformat.html)


sql-ref-syntax-ddl-create-table-like.html

maropu · 2020-01-17T00:08:57Z

docs/sql-ref-syntax-ddl-create-table-like.md

+
+### Syntax
+{% highlight sql %}
+CREATE TABLE [IF NOT EXISTS] [db_name.]new_table_name LIKE [db_name.]source_table_name [LOCATION path]


More options here:

spark/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Line 131 in 3848999

| CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier

huaxingao · 2020-01-17T04:49:03Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+
+### Syntax
+{% highlight sql %}
+CREATE TABLE [IF NOT EXISTS] [db_name.]table_name


Could you please use table_identifier instead of [db_name.]table_name? Put the syntax of table_identifier in the Parameters section. You can refer to any of the docs that have table_identifier.

huaxingao · 2020-01-17T04:49:14Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+  [LOCATION path]
+  [COMMENT table_comment]
+  [TBLPROPERTIES (key1=val1, key2=val2, ...)]
+  [AS select_statement]


I am trying to make all the docs follow the same convention: put a space in between the symbols (e.g. '|', '=', '[]') and text. Refer to sql-ref-syntax-ddl-create-database as an example.

huaxingao · 2020-01-17T04:49:34Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+  USING CSV
+  PARTITIONED BY (age)
+  CLUSTERED BY (Id) INTO 4 buckets
+


Could you please add ; at the end of all the SQL statements in the example sections?

huaxingao · 2020-01-17T04:49:43Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+  PARTITIONED BY (age)
+  CLUSTERED BY (Id) INTO 4 buckets
+
+{% endhighlight %}


Could you add a Related Statements section to link the related statements?

SparkQA · 2020-01-17T15:42:59Z

Test build #116953 has finished for PR 26759 at commit bc2aef8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-01-17T16:31:16Z

Test build #116955 has finished for PR 26759 at commit 2f26e55.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

huaxingao · 2020-01-17T17:03:35Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+
+### Related Statements
+* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)
+* [CREATE TABLE LIKE](ssql-ref-syntax-ddl-create-table-like.html)


This link is broken. You have an extra s in ssql

huaxingao · 2020-01-17T17:03:42Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+
+### Related Statements
+* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+* [CREATE TABLE LIKE](ssql-ref-syntax-ddl-create-table-like.html)


broken link

huaxingao · 2020-01-17T17:03:53Z

docs/sql-ref-syntax-ddl-create-table.md

+The CREATE statements:
+* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)
+* [CREATE TABLE LIKE](ssql-ref-syntax-ddl-create-table-like.html)


broken link

huaxingao · 2020-01-17T17:18:15Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+  USING data_source
+  [ OPTIONS ( key1=val1, key2=val2, ... ) ]
+  [ PARTITIONED BY ( col_name1, col_name2, ... ) ]
+  [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ]


This line is too long. You may want to break it to make it look better. It looks like this in my Google Chrome:

huaxingao · 2020-01-17T17:18:24Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
+  [ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ]
+  [ COMMENT table_comment ]
+  [ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... ) | ( col_name1, col_name2, ... ) ]


break the line

SparkQA · 2020-01-17T17:50:42Z

Test build #116961 has finished for PR 26759 at commit 7efd7f7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

Worth another proofreading too

srowen · 2020-01-22T01:49:12Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+
+<dl>
+  <dt><code><em>ROW FORMAT</em></code></dt>
+  <dd>SERDE is used to specify a custom SerDe or the DELIMITED clause inorder to use the native SerDe.</dd>


inorder -> in order

srowen · 2020-01-22T01:49:57Z

docs/sql-ref-syntax-ddl-create-table-datasource.md

+
+<dl>
+  <dt><code><em>LOCATION</em></code></dt>
+  <dd>Path to the directory where table data is stored, could be filesystem, HDFS, etc.</dd>


Here and below, better as "... data is stored, which could be a path on distributed storage like HDFS, etc."

srowen · 2020-01-22T01:50:17Z

docs/sql-ref-syntax-ddl-create-table-hiveformat.md

+<dl>
+  <dt><code><em>TBLPROPERTIES</em></code></dt>
+  <dd>
+	Table properties that has to be set are specified,such as `created.by.user`, `owner`, etc.


that have to be set
space after comma

SparkQA · 2020-01-22T15:57:40Z

Test build #117244 has finished for PR 26759 at commit 0ea1268.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

Close enough, I think.

srowen · 2020-01-23T17:29:20Z

Merged to master

srowen requested changes Dec 4, 2019

dongjoon-hyun added DOCUMENTATION SQL labels Dec 5, 2019

PavithraRamachandran force-pushed the create_doc branch 2 times, most recently from 18be12a to 70cae22 Compare December 18, 2019 10:20

srowen reviewed Dec 18, 2019

maropu changed the title ~~[SPARK-28794] [DOC] Documentation for Create table Command~~ [SPARK-28794][SQL][DOC] Documentation for Create table Command Dec 20, 2019

maropu reviewed Dec 20, 2019

gatorsmile reviewed Jan 12, 2020

PavithraRamachandran force-pushed the create_doc branch from 70cae22 to 62502a2 Compare January 14, 2020 10:09

PavithraRamachandran force-pushed the create_doc branch from 62502a2 to 50996f2 Compare January 16, 2020 05:31

dilipbiswal reviewed Jan 16, 2020

PavithraRamachandran force-pushed the create_doc branch from f835058 to b7dab5d Compare January 16, 2020 10:15

maropu reviewed Jan 17, 2020

huaxingao reviewed Jan 17, 2020

create table document

bc2aef8

PavithraRamachandran force-pushed the create_doc branch from c545e9a to bc2aef8 Compare January 17, 2020 15:31

create table document

2f26e55

huaxingao reviewed Jan 17, 2020

create table document

7efd7f7

srowen requested changes Jan 22, 2020

create table document

0ea1268

srowen reviewed Jan 23, 2020

srowen closed this in afe70b3 Jan 23, 2020
