[SPARK-28794][SQL][DOC] Documentation for Create table Command #26759

Closed
115 changes: 115 additions & 0 deletions docs/sql-ref-syntax-ddl-create-table-datasource.md
@@ -0,0 +1,115 @@
---
layout: global
title: CREATE DATASOURCE TABLE
displayTitle: CREATE DATASOURCE TABLE
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---

### Description

The `CREATE TABLE` statement defines a new table using a Data Source.

### Syntax
{% highlight sql %}
CREATE TABLE [ IF NOT EXISTS ] table_identifier
[ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
USING data_source
[ OPTIONS ( key1=val1, key2=val2, ... ) ]
[ PARTITIONED BY ( col_name1, col_name2, ... ) ]
[ CLUSTERED BY ( col_name3, col_name4, ... )
[ SORTED BY ( col_name [ ASC | DESC ], ... ) ]
INTO num_buckets BUCKETS ]
[ LOCATION path ]
[ COMMENT table_comment ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ AS select_statement ]
{% endhighlight %}

### Parameters

<dl>
<dt><code><em>table_identifier</em></code></dt>
<dd>
Specifies a table name, which may be optionally qualified with a database name.<br><br>
<b>Syntax:</b>
<code>
[ database_name. ] table_name
</code>
</dd>
</dl>
<dl>
<dt><code><em>USING data_source</em></code></dt>
<dd>Data Source is the input format used to create the table. Data source can be CSV, TEXT, ORC, JDBC, PARQUET, etc.</dd>
</dl>
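
As a minimal sketch of how `USING` combines with `OPTIONS`, the statement below defines a CSV-backed table. The table name `student_csv` is hypothetical, and `header` and `sep` are assumed here to be the CSV data source options you want to set; adjust them to your files.

{% highlight sql %}
--Hypothetical table; 'header' and 'sep' are CSV data source options
CREATE TABLE student_csv (id INT, name STRING, age INT)
USING CSV
OPTIONS ('header'='true', 'sep'=',');
{% endhighlight %}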

<dl>
<dt><code><em>PARTITIONED BY</em></code></dt>
<dd>Partitions are created on the table, based on the columns specified.</dd>
</dl>

<dl>
<dt><code><em>CLUSTERED BY</em></code></dt>
<dd>
Partitions created on the table will be bucketed into a fixed number of buckets based on the columns specified for bucketing.<br><br>
<b>NOTE:</b> Bucketing is an optimization technique that uses buckets (and bucketing columns) to determine data partitioning and avoid data shuffle.<br>
</dd>
<dt><code><em>SORTED BY</em></code></dt>
<dd>Determines the order in which the data is stored in buckets. The default is ascending order.</dd>
</dl>
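
The bucketing clauses can be sketched together as follows. The table name `student_bucketed`, the choice of `PARQUET`, and the bucket count are assumptions made only for illustration.

{% highlight sql %}
--Hypothetical table: rows are hashed on id into 4 buckets,
--and rows within each bucket are stored sorted by age, descending
CREATE TABLE student_bucketed (id INT, name STRING, age INT)
USING PARQUET
CLUSTERED BY (id)
SORTED BY (age DESC)
INTO 4 BUCKETS;
{% endhighlight %}

If `ASC` or `DESC` is omitted in `SORTED BY`, ascending order applies, as described above.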

<dl>
<dt><code><em>LOCATION</em></code></dt>
<dd>Path to the directory where table data is stored, which could be a path on distributed storage like HDFS, etc.</dd>
</dl>
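
A short sketch of `LOCATION`, assuming a hypothetical table name and a hypothetical directory path; replace the path with one that exists on your storage system.

{% highlight sql %}
--Hypothetical path; the table data is read from and written to this directory
CREATE TABLE student_ext (id INT, name STRING)
USING PARQUET
LOCATION '/tmp/warehouse/student_ext';
{% endhighlight %}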

<dl>
<dt><code><em>COMMENT</em></code></dt>
<dd>Specifies a comment (a string literal) that describes the table.</dd>
</dl>

<dl>
<dt><code><em>TBLPROPERTIES</em></code></dt>
<dd>Specifies table properties to set, such as <code>created.by.user</code>, <code>owner</code>, etc.
</dd>
</dl>
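
The `COMMENT` and `TBLPROPERTIES` clauses might be combined as in this sketch. The table name, the comment text, and the property value `alice` are hypothetical.

{% highlight sql %}
--Hypothetical table with a column comment, a table comment, and one table property
CREATE TABLE student_props (id INT COMMENT 'student id', name STRING)
USING PARQUET
COMMENT 'Student records'
TBLPROPERTIES ('created.by.user'='alice');
{% endhighlight %}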

<dl>
<dt><code><em>AS select_statement</em></code></dt>
<dd>The table is populated using the data from the select statement.</dd>
</dl>

### Examples
{% highlight sql %}

--Using data source
CREATE TABLE Student (Id INT, name STRING, age INT) USING CSV;

--Using data from another table
CREATE TABLE StudentInfo
USING CSV
AS SELECT * FROM Student;

--Partitioned and bucketed
CREATE TABLE Student (Id INT, name STRING, age INT)
USING CSV
PARTITIONED BY (age)
CLUSTERED BY (Id) INTO 4 BUCKETS;

{% endhighlight %}

Review comment (Contributor): Could you please add `;` at the end of all the SQL statements in the example sections?

Review comment (Contributor): Could you add a Related Statements section to link the related statements?


### Related Statements
* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)
* [CREATE TABLE LIKE](sql-ref-syntax-ddl-create-table-like.html)
122 changes: 122 additions & 0 deletions docs/sql-ref-syntax-ddl-create-table-hiveformat.md
@@ -0,0 +1,122 @@
---
layout: global
title: CREATE HIVEFORMAT TABLE
displayTitle: CREATE HIVEFORMAT TABLE
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---
### Description

The `CREATE TABLE` statement defines a new table using Hive format.

### Syntax
{% highlight sql %}
CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
[ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ]
[ COMMENT table_comment ]
[ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
| ( col_name1, col_name2, ... ) ]
[ ROW FORMAT row_format ]
[ STORED AS file_format ]
[ LOCATION path ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ AS select_statement ]

{% endhighlight %}

### Parameters

<dl>
<dt><code><em>table_identifier</em></code></dt>
<dd>
Specifies a table name, which may be optionally qualified with a database name.<br><br>
<b>Syntax:</b>
<code>
[ database_name. ] table_name
</code>
</dd>
</dl>

<dl>
<dt><code><em>EXTERNAL</em></code></dt>
<dd>Defines the table using the path provided in <code>LOCATION</code>; the table does not use the default location.</dd>
</dl>
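
A minimal sketch of an external Hive-format table, assuming Hive support is enabled. The table name and the path are hypothetical.

{% highlight sql %}
--Hypothetical path; EXTERNAL is used together with LOCATION
CREATE EXTERNAL TABLE student_hive_ext (id INT, name STRING)
STORED AS PARQUET
LOCATION '/tmp/warehouse/student_hive_ext';
{% endhighlight %}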

<dl>
<dt><code><em>PARTITIONED BY</em></code></dt>
<dd>Partitions are created on the table, based on the columns specified.</dd>
</dl>

<dl>
<dt><code><em>ROW FORMAT</em></code></dt>
<dd>Use the SERDE clause to specify a custom SerDe, or the DELIMITED clause to use the native SerDe.</dd>
</dl>
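
The `SERDE` form can be sketched as below; the `DELIMITED` form appears in the Examples section. This assumes Hive support is enabled and that Hive's `OpenCSVSerde` class is on the classpath. The table name is hypothetical, and the columns are declared as `STRING` because that SerDe reads every column as a string.

{% highlight sql %}
--Hypothetical table using a custom SerDe class instead of the native one
CREATE TABLE student_serde (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
{% endhighlight %}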

<dl>
<dt><code><em>STORED AS</em></code></dt>
<dd>File format for table storage, which could be TEXTFILE, ORC, PARQUET, etc.</dd>
</dl>

<dl>
<dt><code><em>LOCATION</em></code></dt>
<dd>Path to the directory where table data is stored, which could be a path on distributed storage like HDFS, etc.</dd>
</dl>

<dl>
<dt><code><em>COMMENT</em></code></dt>
<dd>Specifies a comment (a string literal) that describes the table.</dd>
</dl>

<dl>
<dt><code><em>TBLPROPERTIES</em></code></dt>
<dd>
Specifies table properties to set, such as <code>created.by.user</code>, <code>owner</code>, etc.
</dd>
</dl>

<dl>
<dt><code><em>AS select_statement</em></code></dt>
<dd>The table is populated using the data from the select statement.</dd>
</dl>


### Examples
{% highlight sql %}

--Using a comment and loading data from another table into the created table
CREATE TABLE StudentInfo
COMMENT 'Table is created using existing data'
AS SELECT * FROM Student;

--Partitioned table
CREATE TABLE Student (Id INT, name STRING)
PARTITIONED BY (age INT)
TBLPROPERTIES ('owner'='xxxx');

CREATE TABLE Student (Id INT, name STRING, age INT)
PARTITIONED BY (name, age);

--Using row format and file format
CREATE TABLE Student (Id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

{% endhighlight %}


### Related Statements
* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
* [CREATE TABLE LIKE](sql-ref-syntax-ddl-create-table-like.html)
97 changes: 97 additions & 0 deletions docs/sql-ref-syntax-ddl-create-table-like.md
@@ -0,0 +1,97 @@
---
layout: global
title: CREATE TABLE LIKE
displayTitle: CREATE TABLE LIKE
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---
### Description

The `CREATE TABLE LIKE` statement defines a new table using the definition/metadata of an existing table or view.

### Syntax
{% highlight sql %}
CREATE TABLE [ IF NOT EXISTS ] table_identifier LIKE source_table_identifier
[ USING data_source ]
[ ROW FORMAT row_format ]
[ STORED AS file_format ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ LOCATION path ]
{% endhighlight %}

### Parameters
<dl>
<dt><code><em>table_identifier</em></code></dt>
<dd>
Specifies a table name, which may be optionally qualified with a database name.<br><br>
<b>Syntax:</b>
<code>
[ database_name. ] table_name
</code>
</dd>
</dl>

<dl>
<dt><code><em>USING data_source</em></code></dt>
<dd>Data Source is the input format used to create the table. Data source can be CSV, TEXT, ORC, JDBC, PARQUET, etc.</dd>
</dl>

<dl>
<dt><code><em>ROW FORMAT</em></code></dt>
<dd>Use the SERDE clause to specify a custom SerDe, or the DELIMITED clause to use the native SerDe.</dd>
</dl>

<dl>
<dt><code><em>STORED AS</em></code></dt>
<dd>File format for table storage, which could be TEXTFILE, ORC, PARQUET, etc.</dd>
</dl>

<dl>
<dt><code><em>TBLPROPERTIES</em></code></dt>
<dd>Specifies table properties to set, such as <code>created.by.user</code>, <code>owner</code>, etc.
</dd>
</dl>

<dl>
<dt><code><em>LOCATION</em></code></dt>
<dd>Path to the directory where table data is stored, which could be a path on distributed storage like HDFS, etc. When a location is specified, the table is created as an external table.</dd>
</dl>


### Examples
{% highlight sql %}

--Create table using an existing table
CREATE TABLE Student_Dupli LIKE Student;

--Create table like using a data source
CREATE TABLE Student_Dupli LIKE Student USING CSV;

--Table is created as external table at the location specified
CREATE TABLE Student_Dupli LIKE Student LOCATION '/root1/home';

--Create table like using a row format
CREATE TABLE Student_Dupli LIKE Student
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ('owner'='xxxx');

{% endhighlight %}

### Related Statements
* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)

12 changes: 11 additions & 1 deletion docs/sql-ref-syntax-ddl-create-table.md
@@ -19,4 +19,14 @@ license: |
limitations under the License.
---

**This page is under construction**
### Description
The `CREATE TABLE` statement is used to define a table in an existing database.

The CREATE statements:
* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
* [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)
* [CREATE TABLE LIKE](sql-ref-syntax-ddl-create-table-like.html)

Review comment (Member): We need to add CREATE TABLE LIKE, too? `| CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier`

### Related Statements
- [ALTER TABLE](sql-ref-syntax-ddl-alter-table.html)
- [DROP TABLE](sql-ref-syntax-ddl-drop-table.html)