docs(create): fix incorrect syntax and unify format #622

Merged
merged 3 commits into from
Oct 12, 2023
67 changes: 34 additions & 33 deletions docs/en/v0.4/reference/sql/create.md
@@ -45,14 +45,14 @@ Creates a new table in the `db` database or the current database in use:
```sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name
(
name1 [type1] [NULL|NOT NULL] [DEFAULT expr1] [TIME INDEX] [PRIMARY KEY] COMMENT comment,
name2 [type2] [NULL|NOT NULL] [DEFAULT expr2] [TIME INDEX] [PRIMARY KEY] COMMENT comment,
...,
[TIME INDEX (name)],
[PRIMARY KEY(name1, name2,...)]
) ENGINE = engine WITH([ttl | regions] = expr, ...)
column1 type1 [NULL | NOT NULL] [DEFAULT expr1] [TIME INDEX] [PRIMARY KEY] [COMMENT comment1],
column2 type2 [NULL | NOT NULL] [DEFAULT expr2] [TIME INDEX] [PRIMARY KEY] [COMMENT comment2],
...
[TIME INDEX (column)],
[PRIMARY KEY(column1, column2, ...)]
) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
[
PARTITION BY RANGE COLUMNS(name1, name2, ...) (
PARTITION BY RANGE COLUMNS(column1, column2, ...) (
PARTITION r0 VALUES LESS THAN (expr1),
PARTITION r1 VALUES LESS THAN (expr2),
...
@@ -61,7 +61,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
```

The table schema is specified by the brackets before the `ENGINE`. The table schema is a list of column definitions and table constraints.
A column definition includes the column `name`, `type`, and options such as nullable or default values, etc. Please see below.
A column definition includes the column `column_name`, `type`, and options such as nullable or default values, etc. Please see below.
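
As a minimal sketch of the synopsis above, the statement below defines three columns, marks one of them as the Time Index inline, and declares the primary key as a table constraint. The table and column names, the column types, the `mito` engine name, and the `regions` value are illustrative assumptions, not something taken from this change.

```sql
-- Hypothetical table: names, types, and option values are assumptions.
CREATE TABLE IF NOT EXISTS monitor (
    -- Column with the optional COMMENT clause.
    host STRING COMMENT 'host name, used as the primary key',
    -- Inline TIME INDEX column option with a default value.
    ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP() TIME INDEX,
    -- Column without a COMMENT clause.
    cpu DOUBLE DEFAULT 0,
    -- Table constraint form of PRIMARY KEY.
    PRIMARY KEY(host)
) ENGINE = mito WITH(regions = 1);
```

The Time Index could equally be declared as a table constraint, `TIME INDEX (ts)`, which the synopsis lists as an alternative to the inline column option.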

### Table constraints

@@ -155,34 +155,35 @@ TODO by MichaelScofield
Creates a new file external table in the `db` database or the current database in use:

```sql
CREATE EXTERNAL TABLE [IF NOT EXISTS] [<database>.]<table_name>
CREATE EXTERNAL TABLE [IF NOT EXISTS] [db.]table_name
[
(
<col_name> <col_type> [NULL | NOT NULL] [COMMENT "<comment>"]
)
]
[ WITH
(
LOCATION = 'url'
[,FORMAT = { csv | json | parquet } ]
[,PATTERN = '<regex_pattern>' ]
[,ENDPOINT = '<uri>' ]
[,ACCESS_KEY_ID = '<key_id>' ]
[,SECRET_ACCESS_KEY = '<access_key>' ]
[,SESSION_TOKEN = '<token>' ]
[,REGION = '<region>' ]
[,ENABLE_VIRTUAL_HOST_STYLE = '<boolean>']
..
column1 type1 [NULL | NOT NULL] [DEFAULT expr1] [TIME INDEX] [PRIMARY KEY] [COMMENT comment1],
column2 type2 [NULL | NOT NULL] [DEFAULT expr2] [TIME INDEX] [PRIMARY KEY] [COMMENT comment2],
...
[TIME INDEX (column)],
[PRIMARY KEY(column1, column2, ...)]
)
]
] WITH (
LOCATION = url,
FORMAT = { 'CSV' | 'JSON' | 'PARQUET' | 'ORC' }
[,PATTERN = regex_pattern ]
[,REGION = region ]
[,ENDPOINT = uri ]
[,ACCESS_KEY_ID = key_id ]
[,SECRET_ACCESS_KEY = access_key ]
[,ENABLE_VIRTUAL_HOST_STYLE = { TRUE | FALSE }]
[,SESSION_TOKEN = token ]
...
)
```

### Table options

| Option | Description | Required |
| ---------- | ------------------------------------------------------------------------------- | ------------ |
| `LOCATION` | External files locations, e.g., `s3://<bucket>[<path>]`, `/<path>/[<filename>]` | **Required** |
| `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet | **Required** |
| `FORMAT` | Target file(s) format, e.g., JSON, CSV, Parquet, ORC | **Required** |
| `PATTERN` | Use regex to match files. e.g., `*_today.parquet` | Optional |
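
A rough sketch of how the three options in the table might be combined. The directory path and table name are made up for illustration. The pattern is written as `.*_today[.]parquet` on the assumption that `PATTERN` is evaluated as a regular expression, as its description states; the table's own example, `*_today.parquet`, reads more like a glob.

```sql
-- Hypothetical directory and table name; PATTERN is assumed to be a regex.
CREATE EXTERNAL TABLE IF NOT EXISTS trips WITH (
    location = '/var/data/trips/',
    format = 'parquet',
    pattern = '.*_today[.]parquet'
);
```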

#### S3
@@ -198,21 +199,21 @@ CREATE EXTERNAL TABLE [IF NOT EXISTS] [<database>.]<table_name>

### Time Index Column

When creating an external table using the `CREATE EXTERNAL TABLE` statement, you are required to use the `TIME INDEX` constraint to specify a time index column.
When creating an external table using the `CREATE EXTERNAL TABLE` statement, you are required to use the `TIME INDEX` constraint to specify a Time Index column.

### Examples

You can create an external table without any columns definitions:
You can create an external table without columns definitions, the column definitions will be automatically inferred:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS city WITH (location='/var/data/city.csv',format='csv');
```

In this example, since we have not explicitly defined the columns of the table, the `CREATE EXTERNAL TABLE` statement will infer the `Time Index` column according to the following rules:
In this example, we did not explicitly define the columns of the table. To satisfy the requirement that the external table must specify a **Time Index** column, the `CREATE EXTERNAL TABLE` statement will infer the Time Index column according to the following rules:

1. If the `Time Index` column can be inferred from the file metadata, then that column will be used as the `Time Index` column.
2. If there is a column named `greptime_timestamp` (the type of this column must be `TIMESTAMP`, otherwise, an error will be thrown), then this column will be used as the `Time Index` column.
3. Otherwise, a column named `greptime_timestamp` will be automatically created as the `Time Index` column, and a `DEFAULT '1970-01-01 00:00:00+0000'` constraint will be added.
1. If the Time Index column can be inferred from the file metadata, then that column will be used as the Time Index column.
2. If there is a column named `greptime_timestamp` (the type of this column must be `TIMESTAMP`, otherwise, an error will be thrown), then this column will be used as the Time Index column.
3. Otherwise, a column named `greptime_timestamp` will be automatically created as the Time Index column, and a `DEFAULT '1970-01-01 00:00:00+0000'` constraint will be added.
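
One way to see which of the three rules applied is to inspect the resulting schema. This is only a sketch: it assumes the `city` table from the example above was created successfully. The output depends on the contents of the file, so none is shown here.

```sql
-- Lists the table's columns; an auto-created `greptime_timestamp`
-- column, if rule 3 applied, would appear in this listing.
DESC TABLE city;
```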

Or

@@ -227,4 +228,4 @@ CREATE EXTERNAL TABLE city (
) WITH (location='/var/data/city.csv', format='csv');
```

In this example, we explicitly defined the `ts` column as the `Time Index` column. If there is no suitable `Time Index` column in the file, you can also create a placeholder column and add a `DEFAULT <expr>` constraint.
In this example, we explicitly defined the `ts` column as the Time Index column. If there is no suitable Time Index column in the file, you can also create a placeholder column and add a `DEFAULT expr` constraint.
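
A minimal sketch of the placeholder approach just described, assuming the CSV file contains a single `name` column and no timestamp data. The column names and the default value are illustrative; the default mirrors the one used for the auto-created `greptime_timestamp` column.

```sql
-- Hypothetical columns: `name` is read from the file,
-- `ts` is a placeholder Time Index filled from its DEFAULT.
CREATE EXTERNAL TABLE IF NOT EXISTS city (
    name STRING,
    ts TIMESTAMP DEFAULT '1970-01-01 00:00:00+0000' TIME INDEX
) WITH (location='/var/data/city.csv', format='csv');
```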
2 changes: 1 addition & 1 deletion docs/en/v0.4/user-guide/query-external-data.md
@@ -2,7 +2,7 @@

## Query on a file

Currently, we support queries on `Parquet`, `CSV`, and `NDJson` format file(s).
Currently, we support queries on `Parquet`, `CSV`, `ORC`, and `NDJson` format file(s).

We use the [Taxi Zone Lookup Table](https://d37ci6vzurychx.cloudfront.net/misc/taxi+_zone_lookup.csv) data as an example.

57 changes: 29 additions & 28 deletions docs/zh/v0.4/reference/sql/create.md
@@ -45,14 +45,14 @@ CREATE DATABASE IF NOT EXISTS test;
```sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name
(
name1 [type1] [NULL|NOT NULL] [DEFAULT expr1] [TIME INDEX] [PRIMARY KEY] COMMENT comment,
name2 [type2] [NULL|NOT NULL] [DEFAULT expr2] [TIME INDEX] [PRIMARY KEY] COMMENT comment,
...,
[TIME INDEX (name)],
[PRIMARY KEY(name1, name2,...)]
) ENGINE = engine WITH([ttl | regions] = expr, ...)
column1 type1 [NULL | NOT NULL] [DEFAULT expr1] [TIME INDEX] [PRIMARY KEY] [COMMENT comment1],
column2 type2 [NULL | NOT NULL] [DEFAULT expr2] [TIME INDEX] [PRIMARY KEY] [COMMENT comment2],
...
[TIME INDEX (column)],
[PRIMARY KEY(column1, column2, ...)]
) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
[
PARTITION BY RANGE COLUMNS(name1, name2, ...) (
PARTITION BY RANGE COLUMNS(column1, column2, ...) (
PARTITION r0 VALUES LESS THAN (expr1),
PARTITION r1 VALUES LESS THAN (expr2),
...
@@ -158,34 +158,35 @@ TODO by MichaelScofield
Creates a new file external table in the `db` database or the current database in use:

```sql
CREATE EXTERNAL TABLE [IF NOT EXISTS] [<database>.]<table_name>
CREATE EXTERNAL TABLE [IF NOT EXISTS] [db.]table_name
[
(
<col_name> <col_type> [NULL | NOT NULL] [COMMENT "<comment>"]
)
]
[ WITH
(
LOCATION = 'url'
[,FORMAT = { csv | json | parquet } ]
[,PATTERN = '<regex_pattern>' ]
[,ENDPOINT = '<uri>' ]
[,ACCESS_KEY_ID = '<key_id>' ]
[,SECRET_ACCESS_KEY = '<access_key>' ]
[,SESSION_TOKEN = '<token>' ]
[,REGION = '<region>' ]
[,ENABLE_VIRTUAL_HOST_STYLE = '<boolean>']
..
column1 type1 [NULL | NOT NULL] [DEFAULT expr1] [TIME INDEX] [PRIMARY KEY] [COMMENT comment1],
column2 type2 [NULL | NOT NULL] [DEFAULT expr2] [TIME INDEX] [PRIMARY KEY] [COMMENT comment2],
...
[TIME INDEX (column)],
[PRIMARY KEY(column1, column2, ...)]
)
]
] WITH (
LOCATION = url,
FORMAT = { 'CSV' | 'JSON' | 'PARQUET' | 'ORC' }
[,PATTERN = regex_pattern ]
[,REGION = region ]
[,ENDPOINT = uri ]
[,ACCESS_KEY_ID = key_id ]
[,SECRET_ACCESS_KEY = access_key ]
[,ENABLE_VIRTUAL_HOST_STYLE = { TRUE | FALSE }]
[,SESSION_TOKEN = token ]
...
)
```

### Table options

| Option     | Description                                                                           | Required |
| ---------- | ------------------------------------------------------------------------------------- | -------- |
| `LOCATION` | Location of the external table, e.g., `s3://<bucket>[<path>]`, `/<path>/[<filename>]` | **Yes**  |
| `FORMAT`   | Format of the target file(s), e.g., JSON, CSV, Parquet                                | **Yes**  |
| `FORMAT`   | Format of the target file(s), e.g., JSON, CSV, Parquet, ORC                           | **Yes**  |
| `PATTERN`  | Use a regex to match files, e.g., `*_today.parquet`                                   | Optional |

#### S3
@@ -205,13 +206,13 @@ CREATE EXTERNAL TABLE [IF NOT EXISTS] [<database>.]<table_name>

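### Examples

You can create a table without column definitions
You can create an external table without column definitions; the column definitions will be inferred automatically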

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS city WITH (location='/var/data/city.csv',format='csv');
```

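In this example, we did not explicitly define the columns of the table, so the `CREATE EXTERNAL TABLE` statement infers the time index column by the following rules
In this example, we did not explicitly define the columns of the table. To satisfy the requirement that an external table must specify a **time index column**, the `CREATE EXTERNAL TABLE` statement infers the time index column according to the following rules

1. If the time index column can be inferred from the file metadata, that column is used as the time index column.
2. If there is a column named `greptime_timestamp` (its type must be `TIMESTAMP`, otherwise an error is thrown), that column is used as the time index column.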
@@ -230,4 +231,4 @@ CREATE EXTERNAL TABLE city (
) WITH (location='/var/data/city.csv', format='csv');
```

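In this example, we explicitly defined the `ts` column as the time index column. If the file has no suitable time index column, you can also create a placeholder column and add a `DEFAULT <expr>` constraint.
In this example, we explicitly defined the `ts` column as the time index column. If the file has no suitable time index column, you can also create a placeholder column and add a `DEFAULT expr` constraint.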
6 changes: 3 additions & 3 deletions docs/zh/v0.4/user-guide/query-external-data.md
@@ -2,7 +2,7 @@

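## Query on a file

Currently, we support queries on `Parquet`, `CSV`, and `NDJson` format file(s).
Currently, we support queries on `Parquet`, `CSV`, `ORC`, and `NDJson` format file(s).

We use the [Taxi Zone Lookup Table](https://d37ci6vzurychx.cloudfront.net/misc/taxi+_zone_lookup.csv) data as an example.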

@@ -36,7 +36,7 @@ DESC TABLE taxi_zone_lookup;
```

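:::tip Note
Here, you may notice a `greptime_timestamp` column that serves as the table's time index column but does not exist in the file. This is because we did not specify a time index column when creating the external table, so the `greptime_timestamp` column is being automatically added as the time index column, with a default value of `1970-01-01 00:00:00+0000`. You can find more details in the [create](../reference/sql/create.md#create-external-table) document.
Here, you may notice a `greptime_timestamp` column that serves as the table's time index column but does not exist in the file. This is because we did not specify a time index column when creating the external table, so the `greptime_timestamp` column was automatically added as the time index column, with a default value of `1970-01-01 00:00:00+0000`. You can find more details in the [create](../reference/sql/create.md#create-external-table) document.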
:::

Now you can query the external table:
@@ -115,5 +115,5 @@ SELECT * FROM yellow_tripdata LIMIT 5;
```

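:::tip Note
The query result contains values for the `greptime_timestamp` column, even though it does not exist in the original file. All of this column's values are `1970-01-01 00:00:00+0000`, because the `greptime_timestamp` column was added automatically when we created the external table, with a default value of `1970-01-01 00:00:00+0000`. You can find more details in the [create](../reference/sql/create.md#create-external-table) document.
The query result contains values for the `greptime_timestamp` column, even though it does not exist in the original file. All values in this column are `1970-01-01 00:00:00+0000`, because the `greptime_timestamp` column was added automatically when we created the external table, with a default value of `1970-01-01 00:00:00+0000`. You can find more details in the [create](../reference/sql/create.md#create-external-table) document.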
:::