docs: update external table related documents #614

Merged: 2 commits, Oct 12, 2023
16 changes: 14 additions & 2 deletions docs/en/v0.4/reference/sql/create.md
@@ -196,6 +196,10 @@ CREATE EXTERNAL TABLE [IF NOT EXISTS] [<database>.]<table_name>
| `ENABLE_VIRTUAL_HOST_STYLE` | If you use virtual hosting to address the bucket, set it to "true". | Optional |
| `SESSION_TOKEN` | Your temporary credential for connecting to the AWS S3 service. | Optional |

### Time Index Column

When creating an external table with the `CREATE EXTERNAL TABLE` statement, you must use the `TIME INDEX` constraint to specify a time index column.

### Examples

You can create an external table without any column definitions:
@@ -204,15 +208,23 @@ You can create an external table without any columns definitions:
CREATE EXTERNAL TABLE IF NOT EXISTS city WITH (location='/var/data/city.csv',format='csv');
```

In this example, since we have not explicitly defined the columns of the table, the `CREATE EXTERNAL TABLE` statement will infer the `Time Index` column according to the following rules:

1. If the `Time Index` column can be inferred from the file metadata, then that column will be used as the `Time Index` column.
2. If there is a column named `greptime_timestamp`, it will be used as the `Time Index` column. Its type must be `TIMESTAMP`; otherwise, an error will be thrown.
3. Otherwise, a column named `greptime_timestamp` will be automatically created as the `Time Index` column, and a `DEFAULT '1970-01-01 00:00:00+0000'` constraint will be added.
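
To see which of these rules applied, one option is to describe the table after creating it. The sketch below reuses the `city` table and file path from the example above together with the `DESC TABLE` statement. Which column ends up as the time index depends on the contents of your file, so no output is shown here.

```sql
-- Create the external table without column definitions,
-- so the time index column is inferred by the rules above.
CREATE EXTERNAL TABLE IF NOT EXISTS city WITH (location='/var/data/city.csv',format='csv');

-- Inspect the resulting schema. The time index column is the one
-- whose Semantic Type is TIMESTAMP.
DESC TABLE city;
```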

Or, with column definitions:

```sql
CREATE EXTERNAL TABLE city (
host string,
-ts int64,
+ts timestamp,
cpu float64 default 0,
memory float64,
TIME INDEX (ts),
-PRIMARY KEY(ts, host)
+PRIMARY KEY(host)
) WITH (location='/var/data/city.csv', format='csv');
```

In this example, we explicitly defined the `ts` column as the `Time Index` column. If there is no suitable `Time Index` column in the file, you can also create a placeholder column and add a `DEFAULT <expr>` constraint.
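
As a minimal sketch of the placeholder approach, assume the CSV file holds only `host`, `cpu`, and `memory`. The statement below declares a `ts` column that is not in the file and gives it a constant default. The file layout and the choice of default value are assumptions for illustration; the default literal mirrors the one used for the automatically created `greptime_timestamp` column.

```sql
-- Assumption: /var/data/city.csv has no timestamp column.
-- `ts` is a placeholder that is filled from its DEFAULT expression
-- and serves as the time index.
CREATE EXTERNAL TABLE city (
  host string,
  cpu float64 default 0,
  memory float64,
  ts timestamp DEFAULT '1970-01-01 00:00:00+0000',
  TIME INDEX (ts),
  PRIMARY KEY(host)
) WITH (location='/var/data/city.csv', format='csv');
```

Naming the placeholder column yourself, rather than relying on the automatically created `greptime_timestamp`, keeps the schema explicit in the statement.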
45 changes: 27 additions & 18 deletions docs/en/v0.4/user-guide/query-external-data.md
@@ -23,21 +23,26 @@ DESC TABLE taxi_zone_lookup;
```

```sql
-+--------------+--------+------+------+---------+---------------+
-| Column | Type | Key | Null | Default | Semantic Type |
-+--------------+--------+------+------+---------+---------------+
-| LocationID | Int64 | | YES | | FIELD |
-| Borough | String | | YES | | FIELD |
-| Zone | String | | YES | | FIELD |
-| service_zone | String | | YES | | FIELD |
-+--------------+--------+------+------+---------+---------------+
++--------------------+----------------------+------+------+--------------------------+---------------+
+| Column | Type | Key | Null | Default | Semantic Type |
++--------------------+----------------------+------+------+--------------------------+---------------+
+| LocationID | Int64 | | YES | | FIELD |
+| Borough | String | | YES | | FIELD |
+| Zone | String | | YES | | FIELD |
+| service_zone | String | | YES | | FIELD |
+| greptime_timestamp | TimestampMillisecond | PRI | NO | 1970-01-01 00:00:00+0000 | TIMESTAMP |
++--------------------+----------------------+------+------+--------------------------+---------------+
4 rows in set (0.00 sec)
```

:::tip Note
You may notice a `greptime_timestamp` column that does not exist in the file. When an external table is created without a specified `TIME INDEX` column, the `greptime_timestamp` column is automatically added as the `TIME INDEX` column, with a default value of `1970-01-01 00:00:00+0000`. For more details, see the [create](../reference/sql/create.md#create-external-table) document.
:::

Now you can query the external table:

```sql
-SELECT "Zone","Borough" FROM taxi_zone_lookup LIMIT 5;
+SELECT `Zone`, `Borough` FROM taxi_zone_lookup LIMIT 5;
```

```sql
@@ -97,14 +102,18 @@ SELECT * FROM yellow_tripdata LIMIT 5;
```

```sql
-+----------+--------------------------+--------------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
-| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee |
-+----------+--------------------------+--------------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
-| 1 | 2022-01-01 09:35:40+0900 | 2022-01-01 09:53:29+0900 | 2 | 3.8 | 1 | N | 142 | 236 | 1 | 14.5 | 3 | 0.5 | 3.65 | 0 | 0.3 | 21.95 | 2.5 | 0 |
-| 1 | 2022-01-01 09:33:43+0900 | 2022-01-01 09:42:07+0900 | 1 | 2.1 | 1 | N | 236 | 42 | 1 | 8 | 0.5 | 0.5 | 4 | 0 | 0.3 | 13.3 | 0 | 0 |
-| 2 | 2022-01-01 09:53:21+0900 | 2022-01-01 10:02:19+0900 | 1 | 0.97 | 1 | N | 166 | 166 | 1 | 7.5 | 0.5 | 0.5 | 1.76 | 0 | 0.3 | 10.56 | 0 | 0 |
-| 2 | 2022-01-01 09:25:21+0900 | 2022-01-01 09:35:23+0900 | 1 | 1.09 | 1 | N | 114 | 68 | 2 | 8 | 0.5 | 0.5 | 0 | 0 | 0.3 | 11.8 | 2.5 | 0 |
-| 2 | 2022-01-01 09:36:48+0900 | 2022-01-01 10:14:20+0900 | 1 | 4.3 | 1 | N | 68 | 163 | 1 | 23.5 | 0.5 | 0.5 | 3 | 0 | 0.3 | 30.3 | 2.5 | 0 |
-+----------+--------------------------+--------------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+
++----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+---------------------+
+| VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee | greptime_timestamp |
++----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+---------------------+
+| 1 | 2022-02-01 00:06:58 | 2022-02-01 00:19:24 | 1 | 5.4 | 1 | N | 138 | 252 | 1 | 17 | 1.75 | 0.5 | 3.9 | 0 | 0.3 | 23.45 | 0 | 1.25 | 1970-01-01 00:00:00 |
+| 1 | 2022-02-01 00:38:22 | 2022-02-01 00:55:55 | 1 | 6.4 | 1 | N | 138 | 41 | 2 | 21 | 1.75 | 0.5 | 0 | 6.55 | 0.3 | 30.1 | 0 | 1.25 | 1970-01-01 00:00:00 |
+| 1 | 2022-02-01 00:03:20 | 2022-02-01 00:26:59 | 1 | 12.5 | 1 | N | 138 | 200 | 2 | 35.5 | 1.75 | 0.5 | 0 | 6.55 | 0.3 | 44.6 | 0 | 1.25 | 1970-01-01 00:00:00 |
+| 2 | 2022-02-01 00:08:00 | 2022-02-01 00:28:05 | 1 | 9.88 | 1 | N | 239 | 200 | 2 | 28 | 0.5 | 0.5 | 0 | 3 | 0.3 | 34.8 | 2.5 | 0 | 1970-01-01 00:00:00 |
+| 2 | 2022-02-01 00:06:48 | 2022-02-01 00:33:07 | 1 | 12.16 | 1 | N | 138 | 125 | 1 | 35.5 | 0.5 | 0.5 | 8.11 | 0 | 0.3 | 48.66 | 2.5 | 1.25 | 1970-01-01 00:00:00 |
++----------+----------------------+-----------------------+-----------------+---------------+------------+--------------------+--------------+--------------+--------------+-------------+-------+---------+------------+--------------+-----------------------+--------------+----------------------+-------------+---------------------+
5 rows in set (0.11 sec)
```

:::tip Note
The query result includes a `greptime_timestamp` column even though it does not exist in the original file. Every value in this column is `1970-01-01 00:00:00+0000` because, when the external table was created, the `greptime_timestamp` column was automatically added with that default value. For more details, see the [create](../reference/sql/create.md#create-external-table) document.
:::
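
If the placeholder column is not useful in a result, one option is to list the wanted columns explicitly instead of using `SELECT *`. A minimal sketch, using column names taken from the result above; the mixed-case name is quoted with backticks, as in the `taxi_zone_lookup` query:

```sql
-- Select only columns that come from the file,
-- leaving out the automatically added greptime_timestamp.
SELECT `VendorID`, tpep_pickup_datetime, trip_distance, total_amount
FROM yellow_tripdata
LIMIT 5;
```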
16 changes: 14 additions & 2 deletions docs/zh/v0.4/reference/sql/create.md
@@ -199,6 +199,10 @@ CREATE EXTERNAL TABLE [IF NOT EXISTS] [<database>.]<table_name>
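| `ENABLE_VIRTUAL_HOST_STYLE` | If you want to use virtual hosting to address the bucket, set it to `true`. | Optional |
| `SESSION_TOKEN` | Temporary credential for connecting to the AWS S3 service. | Optional |

### Time Index Column

When creating an external table with the `CREATE EXTERNAL TABLE` statement, you must use the `TIME INDEX` constraint to specify a time index column.

### Examples

You can create a table without column definitions: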
@@ -207,15 +211,23 @@ CREATE EXTERNAL TABLE [IF NOT EXISTS] [<database>.]<table_name>
CREATE EXTERNAL TABLE IF NOT EXISTS city WITH (location='/var/data/city.csv',format='csv');
```

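In this example, we did not explicitly define the table's columns, so the `CREATE EXTERNAL TABLE` statement infers the time index column according to the following rules:

1. If the time index column can be inferred from the file metadata, that column is used as the time index column.
2. If there is a column named `greptime_timestamp` (its type must be `TIMESTAMP`; otherwise, an error is thrown), that column is used as the time index column.
3. Otherwise, a column named `greptime_timestamp` is automatically created as the time index column, and a `DEFAULT '1970-01-01 00:00:00+0000'` constraint is added.

Or with column definitions: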

```sql
CREATE EXTERNAL TABLE city (
host string,
-ts int64,
+ts timestamp,
cpu float64 default 0,
memory float64,
TIME INDEX (ts),
-PRIMARY KEY(ts, host)
+PRIMARY KEY(host)
) WITH (location='/var/data/city.csv', format='csv');
```

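In this example, we explicitly defined the `ts` column as the time index column. If the file has no suitable time index column, you can also create a placeholder column and add a `DEFAULT <expr>` constraint.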