[#1135] improvement(docs): Add docs about tables advanced feature like partitioning #1203

Merged
merged 22 commits into from
Jan 2, 2024
3049470
Add docs about tables advanced feature like partitioning
yuqi1129 Dec 19, 2023
1ac2270
Add docs about tables advanced feature like partitioning
yuqi1129 Dec 19, 2023
31677a9
Resolve discussion
yuqi1129 Dec 19, 2023
164ddf0
Resolve discussion
yuqi1129 Dec 19, 2023
bfd2802
Resolve discussion again
yuqi1129 Dec 19, 2023
af0b348
Update doc again
yuqi1129 Dec 19, 2023
d4c086f
Polish docs
yuqi1129 Dec 21, 2023
41582dd
Resolve discussion again
yuqi1129 Dec 25, 2023
a08a184
Remove the source type and result type column
yuqi1129 Dec 25, 2023
ae6b3c3
Merge branch 'main' of github.com:datastrato/graviton into issue_1135
yuqi1129 Dec 25, 2023
31ddcd4
Add description about default null ordering value
yuqi1129 Dec 25, 2023
b70b394
Use a separate doc to describe partitioning, bucketing and sorted table
yuqi1129 Dec 25, 2023
6e37e14
Add document header for table-partitioning-bucketing-sort-order.md
yuqi1129 Dec 25, 2023
3f6c622
Add descriptions about default value of sort direction.
yuqi1129 Dec 25, 2023
993fdff
Change some improper variants naming
yuqi1129 Dec 25, 2023
b1d3db6
Fix discussion again
yuqi1129 Dec 25, 2023
108117a
Optimize code.
yuqi1129 Dec 27, 2023
c0503f8
Fix Jerry's comments and format some code
yuqi1129 Jan 2, 2024
b993c01
Polish docs again
yuqi1129 Jan 2, 2024
a266e95
1. Add the necessary messages needed by table partitioning
yuqi1129 Jan 2, 2024
cc5c454
Change to use api method
yuqi1129 Jan 2, 2024
983dbab
Update table-partitioning-bucketing-sort-order.md
jerryshao Jan 2, 2024
65 changes: 65 additions & 0 deletions docs/advanced-table-feature.md
@@ -0,0 +1,65 @@
#### Partitioned table

Currently, Gravitino supports the following partitioning strategies:

| Partitioning strategy | JSON | Java | SQL syntax | Description |
|-----------------------|-----------------------------------------------------|--------------------------------|----------------------------|-----------------------------------------------------------------------------------------------------------------------------|
| Identity | `{"strategy":"identity","fieldName":["score"]}` | `Transforms.identity("score")` | `PARTITION BY score` | Partition by a field or reference |
| Function | `{"strategy":"functionName","fieldName":["score"]}` | `Transforms.hour("score")` | `PARTITION BY hour(score)` | Partition by a function. Currently supported functions are hour, day, month, year, bucket, truncate, list, and range |

The function strategies are detailed below:

| Function strategy | JSON | Java | SQL syntax | Description |
|-------------------|------------------------------------------------------------------|------------------------------------------------|------------------------------------|--------------------------------------------------------|
| Identity | `{"strategy":"identity","fieldName":["score"]}` | `Transforms.identity("score")` | `PARTITION BY score` | Partition by field `score` |
| Hour | `{"strategy":"hour","fieldName":["score"]}` | `Transforms.hour("score")` | `PARTITION BY hour(score)` | Partition by `hour` function in field `score` |
| Day | `{"strategy":"day","fieldName":["score"]}` | `Transforms.day("score")` | `PARTITION BY day(score)` | Partition by `day` function in field `score` |
| Month | `{"strategy":"month","fieldName":["score"]}` | `Transforms.month("score")` | `PARTITION BY month(score)` | Partition by `month` function in field `score` |
| Year | `{"strategy":"year","fieldName":["score"]}` | `Transforms.year("score")` | `PARTITION BY year(score)` | Partition by `year` function in field `score` |
| Bucket | `{"strategy":"bucket","numBuckets":10,"fieldNames":[["score"]]}` | `Transforms.bucket(10, "score")` | `PARTITION BY bucket(10, score)` | Partition by `bucket` function in field `score` |
| Truncate | `{"strategy":"truncate","width":20,"fieldName":["score"]}` | `Transforms.truncate(20, "score")` | `PARTITION BY truncate(20, score)` | Partition by `truncate` function in field `score` |
| List | `{"strategy":"list","fieldNames":[["dt"],["city"]]}` | `Transforms.list(new String[] {"dt", "city"})` | `PARTITION BY list(dt, city)` | Partition by `list` function in fields `dt` and `city` |
| Range | `{"strategy":"range","fieldName":["score"]}` | `Transforms.range(20, "score")` | `PARTITION BY range(score)` | Partition by `range` function in field `score` |
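
As a rough illustration, the JSON column of the table can be generated programmatically. The sketch below uses plain Python dictionaries; the helper names (`single_field_partitioning`, `bucket_partitioning`, `truncate_partitioning`, `to_json`) are hypothetical and not part of Gravitino, while the keys and values are copied from the table rows.

```python
import json


def single_field_partitioning(strategy, field):
    """Payload for identity/hour/day/month/year: one field reference."""
    return {"strategy": strategy, "fieldName": [field]}


def bucket_partitioning(num_buckets, *fields):
    """Payload for the bucket strategy; each field becomes its own name path."""
    return {
        "strategy": "bucket",
        "numBuckets": num_buckets,
        "fieldNames": [[f] for f in fields],
    }


def truncate_partitioning(width, field):
    """Payload for the truncate strategy."""
    return {"strategy": "truncate", "width": width, "fieldName": [field]}


def to_json(payload):
    # Compact separators reproduce the formatting used in the table.
    return json.dumps(payload, separators=(",", ":"))


print(to_json(single_field_partitioning("hour", "score")))
print(to_json(bucket_partitioning(10, "score")))
print(to_json(truncate_partitioning(20, "score")))
```

Each printed line should match the corresponding `Hour`, `Bucket`, and `Truncate` row in the JSON column.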

In addition to the strategies above, you can partition a table by other function strategies. For example, `{"strategy":"functionName","fieldName":["score"]}`, where `functionName` can be any function name that you can use in SQL, is equivalent to `PARTITION BY functionName(score)` in SQL.
For complex functions, please refer to `FunctionPartitioningDTO`.
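
The equivalence between the JSON form and the SQL form can be sketched as a small conversion function. This is only an illustration under stated assumptions: `partition_clause` is a hypothetical helper, it handles only strategies that carry a single `fieldName`, and joining a nested field path with dots is an assumption rather than documented Gravitino behavior.

```python
def partition_clause(partitioning):
    """Render a single-field partitioning payload as a SQL-style clause."""
    strategy = partitioning["strategy"]
    # Assumption: a multi-part field name is written as a dotted path.
    column = ".".join(partitioning["fieldName"])
    if strategy == "identity":
        # Identity partitions on the field itself, with no function call.
        return f"PARTITION BY {column}"
    return f"PARTITION BY {strategy}({column})"


print(partition_clause({"strategy": "functionName", "fieldName": ["score"]}))
print(partition_clause({"strategy": "identity", "fieldName": ["score"]}))
```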

#### Bucketed table

- Strategy. It defines how the table is bucketed.

| Bucket strategy | JSON | Java | Description |
|-----------------|---------|------------------|--------------------------|
| HASH | `HASH` | `Strategy.HASH` | Bucket table using hash |
| RANGE | `RANGE` | `Strategy.RANGE` | Bucket table using range |
| EVEN | `EVEN` | `Strategy.EVEN` | Bucket table using even distribution |

- Number. It defines how many buckets the table is divided into.
- Function arguments. They define which fields or functions are used to bucket the table. Please refer to the Java classes `FunctionArg` and `DistributionDTO`.

| Expression type | JSON | Java | SQL syntax | Description |
|-----------------|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-----------------|--------------------------------|
| Field | `{"type":"field","fieldName":["score"]}` | `FieldReferenceDTO.of("score")` | `score` | field reference value `score` |
| Function | `{"type":"function","functionName":"hour","fieldName":["score"]}` | `new FuncExpressionDTO.Builder()<br/>.withFunctionName("hour")<br/>.withFunctionArgs("score").build()` | `hour(score)` | function value `hour(score)` |
| Constant | `{"type":"constant","value":10, "dataType": "integer"}` | `new LiteralDTO.Builder()<br/>.withValue("10")<br/>.withDataType(Types.IntegerType.get())<br/>.build()` | `10` | Integer constant `10` |
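
To show how the three components fit together, here is a minimal sketch that assembles a bucketing definition from a strategy, a bucket number, and a list of expressions. The expression payloads follow the table above; the wrapper keys `strategy`, `number`, and `funcArgs`, and all helper names, are assumptions made for illustration and are not confirmed by this document.

```python
import json

BUCKET_STRATEGIES = {"HASH", "RANGE", "EVEN"}


def field_ref(name):
    """Field expression, as in the `Field` row."""
    return {"type": "field", "fieldName": [name]}


def function_expr(function_name, field_name):
    """Function expression, as in the `Function` row."""
    return {"type": "function", "functionName": function_name, "fieldName": [field_name]}


def constant(value, data_type):
    """Constant expression, as in the `Constant` row."""
    return {"type": "constant", "value": value, "dataType": data_type}


def distribution(strategy, number, *args):
    """Combine the three components; the key names here are hypothetical."""
    if strategy not in BUCKET_STRATEGIES:
        raise ValueError(f"unknown bucket strategy: {strategy}")
    if number <= 0:
        raise ValueError("number of buckets must be positive")
    return {"strategy": strategy, "number": number, "funcArgs": list(args)}


print(json.dumps(distribution("HASH", 4, field_ref("score")), indent=2))
```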


#### Sorted order table

A sorted order table is defined by the following three components:

- Direction. It defines in which direction we sort the table.

| Direction | JSON | Java | Description |
| ---------- | ------ | -------------------------- |-------------------------------------------|
| Ascending | `asc` | `SortDirection.ASCENDING` | Sort by a field or a function in ascending order |
| Descending | `desc` | `SortDirection.DESCENDING` | Sort by a field or a function in descending order |

- Null ordering. It describes how null values are handled when sorting.

| Null ordering | JSON | Java | Description |
| --------------------------------- | ------------- | -------------------------- |-----------------------------------|
| Put null value in the first place | `nulls_first` | `NullOrdering.NULLS_FIRST` | Put null value in the first place |
| Put null value in the last place | `nulls_last` | `NullOrdering.NULLS_LAST` | Put null value in the last place |

- Sort term. It defines which field or function is used to sort the table. See the `Expression type` table in the bucketed table section.
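
The three components above can be combined as in the following sketch. The values `asc`, `desc`, `nulls_first`, and `nulls_last` come from the tables; the key names `sortTerm`, `direction`, and `nullOrdering` and the helper `sort_order` are assumptions for illustration only.

```python
import json

SORT_DIRECTIONS = {"asc", "desc"}
NULL_ORDERINGS = {"nulls_first", "nulls_last"}


def sort_order(sort_term, direction, null_ordering):
    """Build one sort order entry; the key names are hypothetical."""
    if direction not in SORT_DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    if null_ordering not in NULL_ORDERINGS:
        raise ValueError(f"unknown null ordering: {null_ordering}")
    return {
        "sortTerm": sort_term,
        "direction": direction,
        "nullOrdering": null_ordering,
    }


# The sort term reuses the field expression format from the bucketed table section.
score_field = {"type": "field", "fieldName": ["score"]}
print(json.dumps(sort_order(score_field, "desc", "nulls_last"), indent=2))
```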
2 changes: 2 additions & 0 deletions docs/manage-metadata-using-gravitino.md
@@ -733,6 +733,8 @@ In addition to the basic settings, Gravitino supports the following features:
| Bucketed table | Equivalent to `CLUSTERED BY` in Apache Hive; some engines may use different terms for it. | [Distribution](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/distributions/Distribution.html) |
| Sorted order table | Equivalent to `SORTED BY` in Apache Hive; some engines may use different terms for it. | [SortOrder](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/sorts/SortOrder.html) |

For details about these three features, see the [advanced table features document](advanced-table-feature.md).

:::tip
**Not all catalogs support these features.** Please refer to the related documentation for more details.
:::