[#1135] improvement(docs): Add docs about tables advanced feature like partitioning #1203

Merged
merged 22 commits into from
Jan 2, 2024
Changes from 19 commits
3049470
Add docs about tables advanced feature like partitioning
yuqi1129 Dec 19, 2023
1ac2270
Add docs about tables advanced feature like partitioning
yuqi1129 Dec 19, 2023
31677a9
Resolve discussion
yuqi1129 Dec 19, 2023
164ddf0
Resolve discussion
yuqi1129 Dec 19, 2023
bfd2802
Resolve discussion again
yuqi1129 Dec 19, 2023
af0b348
Update doc again
yuqi1129 Dec 19, 2023
d4c086f
Polish docs
yuqi1129 Dec 21, 2023
41582dd
Resolve discussion again
yuqi1129 Dec 25, 2023
a08a184
Remove the source type and result type column
yuqi1129 Dec 25, 2023
ae6b3c3
Merge branch 'main' of github.com:datastrato/graviton into issue_1135
yuqi1129 Dec 25, 2023
31ddcd4
Add description about default null ordering value
yuqi1129 Dec 25, 2023
b70b394
Use a separate doc to describe partitioning, bucketing and sorted table
yuqi1129 Dec 25, 2023
6e37e14
Add document header for table-partitioning-bucketing-sort-order.md
yuqi1129 Dec 25, 2023
3f6c622
Add descriptions about default value of sort direction.
yuqi1129 Dec 25, 2023
993fdff
Change some improper variants naming
yuqi1129 Dec 25, 2023
b1d3db6
Fix discussion again
yuqi1129 Dec 25, 2023
108117a
Optimize code.
yuqi1129 Dec 27, 2023
c0503f8
Fix Jerry's comments and format some code
yuqi1129 Jan 2, 2024
b993c01
Polish docs again
yuqi1129 Jan 2, 2024
a266e95
1. Add the necessary messages needed by table partitioning
yuqi1129 Jan 2, 2024
cc5c454
Change to use api method
yuqi1129 Jan 2, 2024
983dbab
Update table-partitioning-bucketing-sort-order.md
jerryshao Jan 2, 2024
140 changes: 6 additions & 134 deletions docs/manage-metadata-using-gravitino.md
@@ -730,142 +730,14 @@ The following is the table property that Gravitino supports:

In addition to the basic settings, Gravitino supports the following features:

| Feature | Description | Java doc |
|---------------------|----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|
| Partitioned table   | Equal to `PARTITIONED BY` in Apache Hive and other engines that support partitioning.        | [Partition](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/dto/rel/partitions/Partitioning.html)               |
| Bucketed table      | Equal to `CLUSTERED BY` in Apache Hive; some engines may use different words to describe it. | [Distribution](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/distributions/Distribution.html) |
| Sorted order table  | Equal to `SORTED BY` in Apache Hive; some engines may use different words to describe it.    | [SortOrder](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/sorts/SortOrder.html)               |
| Feature | Description | Java doc |
|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|
| Table partitioning  | Equal to `PARTITIONED BY` in Apache Hive. It is a partitioning strategy that splits a table into parts based on partition keys. Some table engines may not support this feature. | [Partitioning](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/dto/rel/partitions/Partitioning.html) |
| Table bucketing     | Equal to `CLUSTERED BY` in Apache Hive. Bucketing (also known as clustering) splits the data into more manageable files or parts by specifying the number of buckets to create. The value of the bucketing column is hashed into a user-defined number of buckets. | [Distribution](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/distributions/Distribution.html) |
| Table sort ordering | Equal to `SORTED BY` in Apache Hive. Sort ordering sorts the data in a specific way, such as by a column or a function, before the table data is stored. It can greatly improve query performance in certain scenarios. | [SortOrder](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/sorts/SortOrder.html) |
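
To make the three features in the table concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Gravitino API: the `TableLayoutSketch` class, the `User` row type, and all method names are hypothetical, and `Objects.hashCode` only stands in for whatever hash function a real engine applies when bucketing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

public class TableLayoutSketch {

  // Hypothetical row type mirroring some columns of the example table; not a Gravitino class.
  static final class User {
    final int id;
    final String name; // may be null
    final double score;

    User(int id, String name, double score) {
      this.id = id;
      this.name = name;
      this.score = score;
    }
  }

  // Identity partitioning: rows that share a partition-key value are stored together.
  static Map<Double, List<User>> partitionByScore(List<User> rows) {
    Map<Double, List<User>> partitions = new TreeMap<>();
    for (User row : rows) {
      partitions.computeIfAbsent(row.score, k -> new ArrayList<>()).add(row);
    }
    return partitions;
  }

  // Hash bucketing: hash the bucketing column, then reduce it modulo the bucket count.
  // Objects.hashCode is an assumption for illustration; engines use their own hash functions.
  static int bucketFor(Object value, int numBuckets) {
    return Math.floorMod(Objects.hashCode(value), numBuckets);
  }

  // Sort ordering: ascending by name, with null names placed last.
  static List<User> sortByNameNullsLast(List<User> rows) {
    List<User> sorted = new ArrayList<>(rows);
    sorted.sort(Comparator.comparing((User u) -> u.name,
        Comparator.nullsLast(Comparator.<String>naturalOrder())));
    return sorted;
  }

  public static void main(String[] args) {
    List<User> rows = Arrays.asList(
        new User(1, "carol", 9.5),
        new User(2, null, 7.0),
        new User(3, "alice", 9.5));

    System.out.println("partition keys: " + partitionByScore(rows).keySet());
    for (User u : rows) {
      System.out.println("id " + u.id + " -> bucket " + bucketFor(u.id, 4));
    }
    for (User u : sortByNameNullsLast(rows)) {
      System.out.println("sorted name: " + u.name);
    }
  }
}
```

`Math.floorMod` is used instead of `%` so that a negative hash still maps to a bucket index in the range `0` to `numBuckets - 1`.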

:::tip
**Not all catalogs support these features.** Please refer to the related document for more details.
:::

The following example creates a table that is partitioned, bucketed, and sorted:

<Tabs>
<TabItem value="bash" label="Bash">

```bash
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
-H "Content-Type: application/json" -d '{
  "name": "table",
  "columns": [
    {
      "name": "id",
      "type": "integer",
      "nullable": true,
      "comment": "Id of the user"
    },
    {
      "name": "name",
      "type": "varchar(2000)",
      "nullable": true,
      "comment": "Name of the user"
    },
    {
      "name": "age",
      "type": "short",
      "nullable": true,
      "comment": "Age of the user"
    },
    {
      "name": "score",
      "type": "double",
      "nullable": true,
      "comment": "Score of the user"
    }
  ],
  "comment": "Create a new Table",
  "properties": {
    "format": "ORC"
  },
  "partitioning": [
    {
      "strategy": "identity",
      "fieldName": ["score"]
    }
  ],
  "distribution": {
    "strategy": "hash",
    "number": 4,
    "funcArgs": [
      {
        "type": "field",
        "fieldName": ["score"]
      }
    ]
  },
  "sortOrders": [
    {
      "direction": "asc",
      "nullOrder": "NULLS_LAST",
      "sortTerm": {
        "type": "field",
        "fieldName": ["name"]
      }
    }
  ]
}' http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables
```

</TabItem>
<TabItem value="java" label="Java">

```java
tableCatalog.createTable(
    NameIdentifier.of("metalake", "hive_catalog", "schema", "table"),
    new ColumnDTO[] {
      ColumnDTO.builder()
          .withComment("Id of the user")
          .withName("id")
          .withDataType(Types.IntegerType.get())
          .withNullable(true)
          .build(),
      ColumnDTO.builder()
          .withComment("Name of the user")
          .withName("name")
          .withDataType(Types.VarCharType.of(1000))
          .withNullable(true)
          .build(),
      ColumnDTO.builder()
          .withComment("Age of the user")
          .withName("age")
          .withDataType(Types.ShortType.get())
          .withNullable(true)
          .build(),
      ColumnDTO.builder()
          .withComment("Score of the user")
          .withName("score")
          .withDataType(Types.DoubleType.get())
          .withNullable(true)
          .build(),
    },
    "Create a new Table",
    tablePropertiesMap,
    new Transform[] {
      // PARTITIONED BY score
      Transforms.identity("score")
    },
    // CLUSTERED BY id
    new DistributionDTO.Builder()
        .withStrategy(Strategy.HASH)
        .withNumber(4)
        .withArgs(FieldReferenceDTO.of("id"))
        .build(),
    // SORTED BY name asc
    new SortOrderDTO[] {
      new SortOrderDTO.Builder()
          .withDirection(SortDirection.ASCENDING)
          .withNullOrder(NullOrdering.NULLS_LAST)
          .withSortTerm(FieldReferenceDTO.of("name"))
          .build()
    }
);
```

</TabItem>
</Tabs>
For more information, please see the related document on [partitioning, bucketing, and sorting](table-partitioning-bucketing-sort-order.md).

:::note
The code above is an example of creating a Hive table. For other catalogs, the code is similar, but the supported column types and table properties may differ. For more details, please refer to the related doc.