Docs: add apache amoro(incubating) with iceberg (#11965) #11966
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
--- | ||
title: "Apache Amoro" | ||
--- | ||
<!-- | ||
- Licensed to the Apache Software Foundation (ASF) under one or more | ||
- contributor license agreements. See the NOTICE file distributed with | ||
- this work for additional information regarding copyright ownership. | ||
- The ASF licenses this file to You under the Apache License, Version 2.0 | ||
- (the "License"); you may not use this file except in compliance with | ||
- the License. You may obtain a copy of the License at | ||
- | ||
- http://www.apache.org/licenses/LICENSE-2.0 | ||
- | ||
- Unless required by applicable law or agreed to in writing, software | ||
- distributed under the License is distributed on an "AS IS" BASIS, | ||
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
- See the License for the specific language governing permissions and | ||
- limitations under the License. | ||
--> | ||
|
||
# Apache Amoro With Iceberg | ||
|
||
**[Apache Amoro (incubating)](https://amoro.apache.org)** is a Lakehouse management system built on open data lake formats. Working with compute engines including Flink, Spark, and Trino, Amoro brings pluggable **[Table Maintenance](https://amoro.apache.org/docs/latest/self-optimizing/)** features to a Lakehouse, providing an out-of-the-box data warehouse experience and helping data platforms and products build infra-decoupled, stream-and-batch-fused, lake-native architectures. | ||
**[AMS](https://amoro.apache.org/docs/latest/#architecture) (Amoro Management Service)** provides Lakehouse management features such as self-optimizing and data expiration. It also provides a unified catalog service for all compute engines, which can be combined with existing metadata services like HMS (Hive Metastore). | ||
|
||
## Auto Self-optimizing | ||
|
||
Amoro introduces a Self-optimizing mechanism to create an out-of-the-box Streaming Lakehouse management service that is as user-friendly as a traditional database or data warehouse. Self-optimizing involves procedures such as file compaction, deduplication, and sorting. | ||
|
||
The architecture and working mechanism of Self-optimizing are shown in the figure below: | ||
|
||
 | ||
|
||
The Optimizer is the component that executes Self-optimizing tasks. It is a resident process managed by [AMS](https://amoro.apache.org/docs/latest/#architecture). AMS detects and plans Self-optimizing tasks for tables, schedules them to Optimizers for distributed execution in real time, and finally submits the optimizing results. Amoro achieves physical isolation of Optimizers through Optimizer Groups. | ||
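
The plan, schedule, and submit flow described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Amoro's implementation: `ToyAMS`, `Optimizer`, `OptimizingTask`, the round-robin dispatch, and the group names are all hypothetical and exist only to show how tasks planned for a table stay inside that table's Optimizer Group.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class OptimizingTask:
    table: str  # table the task was planned for
    group: str  # Optimizer Group the table is assigned to (hypothetical field)


@dataclass
class Optimizer:
    name: str
    group: str
    done: list = field(default_factory=list)

    def execute(self, task):
        # A real Optimizer would rewrite data files; this one only records the task.
        self.done.append(task.table)
        return f"{task.table}: optimized by {self.name}"


class ToyAMS:
    """Toy stand-in for AMS: plans tasks, schedules them, records results."""

    def __init__(self, optimizers):
        self.queues = defaultdict(deque)     # one task queue per group
        self.optimizers = defaultdict(list)  # registered Optimizers per group
        for opt in optimizers:
            self.optimizers[opt.group].append(opt)
        self.submitted = []

    def plan(self, table, group):
        self.queues[group].append(OptimizingTask(table, group))

    def schedule(self):
        # Tasks never leave their group: that is the isolation being modeled.
        for group, queue in self.queues.items():
            workers = self.optimizers[group]
            i = 0
            while queue and workers:
                task = queue.popleft()
                result = workers[i % len(workers)].execute(task)  # round-robin
                self.submitted.append(result)
                i += 1


ams = ToyAMS([Optimizer("opt-1", "streaming"), Optimizer("opt-2", "batch")])
ams.plan("db.orders", "streaming")
ams.plan("db.history", "batch")
ams.schedule()
print(ams.submitted)
```

In this sketch a group without any registered Optimizer simply keeps its tasks queued, which mirrors the idea that one group's load cannot consume another group's resources.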
|
||
The core features of [Amoro Self-optimizing](https://amoro.apache.org/docs/latest/self-optimizing/) are: | ||
|
||
- Automated, Asynchronous and Transparent: file changes are detected continuously in the background, optimizing tasks are executed asynchronously in a distributed manner, and the whole process is transparent and imperceptible to users | ||
- Resource Isolation and Sharing: resources can be isolated and shared at the table level, and resource quotas can be set | ||
- Flexible and Scalable Deployment: Optimizers support various deployment methods and convenient scaling | ||
|
||
## Table Format | ||
|
||
Apache Amoro supports all catalog types supported by Iceberg, including the common catalogs [REST](https://iceberg.apache.org/concepts/catalog/#decoupling-using-the-rest-catalog), Hadoop, Hive, Glue, JDBC, and Nessie, as well as other third-party catalogs. | ||
Amoro also supports all storage types supported by Iceberg, including common stores such as Hadoop, S3, GCS, ECS, and OSS. | ||
|
||
In addition, Amoro provides two formats of its own built on Apache Iceberg, Mixed-Iceberg format and Mixed-Hive format, so that you can quickly upgrade to a mixed Iceberg+Hive table while remaining compatible with existing Hive data. | ||
|
||
### Iceberg Format | ||
|
||
Starting from Apache Amoro v0.4, the Iceberg format is supported, including both v1 and v2. Users only need to register an Iceberg catalog in Amoro to hand its tables over to Amoro for maintenance. Amoro keeps Iceberg tables performant and cost-effective with minimal read/write overhead by merging small files, converting eq-delete files to pos-delete files, eliminating duplicate data, and cleaning up files, all without any intrusive impact on the functionality of Iceberg. | ||
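
To make the eq-delete to pos-delete conversion mentioned above more concrete, here is a minimal conceptual sketch. It is not Amoro's or Iceberg's code: the function name `eq_to_pos_deletes`, the in-memory dictionaries standing in for data files, and the file names are assumptions for illustration, and the sketch deliberately ignores Iceberg sequence numbers, which decide which data files a delete applies to in a real table. The idea is that an equality delete identifies rows by column values, so every read has to compare keys, while a position delete names an exact (file, row position) pair that readers can skip cheaply.

```python
def eq_to_pos_deletes(data_files, eq_deletes, key_columns):
    """Convert equality deletes into (file_path, position) deletes.

    data_files:  dict mapping file path -> list of row dicts (list index = position)
    eq_deletes:  list of dicts holding the key values of deleted rows
    key_columns: columns the equality deletes are expressed on
    """
    deleted_keys = {tuple(d[c] for c in key_columns) for d in eq_deletes}
    pos_deletes = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            # A row is deleted when its key matches any equality delete.
            if tuple(row[c] for c in key_columns) in deleted_keys:
                pos_deletes.append((path, pos))
    return pos_deletes


data_files = {
    "data-1.parquet": [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    "data-2.parquet": [{"id": 3, "v": "c"}, {"id": 2, "v": "d"}],
}
pos_deletes = eq_to_pos_deletes(data_files, [{"id": 2}], ["id"])
print(pos_deletes)  # [('data-1.parquet', 1), ('data-2.parquet', 1)]
```

The conversion pays the key-matching cost once, in the background, instead of on every query.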
|
||
### Mixed-Iceberg Format | ||
|
||
The design of [Mixed-Iceberg Format](https://amoro.apache.org/docs/latest/mixed-iceberg-format/) is similar to that of clustered indexes in databases. Each TableStore can use a different table format. Mixed-Iceberg format provides high-freshness OLAP through merge-on-read between BaseStore and ChangeStore. To make merge-on-read fast, BaseStore and ChangeStore use identical partitioning and layout, and both support auto-bucket. | ||
|
||
- BaseStore: stores the stock data of the table. It is usually generated by batch computing or optimizing processes and is more friendly to ReadStore for reading. | ||
- ChangeStore: stores the flow and change data of the table. It is usually written in real time by streaming computing, can also be used for downstream CDC consumption, and is more friendly to WriteStore for writing. | ||
- LogStore: serves as a cache layer for ChangeStore to accelerate stream processing. Amoro manages the consistency between LogStore and ChangeStore. | ||
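
Merge-on-read between BaseStore and ChangeStore can be pictured with the following toy sketch. It is an illustration under assumptions, not how Amoro stores or reads data: the `merge_on_read` function, the `id` primary key, and the `INSERT`/`UPDATE`/`DELETE` operation names are hypothetical, and both stores are modeled as plain Python lists.

```python
def merge_on_read(base_rows, change_log, key="id"):
    """Apply an ordered change log on top of base rows at read time.

    base_rows:  list of row dicts (stand-in for BaseStore)
    change_log: ordered list of (op, row) pairs (stand-in for ChangeStore)
    """
    merged = {row[key]: row for row in base_rows}
    for op, row in change_log:
        if op == "DELETE":
            merged.pop(row[key], None)
        else:
            # "INSERT" and "UPDATE" are both treated as an upsert by key.
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])


base = [{"id": 1, "qty": 10}, {"id": 2, "qty": 5}]
changes = [
    ("UPDATE", {"id": 1, "qty": 12}),
    ("DELETE", {"id": 2}),
    ("INSERT", {"id": 3, "qty": 7}),
]
fresh_view = merge_on_read(base, changes)
print(fresh_view)  # [{'id': 1, 'qty': 12}, {'id': 3, 'qty': 7}]
```

Readers see fresh data without waiting for a rewrite of the base files. In this picture, an optimizing process would periodically fold the change log into the base rows so that the amount of work done per read stays small.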
|
||
### Mixed-Hive Format | ||
|
||
[Mixed-Hive](https://amoro.apache.org/docs/latest/mixed-hive-format/) format offers better compatibility with Hive than Mixed-Iceberg format. It uses a Hive table as the BaseStore and an Iceberg table as the ChangeStore. |