-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Remove unnecessary appendix in SSS doc, add missing links #1390
Conversation
site/docs/maintenance.md
Outdated
@@ -130,3 +130,25 @@ table.rewriteManifests() | |||
|
|||
See the [`RewriteManifestsAction` Javadoc](/javadoc/master/org/apache/iceberg/actions/RewriteManifestsAction.html) to see more configuration options. | |||
|
|||
## Appendix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this here? It seems like we should link to the API sections that cover this instead of duplicating the content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel there's a slight gap between content in Create a table
in java-api-quickstart page and here, but it would be OK to just link the section if we think end users can infer they need to call loadTable
instead. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about the API quickstart. Now that Spark SQL supports DDL operations, we should start in that section with loading a table. At the top, we can point to the "Getting started" page with SQL create table and the section on creating tables, but after that talk about how to load tables with the API.
And linking to the quickstart is good for this, since we want to point to the right docs, not copy information. Thanks!
@@ -76,31 +76,9 @@ Each micro-batch written to a table produces a new snapshot, which are tracked i | |||
|
|||
### Compacting data files | |||
|
|||
The amount of data written in a micro batch is typically small, which can cause the table metadata to track lots of small files. Compacting small files into larger files reduces the metadata needed by the table, and increases query efficiency. | |||
The amount of data written in a micro batch is typically small, which can cause the table metadata to track lots of small files. [Compacting small files into larger files](../maintenance#compact-data-files) reduces the metadata needed by the table, and increases query efficiency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, thank you!
@@ -41,6 +41,9 @@ import org.apache.iceberg.catalog.TableIdentifier; | |||
|
|||
TableIdentifier name = TableIdentifier.of("logging", "logs"); | |||
Table table = catalog.createTable(name, schema, spec); | |||
|
|||
// or to load an existing table, use the following line | |||
// Table table = catalog.loadTable(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a section for this instead of adding this block in multiple places. I'm happy to do that if you want to keep it separate from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes I did this because having a new section for loading table would duplicate the code here. It'd be easier for us to have a different PR (or a PR against my fork) if you have better idea. Thanks!
Thanks, @HeartSaVioR! I'm merging this and will follow up with a PR to better explain how to load a table. |
Thanks for reviewing and merging! |
Appendix in SSS page was actually to deduplicate code in maintenance page, but it didn't move to maintenance page once we extract the content from SSS page.
I also add a link for compacting data files in SSS page.