[Docs] Refactor the docs (#3544)
Refactor the docs to:
* make it easy to access the documentation for each connector.
* add Kernel docs

Changes are staged at: https://docs.delta.io/0.0.2/index.html
vkorukanti authored Aug 28, 2024
1 parent 44182b5 commit 103af9d
Showing 45 changed files with 1,456 additions and 117 deletions.
3 changes: 2 additions & 1 deletion docs/source/best-practices.md
@@ -186,4 +186,5 @@ You should not use [Spark caching](optimizations/delta-cache.md#delta-and-rdd-ca

- The data that gets cached may not be updated if the table is accessed using a different identifier (for example, you do `spark.table(x).cache()` but then write to the table using `spark.write.save("/some/path")`).

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
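The pitfall above can be sketched with a plain-Python analogy (this is an illustration, not Spark's actual cache implementation): a cache keyed by identifier is silently bypassed when the same underlying data is written through a *different* identifier.

```python
# Hypothetical sketch: a cache keyed by table identifier shows why
# `spark.table(x).cache()` can go stale when the same table is later
# written through a different identifier (e.g. a filesystem path).

class IdentifierKeyedCache:
    """Caches snapshots per identifier; writes invalidate only the
    identifier they were issued under."""

    def __init__(self, storage):
        self.storage = storage          # shared underlying table data
        self._cache = {}                # identifier -> cached snapshot

    def read(self, identifier):
        # Serve a cached snapshot if one exists for this identifier.
        if identifier not in self._cache:
            self._cache[identifier] = list(self.storage)
        return self._cache[identifier]

    def write(self, identifier, rows):
        # Update the underlying data, invalidating only this identifier.
        self.storage.extend(rows)
        self._cache.pop(identifier, None)


storage = [1, 2, 3]
cache = IdentifierKeyedCache(storage)

cache.read("default.people")        # snapshot cached under the table name
cache.write("/some/path", [4])      # write via a *different* identifier

stale = cache.read("default.people")   # still the old snapshot: [1, 2, 3]
fresh = cache.read("/some/path")       # sees the write: [1, 2, 3, 4]
```

Reading through the table name keeps returning the pre-write snapshot, which is the behavior the best-practices note warns about.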
5 changes: 3 additions & 2 deletions docs/source/bigquery-integration.md
@@ -2,8 +2,9 @@
description: Learn how to read Delta Lake tables from Google BigQuery.
---

# Google BigQuery integration with <Delta>
# Google BigQuery connector

Google BigQuery supports reading <Delta> tables (reader version 3 with [Deletion Vectors](delta-deletion-vectors.md) and [Column Mapping](delta-column-mapping.md)). Refer to the [Delta Lake BigLake tables documentation](https://cloud.google.com/bigquery/docs/create-delta-lake-table) for more details.

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/concurrency-control.md
@@ -115,4 +115,5 @@ This exception can occur in the following cases:
- When multiple writers are writing to an empty path at the same time.


.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
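When a write conflict does occur, the usual remedy is to re-read the latest table state and retry the transaction. A minimal pure-Python sketch of that retry loop, with a hypothetical `ConcurrentWriteError` standing in for whichever conflict exception your Delta API actually raises:

```python
import random
import time


class ConcurrentWriteError(Exception):
    """Stand-in for a concurrent-modification exception (hypothetical
    name; the real exception class depends on the Delta API you use)."""


def commit_with_retry(commit, max_attempts=5, base_delay=0.01):
    """Retry a conflicting transaction with exponential backoff + jitter.

    `commit` is any callable that performs the write and raises
    ConcurrentWriteError on a conflict.
    """
    for attempt in range(max_attempts):
        try:
            return commit()
        except ConcurrentWriteError:
            if attempt == max_attempts - 1:
                raise
            # Back off before re-reading the latest table state and
            # retrying the whole transaction.
            time.sleep(base_delay * (2 ** attempt) * random.random())


# Simulate two conflicting attempts followed by a success.
attempts = []

def flaky_commit():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConcurrentWriteError("conflicting commit detected")
    return "committed"

result = commit_with_retry(flaky_commit)  # succeeds on the third attempt
```

The jittered backoff matters when several writers conflict at once: without it they tend to retry in lockstep and collide again.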
29 changes: 18 additions & 11 deletions docs/source/delta-apidoc.md
@@ -16,16 +16,6 @@ However, there are some operations that are specific to <Delta> and you must use
- [Java API docs](api/java/spark/index.html)
- [Python API docs](api/python/spark/index.html)

## Delta Standalone
Delta Standalone, formerly known as the Delta Standalone Reader (DSR), is a JVM library for reading and writing Delta tables. Unlike Delta-Spark, this library doesn't use Spark to read or write tables and has only a few transitive dependencies, so it can be used by any application that cannot use a Spark cluster. For more details, refer to the [connectors README](https://github.com/delta-io/delta/blob/master/connectors/README.md).

- [Java API docs](api/java/standalone/index.html)

## Delta Flink
The Flink/Delta connector is a JVM library for reading and writing data between Apache Flink applications and Delta tables, built on the Delta Standalone JVM library. For more details, refer to the [Flink connector README](https://github.com/delta-io/delta/blob/master/connectors/flink/README.md).

- [Java API docs](api/java/flink/index.html)

## Delta Kernel

Delta Kernel is a library for operating on Delta tables. Specifically, it provides simple and narrow APIs for reading and writing to Delta tables without the need to understand the [Delta protocol](https://github.com/delta-io/delta/blob/master/PROTOCOL.md) details. You can use this library to do the following:
@@ -36,4 +26,21 @@ More details refer [here](https://github.com/delta-io/delta/blob/branch-3.0/kern

- [Java API docs](api/java/kernel/index.html)

.. include:: /shared/replacements.md
## Delta Rust
This [library](https://docs.rs/deltalake/latest/deltalake/) provides low-level access to Delta tables from Rust (with Python bindings) and is intended to be used with data processing frameworks like `datafusion`, `ballista`, `rust-dataframe`, `vega`, etc.

## Delta Standalone

.. warning:: Delta Standalone is deprecated in favor of [Delta Kernel](delta-kernel.md), which supports reading from and writing to Delta tables with advanced features.

Delta Standalone, formerly known as the Delta Standalone Reader (DSR), is a JVM library for reading and writing Delta tables. Unlike Delta-Spark, this library doesn't use Spark to read or write tables and has only a few transitive dependencies, so it can be used by any application that cannot use a Spark cluster. For more details, refer to the [connectors README](https://github.com/delta-io/delta/blob/master/connectors/README.md).

- [Java API docs](api/java/standalone/index.html)

## Delta Flink
The Flink/Delta connector is a JVM library for reading and writing data between Apache Flink applications and Delta tables, built on the Delta Standalone JVM library. For more details, refer to the [Flink connector README](https://github.com/delta-io/delta/blob/master/connectors/flink/README.md).

- [Java API docs](api/java/flink/index.html)

.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
8 changes: 8 additions & 0 deletions docs/source/delta-athena-integration.md
@@ -0,0 +1,8 @@
---
description: Learn how to set up an integration to enable you to read Delta tables from AWS Athena.
---

# AWS Athena Delta Connector
Starting with [version 3](https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html), Athena natively supports reading <Delta> tables. For details on using the native Delta Lake connector, see [Querying Delta Lake tables](https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html). For Athena versions lower than version 3, you can use the manifest-based approach detailed in [_](/presto-integration.md).

.. <Delta> replace:: Delta Lake
3 changes: 2 additions & 1 deletion docs/source/delta-batch.md
@@ -1359,4 +1359,5 @@ For example, you can pass your storage credentials through DataFrame options:

You can find the details of the Hadoop file system configurations for your storage in [_](/delta-storage.md).

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
4 changes: 2 additions & 2 deletions docs/source/delta-change-data-feed.md
@@ -247,5 +247,5 @@ VACUUM if they are outside the specified retention period.
Change data is committed along with the <Delta> transaction and becomes available at the same time as the new data in the table.

.. include:: /shared/replacements.md

.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-clustering.md
@@ -135,4 +135,5 @@ The following limitations exist:
- `DESCRIBE DETAIL` to inspect the current clustering columns
In <Delta> 3.2, the preview flag is removed and the above features are supported.

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-column-mapping.md
@@ -65,4 +65,5 @@ When column mapping is enabled for a Delta table, you can include spaces as well
- In <Delta> 3.0 and above, [Spark Structured Streaming](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html) reads require schema tracking to be enabled on a column mapping enabled table that underwent column renaming or column dropping. See [_](/delta-streaming.md#schema-tracking)
- The Delta table protocol specifies two modes of column mapping, by `name` and by `id`. <Delta> 2.1 and below do not support `id` mode.

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-constraints.md
@@ -65,4 +65,5 @@ You manage `CHECK` constraints using the `ALTER TABLE ADD CONSTRAINT` and `ALTER
> SHOW TBLPROPERTIES default.people10m;
```

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-default-columns.md
@@ -34,4 +34,5 @@ You can enable default column values for a table by setting `delta.feature.allow

- It is permissible, however, to assign or update default values for columns that were created in previous commands. For example, the following SQL command is valid: `ALTER TABLE t ALTER COLUMN c SET DEFAULT 16;`

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-deletion-vectors.md
@@ -67,4 +67,5 @@ REORG TABLE events
- `REORG TABLE` is _idempotent_, meaning that if it is run twice on the same dataset, the second run has no effect.
- After running `REORG TABLE`, the soft-deleted data may still exist in the old files. You can run [VACUUM](delta-utility.md#delta-vacuum) to physically delete the old files.

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
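The idempotence claim can be illustrated with a plain-Python analogy (an illustration, not Delta's actual implementation): a purge that drops only soft-deleted rows leaves already-purged data unchanged, so a second run is a no-op.

```python
# Hypothetical sketch of why `REORG TABLE` is idempotent: rewriting files
# to drop soft-deleted rows produces files with nothing left to drop.

def purge(files):
    """Rewrite each file's rows to drop the soft-deleted ones."""
    return [
        {"rows": [r for r in f["rows"] if not r["deleted"]]}
        for f in files
    ]


files = [
    {"rows": [{"id": 1, "deleted": False}, {"id": 2, "deleted": True}]},
    {"rows": [{"id": 3, "deleted": False}]},
]

once = purge(files)
twice = purge(once)
assert once == twice  # idempotent: the second run has no effect
```

As the note above says, the soft-deleted rows may still exist in the *old* files after the rewrite; only `VACUUM` physically removes those files.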
3 changes: 2 additions & 1 deletion docs/source/delta-drop-feature.md
@@ -61,4 +61,5 @@ To drop the table feature, you must remove all transaction history associated wi

See [_](versioning.md).

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
4 changes: 2 additions & 2 deletions docs/source/delta-faq.md
@@ -54,5 +54,5 @@ Yes. When you use <Delta>, you are using open <AS> APIs so you can easily port y
Changing a column's type or dropping a column requires rewriting the table. For an example, see [Change column type](delta-batch.md#change-column-type).



.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
6 changes: 5 additions & 1 deletion docs/source/delta-intro.md
@@ -17,5 +17,9 @@ Specifically, <Delta> offers:
- Schema enforcement: Automatically handles schema variations to prevent insertion of bad records during ingestion.
- [Time travel](delta-batch.md#deltatimetravel): Data versioning enables rollbacks, full historical audit trails, and reproducible machine learning experiments.
- [Upserts](delta-update.md#delta-merge) and [deletes](delta-update.md#delta-delete): Supports merge, update and delete operations to enable complex use cases like change-data-capture, slowly-changing-dimension (SCD) operations, streaming upserts, and so on.
- Vibrant connector ecosystem: <Delta> has connectors to read and write Delta tables from various data processing engines like Apache Spark, Apache Flink, Apache Hive, Trino, AWS Athena, and more.

.. include:: /shared/replacements.md
To get started, follow the [quickstart guide](quick-start.md) to learn how to use <Delta> with Apache Spark.

.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
