[Docs] Refactor the docs (#3544)
Refactor the docs to:
* make it easy to access the documentation for each connector.
* add Kernel docs

Changes are staged at: https://docs.delta.io/0.0.2/index.html
vkorukanti authored Aug 28, 2024
1 parent 44182b5 commit 103af9d
Showing 45 changed files with 1,456 additions and 117 deletions.
3 changes: 2 additions & 1 deletion docs/source/best-practices.md
@@ -186,4 +186,5 @@ You should not use [Spark caching](optimizations/delta-cache.md#delta-and-rdd-ca

- The data that gets cached may not be updated if the table is accessed using a different identifier (for example, you do `spark.table(x).cache()` but then write to the table using `spark.write.save("/some/path")`).

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
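The pitfall above can be sketched with a plain-Python analogy (this is an illustration, not Spark's actual cache implementation): a cache keyed by identifier is silently bypassed when the same underlying data is written through a *different* identifier.

```python
# Hypothetical sketch: a cache keyed by table identifier shows why
# `spark.table(x).cache()` can go stale when the same table is later
# written through a different identifier (e.g. a filesystem path).

class IdentifierKeyedCache:
    """Caches snapshots per identifier; writes invalidate only the
    identifier they were issued under."""

    def __init__(self, storage):
        self.storage = storage          # shared underlying table data
        self._cache = {}                # identifier -> cached snapshot

    def read(self, identifier):
        # Serve a cached snapshot if one exists for this identifier.
        if identifier not in self._cache:
            self._cache[identifier] = list(self.storage)
        return self._cache[identifier]

    def write(self, identifier, rows):
        # Update the underlying data, invalidating only this identifier.
        self.storage.extend(rows)
        self._cache.pop(identifier, None)


storage = [1, 2, 3]
cache = IdentifierKeyedCache(storage)

cache.read("default.people")        # snapshot cached under the table name
cache.write("/some/path", [4])      # write via a *different* identifier

stale = cache.read("default.people")   # still the old snapshot: [1, 2, 3]
fresh = cache.read("/some/path")       # sees the write: [1, 2, 3, 4]
```

Reading through the table name keeps returning the pre-write snapshot, which is the behavior the best-practices note warns about.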
5 changes: 3 additions & 2 deletions docs/source/bigquery-integration.md
@@ -2,8 +2,9 @@
description: Learn how to read Delta Lake tables from Google BigQuery.
---

# Google BigQuery integration with <Delta>
# Google BigQuery connector

Google BigQuery supports reading <Delta> tables (reader version 3 with [Deletion Vectors](delta-deletion-vectors.md) and [Column Mapping](delta-column-mapping.md)). Refer to the [Delta Lake BigLake tables documentation](https://cloud.google.com/bigquery/docs/create-delta-lake-table) for more details.

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/concurrency-control.md
@@ -115,4 +115,5 @@ This exception can occur in the following cases:
- When multiple writers are writing to an empty path at the same time.


.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
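When a write conflict does occur, the usual remedy is to re-read the latest table state and retry the transaction. A minimal pure-Python sketch of that retry loop, with a hypothetical `ConcurrentWriteError` standing in for whichever conflict exception your Delta API actually raises:

```python
import random
import time


class ConcurrentWriteError(Exception):
    """Stand-in for a concurrent-modification exception (hypothetical
    name; the real exception class depends on the Delta API you use)."""


def commit_with_retry(commit, max_attempts=5, base_delay=0.01):
    """Retry a conflicting transaction with exponential backoff + jitter.

    `commit` is any callable that performs the write and raises
    ConcurrentWriteError on a conflict.
    """
    for attempt in range(max_attempts):
        try:
            return commit()
        except ConcurrentWriteError:
            if attempt == max_attempts - 1:
                raise
            # Back off before re-reading the latest table state and
            # retrying the whole transaction.
            time.sleep(base_delay * (2 ** attempt) * random.random())


# Simulate two conflicting attempts followed by a success.
attempts = []

def flaky_commit():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConcurrentWriteError("conflicting commit detected")
    return "committed"

result = commit_with_retry(flaky_commit)  # succeeds on the third attempt
```

The jittered backoff matters when several writers conflict at once: without it they tend to retry in lockstep and collide again.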
29 changes: 18 additions & 11 deletions docs/source/delta-apidoc.md
@@ -16,16 +16,6 @@ However, there are some operations that are specific to <Delta> and you must use
- [Java API docs](api/java/spark/index.html)
- [Python API docs](api/python/spark/index.html)

## Delta Standalone
Delta Standalone, formerly known as the Delta Standalone Reader (DSR), is a JVM library for reading and writing Delta tables. Unlike Delta-Spark, this library doesn't use Spark to read or write tables and has only a few transitive dependencies, so it can be used by any application that cannot use a Spark cluster. For more details, refer to the [connectors README](https://github.com/delta-io/delta/blob/master/connectors/README.md).

- [Java API docs](api/java/standalone/index.html)

## Delta Flink
The Flink/Delta connector is a JVM library for reading and writing data between Apache Flink applications and Delta tables, built on the Delta Standalone JVM library. For more details, refer to the [Flink connector README](https://github.com/delta-io/delta/blob/master/connectors/flink/README.md).

- [Java API docs](api/java/flink/index.html)

## Delta Kernel

Delta Kernel is a library for operating on Delta tables. Specifically, it provides simple and narrow APIs for reading and writing to Delta tables without the need to understand the [Delta protocol](https://github.com/delta-io/delta/blob/master/PROTOCOL.md) details. You can use this library to do the following:
@@ -36,4 +26,21 @@ More details refer [here](https://github.com/delta-io/delta/blob/branch-3.0/kern

- [Java API docs](api/java/kernel/index.html)

.. include:: /shared/replacements.md
## Delta Rust
This [library](https://docs.rs/deltalake/latest/deltalake/) provides low-level access to Delta tables from Rust (with Python bindings) and is intended to be used with data processing frameworks like `datafusion`, `ballista`, `rust-dataframe`, `vega`, etc.

## Delta Standalone

.. warning:: Delta Standalone is deprecated in favor of [Delta Kernel](delta-kernel.md), which supports reading from and writing to Delta tables with advanced features.

Delta Standalone, formerly known as the Delta Standalone Reader (DSR), is a JVM library for reading and writing Delta tables. Unlike Delta-Spark, this library doesn't use Spark to read or write tables and has only a few transitive dependencies, so it can be used by any application that cannot use a Spark cluster. For more details, refer to the [connectors README](https://github.com/delta-io/delta/blob/master/connectors/README.md).

- [Java API docs](api/java/standalone/index.html)

## Delta Flink
The Flink/Delta connector is a JVM library for reading and writing data between Apache Flink applications and Delta tables, built on the Delta Standalone JVM library. For more details, refer to the [Flink connector README](https://github.com/delta-io/delta/blob/master/connectors/flink/README.md).

- [Java API docs](api/java/flink/index.html)

.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
8 changes: 8 additions & 0 deletions docs/source/delta-athena-integration.md
@@ -0,0 +1,8 @@
---
description: Learn how to set up an integration to enable you to read Delta tables from AWS Athena.
---

# AWS Athena Delta Connector
Starting with [version 3](https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html), Athena natively supports reading <Delta> tables. For details on using the native Delta Lake connector, see [Querying Delta Lake tables](https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html). For Athena versions lower than version 3, you can use the manifest-based approach detailed in [_](/presto-integration.md).

.. <Delta> replace:: Delta Lake
3 changes: 2 additions & 1 deletion docs/source/delta-batch.md
@@ -1359,4 +1359,5 @@ For example, you can pass your storage credentials through DataFrame options:

You can find the details of the Hadoop file system configurations for your storage in [_](/delta-storage.md).

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
4 changes: 2 additions & 2 deletions docs/source/delta-change-data-feed.md
@@ -247,5 +247,5 @@ VACUUM if they are outside the specified retention period.
Change data is committed along with the <Delta> transaction and becomes available at the same time as the new data in the table.

.. include:: /shared/replacements.md

.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-clustering.md
@@ -135,4 +135,5 @@ The following limitations exist:
- `DESCRIBE DETAIL` to inspect the current clustering columns
In <Delta> 3.2, the preview flag is removed and the above features are supported.

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-column-mapping.md
@@ -65,4 +65,5 @@ When column mapping is enabled for a Delta table, you can include spaces as well
- In <Delta> 3.0 and above, [Spark Structured Streaming](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html) reads require schema tracking to be enabled on a column mapping enabled table that underwent column renaming or column dropping. See [_](/delta-streaming.md#schema-tracking)
- The Delta table protocol specifies two modes of column mapping, by `name` and by `id`. <Delta> 2.1 and below do not support `id` mode.

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-constraints.md
@@ -65,4 +65,5 @@ You manage `CHECK` constraints using the `ALTER TABLE ADD CONSTRAINT` and `ALTER
> SHOW TBLPROPERTIES default.people10m;
```

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-default-columns.md
@@ -34,4 +34,5 @@ You can enable default column values for a table by setting `delta.feature.allow

- It is permissible, however, to assign or update default values for columns that were created in previous commands. For example, the following SQL command is valid: `ALTER TABLE t ALTER COLUMN c SET DEFAULT 16;`

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
3 changes: 2 additions & 1 deletion docs/source/delta-deletion-vectors.md
@@ -67,4 +67,5 @@ REORG TABLE events
- `REORG TABLE` is _idempotent_, meaning that if it is run twice on the same dataset, the second run has no effect.
- After running `REORG TABLE`, the soft-deleted data may still exist in the old files. You can run [VACUUM](delta-utility.md#delta-vacuum) to physically delete the old files.

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
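The idempotence claim can be illustrated with a plain-Python analogy (an illustration, not Delta's actual implementation): a purge that drops only soft-deleted rows leaves already-purged data unchanged, so a second run is a no-op.

```python
# Hypothetical sketch of why `REORG TABLE` is idempotent: rewriting files
# to drop soft-deleted rows produces files with nothing left to drop.

def purge(files):
    """Rewrite each file's rows to drop the soft-deleted ones."""
    return [
        {"rows": [r for r in f["rows"] if not r["deleted"]]}
        for f in files
    ]


files = [
    {"rows": [{"id": 1, "deleted": False}, {"id": 2, "deleted": True}]},
    {"rows": [{"id": 3, "deleted": False}]},
]

once = purge(files)
twice = purge(once)
assert once == twice  # idempotent: the second run has no effect
```

As the note above says, the soft-deleted rows may still exist in the *old* files after the rewrite; only `VACUUM` physically removes those files.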
3 changes: 2 additions & 1 deletion docs/source/delta-drop-feature.md
@@ -61,4 +61,5 @@ To drop the table feature, you must remove all transaction history associated wi

See [_](versioning.md).

.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
4 changes: 2 additions & 2 deletions docs/source/delta-faq.md
@@ -54,5 +54,5 @@ Yes. When you use <Delta>, you are using open <AS> APIs so you can easily port y
Changing a column's type or dropping a column requires rewriting the table. For an example, see [Change column type](delta-batch.md#change-column-type).



.. include:: /shared/replacements.md
.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
6 changes: 5 additions & 1 deletion docs/source/delta-intro.md
@@ -17,5 +17,9 @@ Specifically, <Delta> offers:
- Schema enforcement: Automatically handles schema variations to prevent insertion of bad records during ingestion.
- [Time travel](delta-batch.md#deltatimetravel): Data versioning enables rollbacks, full historical audit trails, and reproducible machine learning experiments.
- [Upserts](delta-update.md#delta-merge) and [deletes](delta-update.md#delta-delete): Supports merge, update and delete operations to enable complex use cases like change-data-capture, slowly-changing-dimension (SCD) operations, streaming upserts, and so on.
- Vibrant connector ecosystem: <Delta> has connectors to read and write Delta tables from various data processing engines like Apache Spark, Apache Flink, Apache Hive, Trino, AWS Athena, and more.

.. include:: /shared/replacements.md
To get started, follow the [quickstart guide](quick-start.md) to learn how to use <Delta> with Apache Spark.

.. <Delta> replace:: Delta Lake
.. <AS> replace:: Apache Spark
