
Commit 0187c46

Merge pull request delta-io#1563 from wjones127/docs/python-api-reference
docs: add Python API reference to mkdocs
2 parents 30a3800 + 83efb17 · commit 0187c46

12 files changed (+200, -108)

docs/python_api.md (+33)

````diff
@@ -0,0 +1,33 @@
+# Python API Reference
+
+## DeltaTable
+
+::: deltalake.table
+
+## Writing Delta Tables
+
+::: deltalake.write_deltalake
+
+## Delta Lake Schemas
+
+Schemas, fields, and data types are provided in the ``deltalake.schema`` submodule.
+
+::: deltalake.schema.Schema
+
+::: deltalake.schema.PrimitiveType
+
+::: deltalake.schema.ArrayType
+
+::: deltalake.schema.MapType
+
+::: deltalake.schema.Field
+
+::: deltalake.schema.StructType
+
+## Data Catalog
+
+::: deltalake.data_catalog
+
+## Delta Storage Handler
+
+::: deltalake.fs
````
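
Each `::: dotted.path` line above is a mkdocstrings directive that expands into generated API documentation for that object at build time. For orientation, a minimal sketch of the round trip the new page documents; the local path and sample data are illustrative assumptions, not part of the commit:

```python
# Sketch of the API surface the new reference page covers: write a table
# with write_deltalake, then inspect it with DeltaTable. The path is an
# arbitrary illustrative location.
import pyarrow as pa

from deltalake import DeltaTable, write_deltalake

data = pa.table({"id": pa.array([1, 2, 3], type=pa.int64())})
write_deltalake("/tmp/demo-table", data)

dt = DeltaTable("/tmp/demo-table")
print(dt.version())           # 0 for a freshly created table
print(dt.schema().to_json())  # the Delta schema form documented above
```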

docs/requirements.txt (+3)

````diff
@@ -0,0 +1,3 @@
+mkdocs
+mkdocstrings[python]
+mkdocs-autorefs
````

docs/usage/examining-table.md (+10, -13)

````diff
@@ -14,7 +14,7 @@ The delta log maintains basic metadata about a table, including:
 to have data deleted from it.
 
 Get metadata from a table with the
-`DeltaTable.metadata` method:
+[DeltaTable.metadata()][] method:
 
 ``` python
 >>> from deltalake import DeltaTable
@@ -27,12 +27,12 @@ Metadata(id: 5fba94ed-9794-4965-ba6e-6ee3c0d22af9, name: None, description: None
 
 The schema for the table is also saved in the transaction log. It can
 either be retrieved in the Delta Lake form as
-`deltalake.schema.Schema` or as a
+[deltalake.schema.Schema][] or as a
 PyArrow schema. The first allows you to introspect any column-level
 metadata stored in the schema, while the latter represents the schema
 the table will be loaded into.
 
-Use `DeltaTable.schema` to retrieve the delta lake schema:
+Use [DeltaTable.schema][] to retrieve the delta lake schema:
 
 ``` python
 >>> from deltalake import DeltaTable
@@ -43,14 +43,14 @@ Schema([Field(id, PrimitiveType("long"), nullable=True)])
 
 These schemas have a JSON representation that can be retrieved. To
 reconstruct from json, use
-`deltalake.schema.Schema.from_json()`.
+[deltalake.schema.Schema.from_json()][].
 
 ``` python
 >>> dt.schema().json()
 '{"type":"struct","fields":[{"name":"id","type":"long","nullable":true,"metadata":{}}]}'
 ```
 
-Use `deltalake.schema.Schema.to_pyarrow()` to retrieve the PyArrow schema:
+Use [deltalake.schema.Schema.to_pyarrow()][] to retrieve the PyArrow schema:
 
 ``` python
 >>> dt.schema().to_pyarrow()
@@ -65,15 +65,12 @@ table, when, and by whom. This information is retained for 30 days by
 default, unless otherwise specified by the table configuration
 `delta.logRetentionDuration`.
 
-::: note
-::: title
-Note
-:::
+!!! note
+
+    This information is not written by all writers and different writers may
+    use different schemas to encode the actions. For Spark\'s format, see:
+    <https://docs.delta.io/latest/delta-utility.html#history-schema>
 
-This information is not written by all writers and different writers may
-use different schemas to encode the actions. For Spark\'s format, see:
-<https://docs.delta.io/latest/delta-utility.html#history-schema>
-:::
 
 To view the available history, use `DeltaTable.history`:
 
````
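
The bracketed `[...][]` references introduced above are mkdocs-autorefs cross-references that resolve to the new API pages. As a quick sketch of the schema round trip the page describes, assuming a Delta table already exists at an illustrative local path:

```python
# Sketch: serialize a Delta schema to JSON and reconstruct it, per the
# docs above. Assumes a Delta table already exists at this illustrative path.
from deltalake import DeltaTable
from deltalake.schema import Schema

dt = DeltaTable("/tmp/demo-table")
schema_json = dt.schema().to_json()
restored = Schema.from_json(schema_json)
assert restored.to_json() == schema_json

pa_schema = dt.schema().to_pyarrow()  # the schema the table is loaded into
```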

docs/usage/index.md (+1, -1)

````diff
@@ -1,6 +1,6 @@
 # Usage
 
-A `DeltaTable` represents the state of a
+A [DeltaTable][] represents the state of a
 delta table at a particular version. This includes which files are
 currently part of the table, the schema of the table, and other metadata
 such as creation time.
````

docs/usage/loading-table.md (+5, -9)

````diff
@@ -109,12 +109,8 @@ version number or datetime string:
 >>> dt.load_with_datetime("2021-11-04 00:05:23.283+00:00")
 ```
 
-::: warning
-::: title
-Warning
-:::
-
-Previous table versions may not exist if they have been vacuumed, in
-which case an exception will be thrown. See [Vacuuming
-tables](#vacuuming-tables) for more information.
-:::
+!!! warning
+
+    Previous table versions may not exist if they have been vacuumed, in
+    which case an exception will be thrown. See [Vacuuming
+    tables](#vacuuming-tables) for more information.
````
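
A short sketch of the time travel this warning guards, assuming a table at an illustrative local path:

```python
# Sketch: pin a DeltaTable to an earlier state, per the docs above. The
# path is illustrative; version 0 exists for any table whose log has not
# been vacuumed away.
from deltalake import DeltaTable

dt = DeltaTable("/tmp/demo-table")
dt.load_version(0)  # rewind the in-memory state to the first version
# Or pin to the newest version as of a timestamp; a timestamp from before
# the first commit, or one whose version has been vacuumed, may raise:
dt.load_with_datetime("2021-11-04 00:05:23.283+00:00")
```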

mkdocs.yml (+23, -1)

````diff
@@ -10,4 +10,26 @@ nav:
   - Examining a Delta Table: usage/examining-table.md
   - Querying a Delta Table: usage/querying-delta-tables.md
   - Managing a Delta Table: usage/managing-tables.md
-  - Writing Delta Tables: usage/writing-delta-tables.md
+  - Writing Delta Tables: usage/writing-delta-tables.md
+  - API Reference: python_api.md
+
+plugins:
+  - autorefs
+  - mkdocstrings:
+      handlers:
+        python:
+          path: [../python]
+          rendering:
+            heading_level: 4
+            show_source: false
+            show_symbol_type_in_heading: true
+            show_signature_annotations: true
+            show_root_heading: true
+            members_order: source
+          import:
+            # for cross references
+            - https://arrow.apache.org/docs/objects.inv
+            - https://pandas.pydata.org/docs/objects.inv
+
+markdown_extensions:
+  - admonition
````

python/Makefile (+1, -1)

````diff
@@ -66,7 +66,7 @@ check-rust: ## Run check on Rust
 .PHONY: check-python
 check-python: ## Run check on Python
 	$(info Check Python black)
-	black --check .
+	black --check --diff .
 	$(info Check Python ruff)
 	ruff check .
 	$(info Check Python mypy)
````

python/deltalake/_internal.pyi (+112, -71)

````diff
@@ -1,5 +1,5 @@
 import sys
-from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple, Union
+from typing import Any, Dict, List, Mapping, Optional, Tuple, Union
 
 if sys.version_info >= (3, 8):
     from typing import Literal
@@ -13,24 +13,104 @@ from deltalake.writer import AddAction
 
 __version__: str
 
-RawDeltaTable: Any
-rust_core_version: Callable[[], str]
-
-write_new_deltalake: Callable[
-    [
-        str,
-        pa.Schema,
-        List[AddAction],
-        str,
-        List[str],
-        Optional[str],
-        Optional[str],
-        Optional[Mapping[str, Optional[str]]],
-        Optional[Dict[str, str]],
-    ],
-    None,
-]
+class RawDeltaTableMetaData:
+    id: int
+    name: str
+    description: str
+    partition_columns: List[str]
+    created_time: int
+    configuration: Dict[str, str]
+
+class RawDeltaTable:
+    schema: Any
+
+    def __init__(
+        self,
+        table_uri: str,
+        version: Optional[int],
+        storage_options: Optional[Dict[str, str]],
+        without_files: bool,
+        log_buffer_size: Optional[int],
+    ) -> None: ...
+    @staticmethod
+    def get_table_uri_from_data_catalog(
+        data_catalog: str,
+        database_name: str,
+        table_name: str,
+        data_catalog_id: Optional[str] = None,
+        catalog_options: Optional[Dict[str, str]] = None,
+    ) -> str: ...
+    def table_uri(self) -> str: ...
+    def version(self) -> int: ...
+    def metadata(self) -> RawDeltaTableMetaData: ...
+    def protocol_versions(self) -> List[int]: ...
+    def load_version(self, version: int) -> None: ...
+    def load_with_datetime(self, ds: str) -> None: ...
+    def files_by_partitions(
+        self, partitions_filters: Optional[FilterType]
+    ) -> List[str]: ...
+    def files(self, partition_filters: Optional[FilterType]) -> List[str]: ...
+    def file_uris(self, partition_filters: Optional[FilterType]) -> List[str]: ...
+    def vacuum(
+        self,
+        dry_run: bool,
+        retention_hours: Optional[int],
+        enforce_retention_duration: bool,
+    ) -> List[str]: ...
+    def compact_optimize(
+        self,
+        partition_filters: Optional[FilterType],
+        target_size: Optional[int],
+        max_concurrent_tasks: Optional[int],
+        min_commit_interval: Optional[int],
+    ) -> str: ...
+    def z_order_optimize(
+        self,
+        z_order_columns: List[str],
+        partition_filters: Optional[FilterType],
+        target_size: Optional[int],
+        max_concurrent_tasks: Optional[int],
+        max_spill_size: Optional[int],
+        min_commit_interval: Optional[int],
+    ) -> str: ...
+    def restore(
+        self,
+        target: Optional[Any],
+        ignore_missing_files: bool,
+        protocol_downgrade_allowed: bool,
+    ) -> str: ...
+    def history(self, limit: Optional[int]) -> List[str]: ...
+    def update_incremental(self) -> None: ...
+    def dataset_partitions(
+        self, schema: pa.Schema, partition_filters: Optional[FilterType]
+    ) -> List[Any]: ...
+    def create_checkpoint(self) -> None: ...
+    def get_add_actions(self, flatten: bool) -> pa.RecordBatch: ...
+    def delete(self, predicate: Optional[str]) -> str: ...
+    def get_active_partitions(
+        self, partitions_filters: Optional[FilterType] = None
+    ) -> Any: ...
+    def create_write_transaction(
+        self,
+        add_actions: List[AddAction],
+        mode: str,
+        partition_by: List[str],
+        schema: pa.Schema,
+        partitions_filters: Optional[FilterType],
+    ) -> None: ...
 
+def rust_core_version() -> str: ...
+def write_new_deltalake(
+    table_uri: str,
+    schema: pa.Schema,
+    add_actions: List[AddAction],
+    _mode: str,
+    partition_by: List[str],
+    name: Optional[str],
+    description: Optional[str],
+    configuration: Optional[Mapping[str, Optional[str]]],
+    storage_options: Optional[Dict[str, str]],
+) -> None: ...
 def batch_distinct(batch: pa.RecordBatch) -> pa.RecordBatch: ...
 
 # Can't implement inheritance (see note in src/schema.rs), so this is next
@@ -93,34 +173,18 @@ class Field:
         *,
         nullable: bool = True,
         metadata: Optional[Dict[str, Any]] = None,
-    ) -> None:
-        """A named field, with a data type, nullability, and optional metadata."""
+    ) -> None: ...
     name: str
-    """The field name."""
     type: DataType
-    """The field data type."""
     nullable: bool
-    """The field nullability."""
     metadata: Dict[str, Any]
-    """The field metadata."""
-
-    def to_json(self) -> str:
-        """Get the JSON representation of the Field.
 
-        :rtype: str
-        """
+    def to_json(self) -> str: ...
     @staticmethod
-    def from_json(json: str) -> "Field":
-        """Create a new Field from a JSON string.
-
-        :param json: A json string representing the Field.
-        :rtype: Field
-        """
-    def to_pyarrow(self) -> pa.Field:
-        """Convert field to a pyarrow.Field."""
+    def from_json(json: str) -> "Field": ...
+    def to_pyarrow(self) -> pa.Field: ...
    @staticmethod
-    def from_pyarrow(type: pa.Field) -> "Field":
-        """Create a new field from pyarrow.Field."""
+    def from_pyarrow(type: pa.Field) -> "Field": ...
 
 class StructType:
     def __init__(self, fields: List[Field]) -> None: ...
@@ -138,41 +202,13 @@ class Schema:
     def __init__(self, fields: List[Field]) -> None: ...
     fields: List[Field]
     invariants: List[Tuple[str, str]]
-    """The list of invariants defined on the table.
-
-    The first string in each tuple is the field path, the second is the SQL of the invariant.
-    """
 
-    def to_json(self) -> str:
-        """Get the JSON representation of the schema.
-
-        :rtype: str
-        """
+    def to_json(self) -> str: ...
    @staticmethod
-    def from_json(json: str) -> "Schema":
-        """Create a new Schema from a JSON string.
-
-        :param schema_json: a JSON string
-        :rtype: Schema
-        """
-    def to_pyarrow(self, as_large_types: bool = False) -> pa.Schema:
-        """Return equivalent PyArrow schema.
-
-        Note: this conversion is lossy as the Invariants are not stored in pyarrow.Schema.
-
-        :param as_large_types: get schema with all variable size types (list,
-            binary, string) as large variants (with int64 indices). This is for
-            compatibility with systems like Polars that only support the large
-            versions of Arrow types.
-        :rtype: pyarrow.Schema
-        """
+    def from_json(json: str) -> "Schema": ...
+    def to_pyarrow(self, as_large_types: bool = False) -> pa.Schema: ...
     @staticmethod
-    def from_pyarrow(type: pa.Schema) -> "Schema":
-        """Create a new Schema from a pyarrow.Schema.
-
-        :param data_type: a PyArrow schema
-        :rtype: Schema
-        """
+    def from_pyarrow(type: pa.Schema) -> "Schema": ...
 
 class ObjectInputFile:
     @property
@@ -289,3 +325,8 @@ class DeltaProtocolError(DeltaError):
     """Raised when a violation with the Delta protocol specs ocurred."""
 
     pass
+
+FilterLiteralType = Tuple[str, str, Any]
+FilterConjunctionType = List[FilterLiteralType]
+FilterDNFType = List[FilterConjunctionType]
+FilterType = Union[FilterConjunctionType, FilterDNFType]
````
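
The new `FilterType` aliases spell out the DNF shape that partition filters take throughout these stubs: a conjunction is a list of `(column, op, value)` tuples, and a disjunction is a list of such lists. A hedged sketch at the public `DeltaTable` level, assuming a table partitioned by `year` and `month` at an illustrative path:

```python
# Sketch: the filter shapes named by the new aliases, applied through the
# public API. Assumes a table partitioned by "year" and "month" exists at
# this illustrative path.
from deltalake import DeltaTable

dt = DeltaTable("/tmp/partitioned-table")

# FilterConjunctionType: AND of (column, op, value) literals.
december = dt.files([("year", "=", "2021"), ("month", "=", "12")])

# FilterDNFType: OR of conjunctions.
two_years = dt.files([[("year", "=", "2020")], [("year", "=", "2021")]])
```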

python/docs/source/conf.py (+1, -1)

````diff
@@ -61,7 +61,7 @@ def get_release_version() -> str:
     ("py:class", "pyarrow._fs.FileInfo"),
     ("py:class", "pyarrow._fs.FileSelector"),
     ("py:class", "pyarrow._fs.FileSystemHandler"),
-    ("py:class", "RawDeltaTable"),
+    ("py:class", "deltalake._internal.RawDeltaTable"),
     ("py:class", "pandas.DataFrame"),
     ("py:class", "pyarrow._dataset_parquet.ParquetFileWriteOptions"),
     ("py:class", "pathlib.Path"),
````
