Commit 66b9141

docs: Python development docs (#35)
1 parent d29cf97 commit 66b9141

1 file changed: +149 -0 lines changed

python/DEVELOP.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# Python Development Guide

This guide covers Python-specific development for the `geodatafusion` Python package, which provides Python bindings for the Rust `geodatafusion` library using PyO3.

This guide is **in addition** to the top-level [DEVELOP.md](../DEVELOP.md), which covers Rust development. Read that guide first; all of its Rust-related instructions also apply here.

## Overview

The Python package is a separate workspace that wraps the Rust library using:

- **PyO3** - Rust bindings for Python (automatically installed via Cargo)
- **Maturin** - Build system for PyO3 packages (automatically installed in the dev environment by uv)
- **uv** - Python package and virtual environment manager

## Prerequisites

**uv** is the recommended package manager ([install instructions](https://docs.astral.sh/uv/)).

## Project Structure

```
python/
├── Cargo.toml              # Rust package configuration
├── pyproject.toml          # Python package configuration
├── src/                    # Rust source (PyO3 bindings)
│   ├── lib.rs              # Main module entry point
│   ├── udf/                # UDF registration modules
│   └── utils.rs            # Helper utilities
├── python/                 # Pure Python source
│   └── geodatafusion/
│       └── __init__.py     # Python API
├── tests/                  # Python tests
│   └── udf/                # UDF tests
└── examples/               # Example scripts
```

## Getting Started

### Clone and Setup

```bash
# From the repository root
cd python

# Create virtual environment and install dependencies
uv sync --no-install-package geodatafusion
```

The `--no-install-package geodatafusion` flag skips building `geodatafusion` itself (in release mode) during setup. **Maturin** is automatically installed in the dev environment by uv.

### Build the Package

There are two ways to build the package:

#### Development Build (Fast, Debug Mode)

```bash
# Build and install in development mode
uv run --no-project maturin develop --uv
```

**Note**: Debug builds will show a performance warning at runtime. This is expected during development.

#### Release Build (Optimized)

```bash
# Build optimized release version
uv run --no-project maturin develop --uv --release

# Or build wheel for distribution
uv run --no-project maturin build --uv --release
```

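After either build, it can be useful to confirm that the package actually landed in the active environment before running tests. The following is a minimal sketch using only the standard library; the helper name `is_installed` and the script name `check_build.py` are hypothetical and not part of the repository.

```python
import importlib.util


def is_installed(package: str = "geodatafusion") -> bool:
    """Return True if `package` can be found by the current interpreter."""
    # find_spec returns None when a top-level package cannot be located.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    status = "importable" if is_installed() else "not found; re-run maturin develop"
    print(f"geodatafusion: {status}")
```

Saved as `check_build.py` (a hypothetical name), it could be run the same way as the other commands in this guide, for example with `uv run --no-project python check_build.py`.
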
## Development Workflow

### Running Tests

```bash
# Run all tests
uv run --no-project pytest

# Run specific test file
uv run --no-project pytest tests/test_register.py

# Run with verbose output
uv run --no-project pytest -v

# Run with output capture disabled (see print statements)
uv run --no-project pytest -s
```

## Adding a New UDF

When a new UDF is added to the Rust library, you need to expose it in Python:

### 0. Implement the UDF in Rust

Follow the instructions in the top-level [DEVELOP.md](../DEVELOP.md) to implement the UDF in Rust first.

### 1. Update Rust Bindings

The UDF modules are in `src/udf/`. Each module corresponds to a category:

- `src/udf/native/` - Native implementations
- `src/udf/geo/` - Geo trait implementations
- `src/udf/geohash/` - GeoHash functions

Wrap the UDF using one of the existing macros where possible:

- `impl_udf!`: for UDFs without any instantiation arguments
- `impl_udf_coord_type_arg!`: for UDFs taking a `CoordType` argument upon instantiation

### 2. Add to Python Module

For example, in `src/udf/geo/mod.rs`, register the new function:

```rs
#[pymodule]
pub(crate) fn geo(m: &Bound<PyModule>) -> PyResult<()> {
    // Existing registrations stay as they are; add the new class alongside them.
    m.add_class::<NewUdf>()?;
    Ok(())
}
```

### 3. Update Python API

The Python API is in `python/geodatafusion/__init__.py`. Update `register_all()` or add specific registration functions if needed.

This ensures that the UDF can be easily registered on a DataFusion `SessionContext`.

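As a rough illustration of this registration pattern, the sketch below shows one way a `register_all()`-style helper could loop over UDF factories and attach each result to a context. It is not the package's actual implementation: `UDF_FACTORIES`, `make_my_new_function`, and `StubContext` are hypothetical stand-ins, and the only assumption about the real context is that it exposes a `register_udf()` method, as DataFusion's Python `SessionContext` does.

```python
from typing import Callable, Iterable, List


class StubContext:
    """Hypothetical stand-in for a DataFusion SessionContext."""

    def __init__(self) -> None:
        self.registered: List[str] = []

    def register_udf(self, udf: str) -> None:
        # A real context would receive a UDF object; the stub records a name.
        self.registered.append(udf)


def make_my_new_function() -> str:
    """Hypothetical factory; a real one would build the wrapped Rust UDF."""
    return "my_new_function"


# Hypothetical list of factories; adding a UDF means adding one entry here.
UDF_FACTORIES: List[Callable[[], str]] = [make_my_new_function]


def register_all(ctx, factories: Iterable[Callable[[], str]] = UDF_FACTORIES) -> None:
    """Register every UDF produced by `factories` on `ctx`."""
    for factory in factories:
        ctx.register_udf(factory())


ctx = StubContext()
register_all(ctx)
print(ctx.registered)
```

Keeping the factories in one list means a new UDF needs a single added entry, and callers keep using the same `register_all(ctx)` call.
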
### 4. Add Tests

Create tests in `tests/udf/` following the existing structure:

```python
from datafusion import SessionContext
from geodatafusion import register_all


def test_my_new_function():
    ctx = SessionContext()
    register_all(ctx)

    expected_value = ...  # placeholder: set this to the value the function should return
    sql = "SELECT my_new_function(ST_GeomFromText('POINT(1 2)'));"
    result = ctx.sql(sql)
    # Read the first value of the first column as a Python object
    assert result.to_arrow_table().columns[0][0].as_py() == expected_value
```

## Package Building

Push a new git tag to GitHub, and the CI workflow will build and publish wheels automatically.
