Commit 23eecb7

Restructure source code with separation of concerns
Reorganize the entire codebase following modern C++/Python project best practices with clear separation of concerns.

## Major Structural Changes

### C++ Code Organization
- Create `include/prtree/` for public C++ headers
  - `core/` - Core algorithm (prtree.h)
  - `utils/` - Utilities (parallel.h, small_vector.h)
  - `core/detail/` - Reserved for future modularization
- Move Python bindings to `src/cpp/bindings/python_bindings.cc`
- Add documentation for future modularization (prtree.h is 1617 lines)

### Python Package Structure
- Split monolithic `__init__.py` into modular components:
  - `__init__.py` - Package entry point with version
  - `core.py` - PRTree2D/3D/4D classes with full documentation
  - `py.typed` - PEP 561 type hints marker
- Better separation: Python API vs C++ bindings
- Improved docstrings and type hints

### Benchmarks Organization
- Separate C++ and Python benchmarks:
  - `benchmarks/cpp/` - All C++ benchmark files
  - `benchmarks/python/` - Reserved for future Python benchmarks
- Update CMakeLists.txt to use new paths

### Documentation Organization
- Create structured docs directory:
  - `docs/examples/` - Example notebooks
  - `docs/images/` - Documentation images
  - `docs/baseline/` - Benchmark baseline data

### Build System Updates
- Update CMakeLists.txt:
  - Use explicit source file lists (PRTREE_SOURCES)
  - Add include directory configuration (PRTREE_INCLUDE_DIRS)
  - Update all benchmark paths
  - Support both new and legacy paths during migration
- Update MANIFEST.in for new structure

### Comprehensive Documentation
- Add ARCHITECTURE.md:
  - Detailed explanation of project structure
  - Architectural layers and data flow
  - Separation of concerns by functionality
  - Build system documentation
  - Future improvement plans
- Update DEVELOPMENT.md with new structure

## Benefits

### For Contributors
- Clear separation makes it obvious where code belongs
- Easier to find and modify specific functionality
- Better understanding of component relationships
- Documented modularization path for large files

### For Maintainers
- Modular structure supports independent component changes
- Clearer dependencies between components
- Foundation for future optimizations (compilation, testing)
- Better code organization reduces technical debt

### For Users
- No API changes - fully backwards compatible
- Better type hints and documentation
- Improved reliability through better organization

## Backward Compatibility
- All existing imports continue to work
- Python API unchanged
- Legacy `cpp/` directory retained temporarily
- Build system supports both old and new paths

## Future Work
- Modularize prtree.h (1617 lines → multiple focused files)
- Add C++ unit tests for isolated components
- Add Python-level benchmarks
- Generate API documentation with Sphinx

See ARCHITECTURE.md for detailed structure documentation.
1 parent 0712830 commit 23eecb7

20 files changed

+3639
-149
lines changed

ARCHITECTURE.md

Lines changed: 335 additions & 0 deletions
@@ -0,0 +1,335 @@
1+
# Project Architecture
2+
3+
This document describes the architecture and directory structure of python_prtree.
4+
5+
## Overview
6+
7+
python_prtree is a Python package that provides fast spatial indexing using the Priority R-Tree data structure. It consists of:
8+
9+
1. **C++ Core**: High-performance implementation of the Priority R-Tree algorithm
10+
2. **Python Bindings**: pybind11-based bindings exposing C++ functionality to Python
11+
3. **Python Wrapper**: User-friendly Python API that adds safety checks and convenience features on top of the bindings
12+
13+
## Directory Structure
14+
15+
```
16+
python_prtree/
17+
├── include/ # C++ Public Headers (API)
18+
│ └── prtree/
19+
│ ├── core/ # Core algorithm headers
20+
│ │ └── prtree.h # Main PRTree class template
21+
│ └── utils/ # Utility headers
22+
│ ├── parallel.h # Parallel processing utilities
23+
│ └── small_vector.h # Optimized vector implementation
24+
25+
├── src/ # Source Code
26+
│ ├── cpp/ # C++ Implementation
27+
│ │ ├── core/ # Core implementation (future)
28+
│ │ └── bindings/ # Python bindings
29+
│ │ └── python_bindings.cc # pybind11 bindings
30+
│ │
31+
│ └── python_prtree/ # Python Package
32+
│ ├── __init__.py # Package entry point
33+
│ ├── core.py # PRTree2D/3D/4D classes
34+
│ └── py.typed # Type hints marker (PEP 561)
35+
36+
├── tests/ # Test Suite
37+
│ ├── unit/ # Unit tests (individual features)
38+
│ │ ├── test_construction.py
39+
│ │ ├── test_query.py
40+
│ │ ├── test_insert.py
41+
│ │ ├── test_erase.py
42+
│ │ └── ...
43+
│ ├── integration/ # Integration tests (workflows)
44+
│ │ ├── test_insert_query_workflow.py
45+
│ │ ├── test_persistence_query_workflow.py
46+
│ │ └── ...
47+
│ ├── e2e/ # End-to-end tests
48+
│ │ ├── test_readme_examples.py
49+
│ │ └── test_user_workflows.py
50+
│ └── conftest.py # Shared test fixtures
51+
52+
├── benchmarks/ # Performance Benchmarks
53+
│ ├── cpp/ # C++ benchmarks
54+
│ │ ├── benchmark_construction.cpp
55+
│ │ ├── benchmark_query.cpp
56+
│ │ ├── benchmark_parallel.cpp
57+
│ │ └── stress_test_concurrent.cpp
58+
│ └── python/ # Python benchmarks (future)
59+
│ └── README.md
60+
61+
├── docs/ # Documentation
62+
│ ├── examples/ # Example notebooks and scripts
63+
│ │ └── experiment.ipynb
64+
│ ├── images/ # Documentation images
65+
│ └── baseline/ # Benchmark baseline data
66+
67+
├── tools/ # Development Tools
68+
│ ├── analyze_baseline.py # Benchmark analysis
69+
│ ├── profile.py # Profiling script
70+
│ ├── profile.sh # Profiling shell script
71+
│ └── profile_all_workloads.sh
72+
73+
└── third/ # Third-party Dependencies (git submodules)
74+
    ├── pybind11/ # Python bindings framework
75+
    ├── cereal/ # Serialization library
76+
    └── snappy/ # Compression library
77+
```
78+
79+
## Architectural Layers
80+
81+
### 1. Core C++ Layer (`include/prtree/core/`)
82+
83+
**Purpose**: Implements the Priority R-Tree algorithm
84+
85+
**Key Components**:
86+
- `prtree.h`: Main template class `PRTree<T, B, D>`
87+
- `T`: Index type (typically `int64_t`)
88+
- `B`: Branching factor (default: 8)
89+
- `D`: Dimensions (2, 3, or 4)
90+
91+
**Design Principles**:
92+
- Header-only template library for performance
93+
- No Python dependencies at this layer
94+
- Pure C++ with C++20 features
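
To make the role of the branching factor `B` concrete, here is a minimal Python sketch of a bulk-loaded bounding-box tree with pruned traversal. It is deliberately simplified: it packs boxes by a naive sort on `xmin`, which is not the Priority R-Tree construction and not the code in `prtree.h`. The `(xmin, ymin, xmax, ymax)` box layout and all names (`Node`, `build`, `query`) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def overlaps(a: Box, b: Box) -> bool:
    """Closed-interval overlap test on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(boxes: Sequence[Box]) -> Box:
    """Smallest box that encloses every box in `boxes`."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


class Node:
    def __init__(self, mbr: Box, children: list, leaf: bool) -> None:
        self.mbr = mbr            # bounding box of everything below this node
        self.children = children  # (index, box) pairs if leaf, else child Nodes
        self.leaf = leaf


def _chunks(seq: list, size: int) -> List[list]:
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def build(items: List[Tuple[int, Box]], branching: int = 8) -> Node:
    """Pack items bottom-up into nodes holding at most `branching` children."""
    if not items:
        raise ValueError("cannot build a tree from zero items")
    if branching < 2:
        raise ValueError("branching factor must be at least 2")
    ordered = sorted(items, key=lambda item: item[1][0])  # naive: sort by xmin
    level = [Node(union([box for _, box in chunk]), chunk, True)
             for chunk in _chunks(ordered, branching)]
    while len(level) > 1:  # group nodes until a single root remains
        level = [Node(union([n.mbr for n in chunk]), chunk, False)
                 for chunk in _chunks(level, branching)]
    return level[0]


def query(node: Node, q: Box) -> List[int]:
    """Return indices of stored boxes that overlap `q`, skipping pruned subtrees."""
    if not overlaps(node.mbr, q):
        return []
    if node.leaf:
        return [idx for idx, box in node.children if overlaps(box, q)]
    hits: List[int] = []
    for child in node.children:
        hits.extend(query(child, q))
    return hits
```

A larger branching factor gives a shallower tree with more overlap tests per node; a smaller one gives the opposite. The default of 8 mirrors the value listed above.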
95+
96+
### 2. Utilities Layer (`include/prtree/utils/`)
97+
98+
**Purpose**: Supporting data structures and algorithms
99+
100+
**Components**:
101+
- `parallel.h`: Thread-safe parallel processing utilities
102+
- `small_vector.h`: Cache-friendly vector with small size optimization
103+
104+
**Design Principles**:
105+
- Reusable utilities independent of PRTree
106+
- Optimized for performance (SSE instructions, cache locality)
107+
108+
### 3. Python Bindings Layer (`src/cpp/bindings/`)
109+
110+
**Purpose**: Expose C++ functionality to Python using pybind11
111+
112+
**Key File**: `python_bindings.cc`
113+
114+
**Responsibilities**:
115+
- Create Python classes from C++ templates
116+
- Handle numpy array conversions
117+
- Expose methods with Python-friendly signatures
118+
- Provide module-level documentation
119+
120+
**Design Principles**:
121+
- Thin binding layer (minimal logic)
122+
- Direct mapping to C++ API
123+
- Efficient numpy integration
124+
125+
### 4. Python Wrapper Layer (`src/python_prtree/`)
126+
127+
**Purpose**: User-friendly Python API with safety features
128+
129+
**Key Files**:
130+
- `__init__.py`: Package entry point and version info
131+
- `core.py`: Main user-facing classes (`PRTree2D`, `PRTree3D`, `PRTree4D`)
132+
133+
**Added Features**:
134+
- Empty-tree safety (prevents segfaults when an empty tree is queried)
135+
- Python object storage (pickle serialization)
136+
- Convenient APIs (auto-indexing, return_obj parameter)
137+
- Type hints and documentation
138+
139+
**Design Principles**:
140+
- Safety over raw performance
141+
- Pythonic API design
142+
- Backwards compatibility considerations
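
The wrapper pattern described in this layer can be sketched as follows. This is a self-contained illustration, not the contents of `core.py`: `_NativeStub` stands in for the compiled pybind11 class using a brute-force scan, and `TreeWrapper`, its method signatures, and the `(xmin, ymin, xmax, ymax)` box layout are all hypothetical.

```python
import pickle
from typing import Any, Dict, List, Optional, Sequence

Box = Sequence[float]  # assumed 2D layout: (xmin, ymin, xmax, ymax)


class _NativeStub:
    """Stand-in for the compiled binding class; NOT the real pybind11 API."""

    def __init__(self) -> None:
        self.boxes: Dict[int, Box] = {}

    def insert(self, idx: int, box: Box) -> None:
        self.boxes[idx] = box

    def query(self, q: Box) -> List[int]:
        # Brute-force overlap test in place of a real tree traversal.
        return [i for i, b in self.boxes.items()
                if b[0] <= q[2] and q[0] <= b[2]
                and b[1] <= q[3] and q[1] <= b[3]]


class TreeWrapper:
    """Hypothetical wrapper showing the safety and convenience features."""

    def __init__(self) -> None:
        self._native = _NativeStub()
        self._objects: Dict[int, bytes] = {}  # pickled payloads keyed by index
        self._next_idx = 0

    def insert(self, box: Box, idx: Optional[int] = None, obj: Any = None) -> int:
        if idx is None:  # auto-indexing: pick the next unused integer
            idx = self._next_idx
        self._next_idx = max(self._next_idx, idx + 1)
        self._native.insert(idx, box)
        if obj is not None:
            self._objects[idx] = pickle.dumps(obj)
        return idx

    def query(self, box: Box, return_obj: bool = False) -> List[Any]:
        if not self._native.boxes:  # empty-tree guard: never call native code
            return []
        hits = sorted(self._native.query(box))
        if return_obj:
            return [pickle.loads(self._objects[i]) if i in self._objects else None
                    for i in hits]
        return hits
```

Keeping the guard and the object store in Python means the native layer only ever sees integers and coordinates, which is what keeps the binding layer thin.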
143+
144+
## Data Flow
145+
146+
### Construction
147+
```
148+
User Code
149+
↓ (numpy arrays)
150+
PRTree2D/3D/4D (Python)
151+
↓ (arrays + validation)
152+
_PRTree2D/3D/4D (pybind11)
153+
↓ (type conversion)
154+
PRTree<int64_t, 8, D> (C++)
155+
↓ (algorithm)
156+
Optimized R-Tree Structure
157+
```
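
The "arrays + validation" step in the construction flow might look roughly like the sketch below. It uses plain Python sequences instead of numpy so that it runs anywhere. The function name `prepare_inputs`, the specific checks, and the box layout (all minima first, then all maxima, giving 2·D values per box) are assumptions for illustration, not the package's actual validation logic.

```python
import math
from typing import List, Sequence, Tuple


def prepare_inputs(
    indices: Sequence[int],
    boxes: Sequence[Sequence[float]],
    dims: int,
) -> Tuple[List[int], List[Tuple[float, ...]]]:
    """Validate and normalize inputs before handing them to a native layer."""
    if dims not in (2, 3, 4):
        raise ValueError("dims must be 2, 3, or 4")
    if len(indices) != len(boxes):
        raise ValueError("need exactly one index per box")
    if len(set(indices)) != len(indices):
        raise ValueError("indices must be unique")

    clean: List[Tuple[float, ...]] = []
    for box in boxes:
        if len(box) != 2 * dims:
            raise ValueError(f"each box needs {2 * dims} coordinates")
        values = tuple(float(v) for v in box)  # type conversion
        if any(math.isnan(v) for v in values):
            raise ValueError("box coordinates must not be NaN")
        # Assumed layout: first `dims` values are minima, the rest maxima.
        if any(lo > hi for lo, hi in zip(values[:dims], values[dims:])):
            raise ValueError("each minimum must not exceed its maximum")
        clean.append(values)
    return [int(i) for i in indices], clean
```

Rejecting malformed input in Python keeps error messages readable and lets the C++ layer assume well-formed data.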
158+
159+
### Query
160+
```
161+
User Code
162+
↓ (query box)
163+
PRTree2D.query() (Python)
164+
↓ (empty tree check)
165+
_PRTree2D.query() (pybind11)
166+
↓ (type conversion)
167+
PRTree::find_one() (C++)
168+
↓ (tree traversal)
169+
Result Indices
170+
↓ (optional: object retrieval)
171+
User Code
172+
```
173+
174+
## Separation of Concerns
175+
176+
### By Functionality
177+
178+
1. **Core Algorithm** (`include/prtree/core/`)
179+
- Spatial indexing logic
180+
- Tree construction and traversal
181+
- No I/O, no Python
182+
183+
2. **Utilities** (`include/prtree/utils/`)
184+
- Generic helpers
185+
- Reusable across projects
186+
187+
3. **Bindings** (`src/cpp/bindings/`)
188+
- Python/C++ bridge
189+
- Type conversions only
190+
191+
4. **Python API** (`src/python_prtree/`)
192+
- User interface
193+
- Safety and convenience
194+
195+
### By Testing
196+
197+
1. **Unit Tests** (`tests/unit/`)
198+
- Test individual features in isolation
199+
- Fast, focused tests
200+
- Examples: `test_insert.py`, `test_query.py`
201+
202+
2. **Integration Tests** (`tests/integration/`)
203+
- Test feature interactions
204+
- Workflow-based tests
205+
- Examples: `test_insert_query_workflow.py`
206+
207+
3. **E2E Tests** (`tests/e2e/`)
208+
- Test complete user scenarios
209+
- Documentation examples
210+
- Examples: `test_readme_examples.py`
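
The difference between the unit and integration tiers can be illustrated with two small tests. To stay runnable without the compiled extension, the sketch uses a hypothetical in-memory `FakeIndex`; its behavior (for example, rejecting duplicate indices) is an assumption and may differ from the real classes. The test names are likewise illustrative rather than taken from the repository.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (xmin, ymin, xmax, ymax)


class FakeIndex:
    """Hypothetical in-memory stand-in for a PRTree2D-like object."""

    def __init__(self) -> None:
        self._boxes: Dict[int, Box] = {}

    def insert(self, idx: int, box: Box) -> None:
        if idx in self._boxes:
            raise ValueError(f"index {idx} already present")
        self._boxes[idx] = box

    def erase(self, idx: int) -> None:
        del self._boxes[idx]  # raises KeyError for unknown indices

    def query(self, q: Box) -> List[int]:
        return sorted(i for i, b in self._boxes.items()
                      if b[0] <= q[2] and q[0] <= b[2]
                      and b[1] <= q[3] and q[1] <= b[3])


def test_insert_rejects_duplicate_index() -> None:
    """Unit tier: one feature, no other operations involved."""
    index = FakeIndex()
    index.insert(1, (0.0, 0.0, 1.0, 1.0))
    try:
        index.insert(1, (2.0, 2.0, 3.0, 3.0))
    except ValueError:
        return
    raise AssertionError("duplicate index was accepted")


def test_insert_erase_query_workflow() -> None:
    """Integration tier: several operations combined into one workflow."""
    index = FakeIndex()
    index.insert(1, (0.0, 0.0, 1.0, 1.0))
    index.insert(2, (0.5, 0.5, 2.0, 2.0))
    assert index.query((0.9, 0.9, 1.1, 1.1)) == [1, 2]
    index.erase(1)
    assert index.query((0.9, 0.9, 1.1, 1.1)) == [2]
```

Functions named `test_*` like these are collected automatically by pytest, so the same file layout works for both tiers.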
211+
212+
## Build System
213+
214+
### CMake Configuration
215+
216+
**Key Variables**:
217+
- `PRTREE_SOURCES`: Source files to compile
218+
- `PRTREE_INCLUDE_DIRS`: Header search paths
219+
220+
**Targets**:
221+
- `PRTree`: Main Python extension module
222+
- `benchmark_*`: C++ benchmark executables (optional)
223+
224+
**Options**:
225+
- `BUILD_BENCHMARKS`: Enable benchmark compilation
226+
- `ENABLE_PROFILING`: Build with profiling symbols
227+
- `ENABLE_ASAN/TSAN/UBSAN`: Enable sanitizers
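
As a rough sketch of how these options could be passed to a manual CMake configure step, the helper below assembles the command line. It assumes the shorthand `ENABLE_ASAN/TSAN/UBSAN` expands to three separate boolean options and that each option is a plain ON/OFF cache variable; neither is confirmed by this document. The `-S`, `-B`, and `-D` flags are standard CMake (3.13 or later for `-S`/`-B`); the function name is hypothetical.

```python
from typing import List

# Option names taken from the list above; the three sanitizer names are an
# assumed expansion of the shorthand ENABLE_ASAN/TSAN/UBSAN.
KNOWN_OPTIONS = (
    "BUILD_BENCHMARKS",
    "ENABLE_PROFILING",
    "ENABLE_ASAN",
    "ENABLE_TSAN",
    "ENABLE_UBSAN",
)


def cmake_configure_command(source_dir: str, build_dir: str,
                            **options: bool) -> List[str]:
    """Build the argument list for a CMake configure step."""
    unknown = sorted(set(options) - set(KNOWN_OPTIONS))
    if unknown:
        raise ValueError(f"unknown option(s): {', '.join(unknown)}")
    command = ["cmake", "-S", source_dir, "-B", build_dir]
    for name in sorted(options):  # sorted for a reproducible command line
        command.append(f"-D{name}={'ON' if options[name] else 'OFF'}")
    return command
```

For example, `cmake_configure_command(".", "build", BUILD_BENCHMARKS=True)` yields an argument list suitable for `subprocess.run`.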
228+
229+
### Build Process
230+
231+
```
232+
User runs: pip install -e .
233+
234+
setup.py invoked
235+
236+
CMakeBuild.build_extension()
237+
238+
CMake configuration
239+
- Find dependencies (pybind11, cereal, snappy)
240+
- Set compiler flags
241+
- Configure include paths
242+
243+
CMake build
244+
- Compile C++ to shared library (.so/.pyd)
245+
- Link dependencies
246+
247+
Extension installed in src/python_prtree/
248+
```
249+
250+
## Design Decisions
251+
252+
### Header-Only Core
253+
254+
**Decision**: Keep core PRTree as header-only template library
255+
256+
**Rationale**:
257+
- Enables full compiler optimization
258+
- Simplifies distribution
259+
- No need for .cc files at core layer
260+
261+
**Trade-offs**:
262+
- Longer compilation times
263+
- Larger binary size
264+
265+
### Separate Bindings File
266+
267+
**Decision**: Single `python_bindings.cc` file separate from core
268+
269+
**Rationale**:
270+
- Clear separation: core C++ vs. Python interface
271+
- Core can be reused in C++-only projects
272+
- Easier to maintain Python API changes
273+
274+
### Python Wrapper Layer
275+
276+
**Decision**: Add Python wrapper on top of pybind11 bindings
277+
278+
**Rationale**:
279+
- Safety: prevent segfaults on empty trees
280+
- Convenience: Pythonic APIs, object storage
281+
- Evolution: can change API without C++ recompilation
282+
283+
**Trade-offs**:
284+
- Extra Python layer adds a small per-call overhead
285+
- More code to maintain
286+
287+
### Test Organization
288+
289+
**Decision**: Three-tier test structure (unit/integration/e2e)
290+
291+
**Rationale**:
292+
- Fast feedback loop with unit tests
293+
- Comprehensive coverage with integration tests
294+
- Real-world validation with e2e tests
295+
- Easy to run subsets: `pytest tests/unit -v`
296+
297+
## Future Improvements
298+
299+
1. **Split prtree.h**: The monolithic header (1617 lines at the time of this commit) could be split into:
300+
- `prtree_fwd.h`: Forward declarations
301+
- `prtree_node.h`: Node implementation
302+
- `prtree_query.h`: Query algorithms
303+
- `prtree_insert.h`: Insert/erase logic
304+
305+
2. **C++ Core Library**: Extract core into `src/cpp/core/` for:
306+
- Faster compilation
307+
- Better code organization
308+
- Easier testing of C++ layer independently
309+
310+
3. **Python Benchmarks**: Add `benchmarks/python/` for:
311+
- Performance regression testing
312+
- Comparison with other Python libraries
313+
- Memory profiling
314+
315+
4. **Documentation**: Add `docs/api/` with:
316+
- Sphinx-generated API docs
317+
- Architecture diagrams
318+
- Performance tuning guide
319+
320+
## Contributing
321+
322+
When adding new features, follow the separation of concerns:
323+
324+
1. **Core algorithm changes**: Modify `include/prtree/core/prtree.h`
325+
2. **Expose to Python**: Update `src/cpp/bindings/python_bindings.cc`
326+
3. **Python API enhancements**: Update `src/python_prtree/core.py`
327+
4. **Add tests**: Unit tests for features, integration tests for workflows
328+
329+
See [DEVELOPMENT.md](DEVELOPMENT.md) for detailed contribution guidelines.
330+
331+
## References
332+
333+
- **Priority R-Tree Paper**: Arge, de Berg, Haverkort, and Yi, "The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree", SIGMOD 2004
334+
- **pybind11**: https://pybind11.readthedocs.io/
335+
- **Python Packaging**: PEP 517, PEP 518, PEP 621
