| 1 | +# Project Architecture |
| 2 | + |
| 3 | +This document describes the architecture and directory structure of python_prtree. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +python_prtree is a Python package that provides fast spatial indexing using the Priority R-Tree data structure. It consists of: |
| 8 | + |
| 9 | +1. **C++ Core**: High-performance implementation of the Priority R-Tree algorithm |
| 10 | +2. **Python Bindings**: pybind11-based bindings exposing C++ functionality to Python |
| 11 | +3. **Python Wrapper**: User-friendly Python API with additional features |
| 12 | + |
| 13 | +## Directory Structure |
| 14 | + |
| 15 | +``` |
| 16 | +python_prtree/ |
| 17 | +├── include/                        # C++ Public Headers (API) |
| 18 | +│   └── prtree/ |
| 19 | +│       ├── core/                   # Core algorithm headers |
| 20 | +│       │   └── prtree.h            # Main PRTree class template |
| 21 | +│       └── utils/                  # Utility headers |
| 22 | +│           ├── parallel.h          # Parallel processing utilities |
| 23 | +│           └── small_vector.h      # Optimized vector implementation |
| 24 | +│ |
| 25 | +├── src/                            # Source Code |
| 26 | +│   ├── cpp/                        # C++ Implementation |
| 27 | +│   │   ├── core/                   # Core implementation (future) |
| 28 | +│   │   └── bindings/               # Python bindings |
| 29 | +│   │       └── python_bindings.cc  # pybind11 bindings |
| 30 | +│   │ |
| 31 | +│   └── python_prtree/              # Python Package |
| 32 | +│       ├── __init__.py             # Package entry point |
| 33 | +│       ├── core.py                 # PRTree2D/3D/4D classes |
| 34 | +│       └── py.typed                # Type hints marker (PEP 561) |
| 35 | +│ |
| 36 | +├── tests/                          # Test Suite |
| 37 | +│   ├── unit/                       # Unit tests (individual features) |
| 38 | +│   │   ├── test_construction.py |
| 39 | +│   │   ├── test_query.py |
| 40 | +│   │   ├── test_insert.py |
| 41 | +│   │   ├── test_erase.py |
| 42 | +│   │   └── ... |
| 43 | +│   ├── integration/                # Integration tests (workflows) |
| 44 | +│   │   ├── test_insert_query_workflow.py |
| 45 | +│   │   ├── test_persistence_query_workflow.py |
| 46 | +│   │   └── ... |
| 47 | +│   ├── e2e/                        # End-to-end tests |
| 48 | +│   │   ├── test_readme_examples.py |
| 49 | +│   │   └── test_user_workflows.py |
| 50 | +│   └── conftest.py                 # Shared test fixtures |
| 51 | +│ |
| 52 | +├── benchmarks/                     # Performance Benchmarks |
| 53 | +│   ├── cpp/                        # C++ benchmarks |
| 54 | +│   │   ├── benchmark_construction.cpp |
| 55 | +│   │   ├── benchmark_query.cpp |
| 56 | +│   │   ├── benchmark_parallel.cpp |
| 57 | +│   │   └── stress_test_concurrent.cpp |
| 58 | +│   └── python/                     # Python benchmarks (future) |
| 59 | +│       └── README.md |
| 60 | +│ |
| 61 | +├── docs/                           # Documentation |
| 62 | +│   ├── examples/                   # Example notebooks and scripts |
| 63 | +│   │   └── experiment.ipynb |
| 64 | +│   ├── images/                     # Documentation images |
| 65 | +│   └── baseline/                   # Benchmark baseline data |
| 66 | +│ |
| 67 | +├── tools/                          # Development Tools |
| 68 | +│   ├── analyze_baseline.py         # Benchmark analysis |
| 69 | +│   ├── profile.py                  # Profiling script |
| 70 | +│   ├── profile.sh                  # Profiling shell script |
| 71 | +│   └── profile_all_workloads.sh |
| 72 | +│ |
| 73 | +└── third/                          # Third-party Dependencies (git submodules) |
| 74 | +    ├── pybind11/                   # Python bindings framework |
| 75 | +    ├── cereal/                     # Serialization library |
| 76 | +    └── snappy/                     # Compression library |
| 77 | +``` |
| 78 | + |
| 79 | +## Architectural Layers |
| 80 | + |
| 81 | +### 1. Core C++ Layer (`include/prtree/core/`) |
| 82 | + |
| 83 | +**Purpose**: Implements the Priority R-Tree algorithm |
| 84 | + |
| 85 | +**Key Components**: |
| 86 | +- `prtree.h`: Main template class `PRTree<T, B, D>` |
| 87 | +  - `T`: Index type (typically `int64_t`) |
| 88 | +  - `B`: Branching factor (default: 8) |
| 89 | +  - `D`: Dimensions (2, 3, or 4) |
| 90 | + |
| 91 | +**Design Principles**: |
| 92 | +- Header-only template library for performance |
| 93 | +- No Python dependencies at this layer |
| 94 | +- Pure C++ with C++20 features |
| 95 | + |
| 96 | +### 2. Utilities Layer (`include/prtree/utils/`) |
| 97 | + |
| 98 | +**Purpose**: Supporting data structures and algorithms |
| 99 | + |
| 100 | +**Components**: |
| 101 | +- `parallel.h`: Thread-safe parallel processing utilities |
| 102 | +- `small_vector.h`: Cache-friendly vector with small size optimization |
| 103 | + |
| 104 | +**Design Principles**: |
| 105 | +- Reusable utilities independent of PRTree |
| 106 | +- Optimized for performance (SSE, cache-locality) |
| 107 | + |
| 108 | +### 3. Python Bindings Layer (`src/cpp/bindings/`) |
| 109 | + |
| 110 | +**Purpose**: Expose C++ functionality to Python using pybind11 |
| 111 | + |
| 112 | +**Key File**: `python_bindings.cc` |
| 113 | + |
| 114 | +**Responsibilities**: |
| 115 | +- Create Python classes from C++ templates |
| 116 | +- Handle numpy array conversions |
| 117 | +- Expose methods with Python-friendly signatures |
| 118 | +- Provide module-level documentation |
| 119 | + |
| 120 | +**Design Principles**: |
| 121 | +- Thin binding layer (minimal logic) |
| 122 | +- Direct mapping to C++ API |
| 123 | +- Efficient numpy integration |
| 124 | + |
| 125 | +### 4. Python Wrapper Layer (`src/python_prtree/`) |
| 126 | + |
| 127 | +**Purpose**: User-friendly Python API with safety features |
| 128 | + |
| 129 | +**Key Files**: |
| 130 | +- `__init__.py`: Package entry point and version info |
| 131 | +- `core.py`: Main user-facing classes (`PRTree2D`, `PRTree3D`, `PRTree4D`) |
| 132 | + |
| 133 | +**Added Features**: |
| 134 | +- Empty-tree safety (guards against segfaults on empty trees) |
| 135 | +- Python object storage (pickle serialization) |
| 136 | +- Convenient APIs (auto-indexing, return_obj parameter) |
| 137 | +- Type hints and documentation |
| 138 | + |
| 139 | +**Design Principles**: |
| 140 | +- Safety over raw performance |
| 141 | +- Pythonic API design |
| 142 | +- Backwards compatibility considerations |
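
The added features listed above can be sketched in plain Python. This is a minimal illustration, not the actual contents of `core.py`: `SafeTree2D` and `_FakeNativeTree2D` are hypothetical names, the brute-force stand-in replaces the compiled pybind11 class so the example runs on its own, and the `(xmin, ymin, xmax, ymax)` box layout with closed-interval intersection is an assumption.

```python
import pickle
from typing import Any, Dict, List, Optional, Sequence

Box = Sequence[float]  # assumed layout: (xmin, ymin, xmax, ymax)


class _FakeNativeTree2D:
    """Hypothetical stand-in for the compiled binding (brute force, no tree)."""

    def __init__(self) -> None:
        self._boxes: Dict[int, Box] = {}

    def insert(self, idx: int, box: Box) -> None:
        self._boxes[idx] = tuple(box)

    def size(self) -> int:
        return len(self._boxes)

    def query(self, q: Box) -> List[int]:
        # Two boxes intersect when their extents overlap on both axes.
        return sorted(
            i for i, b in self._boxes.items()
            if b[0] <= q[2] and q[0] <= b[2] and b[1] <= q[3] and q[1] <= b[3]
        )


class SafeTree2D:
    """Wrapper adding an empty-tree guard, auto-indexing, and object storage."""

    def __init__(self) -> None:
        self._tree = _FakeNativeTree2D()
        self._objects: Dict[int, bytes] = {}  # idx -> pickled payload
        self._next_idx = 0

    def insert(self, box: Box, idx: Optional[int] = None, obj: Any = None) -> int:
        if idx is None:
            idx = self._next_idx  # auto-indexing
        self._next_idx = max(self._next_idx, idx + 1)
        self._tree.insert(idx, box)
        if obj is not None:
            self._objects[idx] = pickle.dumps(obj)
        return idx

    def query(self, box: Box, return_obj: bool = False) -> List[Any]:
        if self._tree.size() == 0:
            return []  # never reach the native layer with an empty tree
        hits = self._tree.query(box)
        if not return_obj:
            return hits
        # Entries inserted without an object come back as None.
        return [pickle.loads(self._objects[i]) if i in self._objects else None
                for i in hits]
```

Because the guard and the object store live in Python, the native layer in this sketch only ever sees integer indices and coordinates.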
| 143 | + |
| 144 | +## Data Flow |
| 145 | + |
| 146 | +### Construction |
| 147 | +``` |
| 148 | +User Code |
| 149 | + ↓ (numpy arrays) |
| 150 | +PRTree2D/3D/4D (Python) |
| 151 | + ↓ (arrays + validation) |
| 152 | +_PRTree2D/3D/4D (pybind11) |
| 153 | + ↓ (type conversion) |
| 154 | +PRTree<int64_t, 8, D> (C++) |
| 155 | + ↓ (algorithm) |
| 156 | +Optimized R-Tree Structure |
| 157 | +``` |
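
The validation step in the construction flow might look like the following sketch. `validate_boxes` is a hypothetical helper, plain sequences stand in for numpy arrays, and the coordinate layout (all minimums first, then all maximums) is an assumption about the input format rather than something this document specifies.

```python
from typing import Sequence


def validate_boxes(
    indices: Sequence[int],
    boxes: Sequence[Sequence[float]],
    dim: int,
) -> None:
    """Raise ValueError unless the input is well-formed for a `dim`-D tree."""
    if dim not in (2, 3, 4):
        raise ValueError(f"unsupported dimension: {dim}")
    if len(indices) != len(boxes):
        raise ValueError("indices and boxes must have the same length")
    for box in boxes:
        if len(box) != 2 * dim:
            raise ValueError(
                f"expected {2 * dim} coordinates per box, got {len(box)}"
            )
        # Assumed layout: box[:dim] are the minimums, box[dim:] the maximums.
        mins, maxs = box[:dim], box[dim:]
        if any(lo > hi for lo, hi in zip(mins, maxs)):
            raise ValueError(f"box has min > max: {list(box)}")
```

Rejecting malformed input at this point keeps the binding layer thin, in line with the design principle that it performs type conversion only.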
| 158 | + |
| 159 | +### Query |
| 160 | +``` |
| 161 | +User Code |
| 162 | + ↓ (query box) |
| 163 | +PRTree2D.query() (Python) |
| 164 | + ↓ (empty tree check) |
| 165 | +_PRTree2D.query() (pybind11) |
| 166 | + ↓ (type conversion) |
| 167 | +PRTree::find_one() (C++) |
| 168 | + ↓ (tree traversal) |
| 169 | +Result Indices |
| 170 | + ↓ (optional: object retrieval) |
| 171 | +User Code |
| 172 | +``` |
| 173 | + |
| 174 | +## Separation of Concerns |
| 175 | + |
| 176 | +### By Functionality |
| 177 | + |
| 178 | +1. **Core Algorithm** (`include/prtree/core/`) |
| 179 | +   - Spatial indexing logic |
| 180 | +   - Tree construction and traversal |
| 181 | +   - No I/O, no Python |
| 182 | + |
| 183 | +2. **Utilities** (`include/prtree/utils/`) |
| 184 | +   - Generic helpers |
| 185 | +   - Reusable across projects |
| 186 | + |
| 187 | +3. **Bindings** (`src/cpp/bindings/`) |
| 188 | +   - Python/C++ bridge |
| 189 | +   - Type conversions only |
| 190 | + |
| 191 | +4. **Python API** (`src/python_prtree/`) |
| 192 | +   - User interface |
| 193 | +   - Safety and convenience |
| 194 | + |
| 195 | +### By Testing |
| 196 | + |
| 197 | +1. **Unit Tests** (`tests/unit/`) |
| 198 | +   - Test individual features in isolation |
| 199 | +   - Fast, focused tests |
| 200 | +   - Examples: `test_insert.py`, `test_query.py` |
| 201 | + |
| 202 | +2. **Integration Tests** (`tests/integration/`) |
| 203 | +   - Test feature interactions |
| 204 | +   - Workflow-based tests |
| 205 | +   - Examples: `test_insert_query_workflow.py` |
| 206 | + |
| 207 | +3. **E2E Tests** (`tests/e2e/`) |
| 208 | +   - Test complete user scenarios |
| 209 | +   - Documentation examples |
| 210 | +   - Examples: `test_readme_examples.py` |
| 211 | + |
| 212 | +## Build System |
| 213 | + |
| 214 | +### CMake Configuration |
| 215 | + |
| 216 | +**Key Variables**: |
| 217 | +- `PRTREE_SOURCES`: Source files to compile |
| 218 | +- `PRTREE_INCLUDE_DIRS`: Header search paths |
| 219 | + |
| 220 | +**Targets**: |
| 221 | +- `PRTree`: Main Python extension module |
| 222 | +- `benchmark_*`: C++ benchmark executables (optional) |
| 223 | + |
| 224 | +**Options**: |
| 225 | +- `BUILD_BENCHMARKS`: Enable benchmark compilation |
| 226 | +- `ENABLE_PROFILING`: Build with profiling symbols |
| 227 | +- `ENABLE_ASAN`, `ENABLE_TSAN`, `ENABLE_UBSAN`: Enable the corresponding sanitizer |
| 228 | + |
| 229 | +### Build Process |
| 230 | + |
| 231 | +``` |
| 232 | +User runs: pip install -e . |
| 233 | + ↓ |
| 234 | +setup.py invoked |
| 235 | + ↓ |
| 236 | +CMakeBuild.build_extension() |
| 237 | + ↓ |
| 238 | +CMake configuration |
| 239 | + - Find dependencies (pybind11, cereal, snappy) |
| 240 | + - Set compiler flags |
| 241 | + - Configure include paths |
| 242 | + ↓ |
| 243 | +CMake build |
| 244 | + - Compile C++ to shared library (.so/.pyd) |
| 245 | + - Link dependencies |
| 246 | + ↓ |
| 247 | +Extension installed in src/python_prtree/ |
| 248 | +``` |
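
The configure and build steps above come down to two `cmake` invocations. The sketch below only assembles the command lines and does not run them. `cmake_commands` is a hypothetical helper, not the project's actual `CMakeBuild` code; the option names come from the Options list above, and passing `CMAKE_LIBRARY_OUTPUT_DIRECTORY` to place the extension in the package directory is an assumption about how the output location is controlled.

```python
from typing import Dict, List, Optional


def cmake_commands(
    source_dir: str,
    build_dir: str,
    output_dir: str,
    options: Optional[Dict[str, bool]] = None,
    config: str = "Release",
) -> List[List[str]]:
    """Return [configure_cmd, build_cmd] as argument lists."""
    configure = [
        "cmake",
        "-S", source_dir,   # where CMakeLists.txt lives
        "-B", build_dir,    # out-of-source build tree
        f"-DCMAKE_BUILD_TYPE={config}",
        # Assumption: the shared library is written straight into the package.
        f"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={output_dir}",
    ]
    # Translate boolean options into -DNAME=ON/OFF cache entries.
    for name, enabled in sorted((options or {}).items()):
        configure.append(f"-D{name}={'ON' if enabled else 'OFF'}")
    build = ["cmake", "--build", build_dir, "--config", config]
    return [configure, build]
```

Each returned list can be passed to `subprocess.check_call` as is, which avoids shell quoting issues with paths that contain spaces.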
| 249 | + |
| 250 | +## Design Decisions |
| 251 | + |
| 252 | +### Header-Only Core |
| 253 | + |
| 254 | +**Decision**: Keep core PRTree as header-only template library |
| 255 | + |
| 256 | +**Rationale**: |
| 257 | +- Enables full compiler optimization |
| 258 | +- Simplifies distribution |
| 259 | +- No need for `.cc` files at the core layer |
| 260 | + |
| 261 | +**Trade-offs**: |
| 262 | +- Longer compilation times |
| 263 | +- Larger binary size |
| 264 | + |
| 265 | +### Separate Bindings File |
| 266 | + |
| 267 | +**Decision**: Single `python_bindings.cc` file separate from core |
| 268 | + |
| 269 | +**Rationale**: |
| 270 | +- Clear separation: core C++ vs. Python interface |
| 271 | +- Core can be reused in C++-only projects |
| 272 | +- Easier to maintain Python API changes |
| 273 | + |
| 274 | +### Python Wrapper Layer |
| 275 | + |
| 276 | +**Decision**: Add Python wrapper on top of pybind11 bindings |
| 277 | + |
| 278 | +**Rationale**: |
| 279 | +- Safety: prevent segfaults on empty trees |
| 280 | +- Convenience: Pythonic APIs, object storage |
| 281 | +- Evolution: the Python API can change without recompiling the C++ extension |
| 282 | + |
| 283 | +**Trade-offs**: |
| 284 | +- Extra layer adds slight overhead |
| 285 | +- More code to maintain |
| 286 | + |
| 287 | +### Test Organization |
| 288 | + |
| 289 | +**Decision**: Three-tier test structure (unit/integration/e2e) |
| 290 | + |
| 291 | +**Rationale**: |
| 292 | +- Fast feedback loop with unit tests |
| 293 | +- Comprehensive coverage with integration tests |
| 294 | +- Real-world validation with e2e tests |
| 295 | +- Easy to run subsets: `pytest tests/unit -v` |
| 296 | + |
| 297 | +## Future Improvements |
| 298 | + |
| 299 | +1. **Split `prtree.h`**: The large monolithic header could be split into: |
| 300 | +   - `prtree_fwd.h`: Forward declarations |
| 301 | +   - `prtree_node.h`: Node implementation |
| 302 | +   - `prtree_query.h`: Query algorithms |
| 303 | +   - `prtree_insert.h`: Insert/erase logic |
| 304 | + |
| 305 | +2. **C++ Core Library**: Extract the core into `src/cpp/core/` for: |
| 306 | +   - Faster compilation |
| 307 | +   - Better code organization |
| 308 | +   - Easier independent testing of the C++ layer |
| 309 | + |
| 310 | +3. **Python Benchmarks**: Add `benchmarks/python/` for: |
| 311 | +   - Performance regression testing |
| 312 | +   - Comparison with other Python libraries |
| 313 | +   - Memory profiling |
| 314 | + |
| 315 | +4. **Documentation**: Add `docs/api/` with: |
| 316 | +   - Sphinx-generated API docs |
| 317 | +   - Architecture diagrams |
| 318 | +   - Performance tuning guide |
| 319 | + |
| 320 | +## Contributing |
| 321 | + |
| 322 | +When adding new features, follow the separation of concerns: |
| 323 | + |
| 324 | +1. **Core algorithm changes**: Modify `include/prtree/core/prtree.h` |
| 325 | +2. **Expose to Python**: Update `src/cpp/bindings/python_bindings.cc` |
| 326 | +3. **Python API enhancements**: Update `src/python_prtree/core.py` |
| 327 | +4. **Add tests**: Unit tests for features, integration tests for workflows |
| 328 | + |
| 329 | +See [DEVELOPMENT.md](DEVELOPMENT.md) for detailed contribution guidelines. |
| 330 | + |
| 331 | +## References |
| 332 | + |
| 333 | +- **Priority R-Tree Paper**: Arge et al., SIGMOD 2004 |
| 334 | +- **pybind11**: https://pybind11.readthedocs.io/ |
| 335 | +- **Python Packaging**: PEP 517, PEP 518, PEP 621 |