1 | 1 | # ChangeLog
2 | 2 |
| 3 | +## [2025-05-29] |
| 4 | + |
| 5 | +### `datafog-python` [4.2.0] |
| 6 | + |
| 7 | +#### Major Features |
| 8 | + |
| 9 | +- **GLiNER Integration**: Added a modern Named Entity Recognition (NER) engine based on GLiNER (Generalist Model for NER) |
| 10 | + - New `gliner` engine option in TextService providing 32x performance improvement over spaCy |
| 11 | + - PII-specialized model support (`urchade/gliner_multi_pii-v1`) for enhanced accuracy |
| 12 | + - Custom entity type configuration for domain-specific detection |
| 13 | + - Automatic model downloading and caching functionality |
| 14 | + |
| 15 | +- **Smart Cascading Engine**: Introduced intelligent multi-engine approach |
| 16 | + - New `smart` engine that progressively tries regex → GLiNER → spaCy |
| 17 | + - Configurable stopping criteria based on entity count thresholds |
| 18 | + - Optimized for best accuracy/performance balance (60x average speedup) |
| 19 | + |
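The cascading idea above can be illustrated with a minimal, self-contained sketch. It is not DataFog's implementation: `regex_engine`, `stub_ner_engine`, `cascade`, and the `min_entities` parameter are hypothetical names chosen for illustration, and the real engines are replaced with trivial stand-ins. The sketch assumes the stopping criterion is a simple total entity count.

```python
import re
from typing import Callable, Dict, List

# Hypothetical type: an engine maps text to {entity_type: [matches]}.
Engine = Callable[[str], Dict[str, List[str]]]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def regex_engine(text: str) -> Dict[str, List[str]]:
    """Cheap first stage: pattern-based detection of structured PII."""
    return {"EMAIL": EMAIL_RE.findall(text)}


def stub_ner_engine(text: str) -> Dict[str, List[str]]:
    """Stand-in for a slower model-based stage (GLiNER or spaCy in the real cascade)."""
    return {"PERSON": [word for word in text.split() if word.istitle()]}


def count_entities(result: Dict[str, List[str]]) -> int:
    """Total number of entities found across all entity types."""
    return sum(len(matches) for matches in result.values())


def cascade(text: str, engines: List[Engine], min_entities: int = 1) -> Dict[str, List[str]]:
    """Run engines in order; stop at the first one that finds enough entities.

    If no engine reaches the threshold, the last engine's result is returned.
    """
    result: Dict[str, List[str]] = {}
    for engine in engines:
        result = engine(text)
        if count_entities(result) >= min_entities:
            break  # threshold met, skip the more expensive stages
    return result
```

With this sketch, `cascade("mail bob@example.com", [regex_engine, stub_ner_engine])` stops after the regex stage, while text with no structured PII falls through to the next engine. Ordering the engines from cheapest to most expensive is what makes early stopping pay off.
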
| 20 | +- **Enhanced CLI Model Management**: Extended command-line interface |
| 21 | + - `--engine` flag support for `download-model` and `list-models` commands |
| 22 | + - GLiNER model discovery and management capabilities |
| 23 | + - Unified model management across spaCy and GLiNER engines |
| 24 | + |
| 25 | +#### Architecture Improvements |
| 26 | + |
| 27 | +- **Optional Dependencies**: Added new `nlp-advanced` extra for GLiNER dependencies |
| 28 | + - `pip install datafog[nlp-advanced]` for GLiNER + PyTorch + Transformers |
| 29 | + - Maintained lightweight core architecture (<2MB) |
| 30 | + - Graceful degradation when GLiNER dependencies are unavailable |
| 31 | + |
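One common way to handle an optional extra like this is to probe for the package before selecting an engine. The sketch below is an assumption about how such a check could look, not DataFog's actual code: `gliner_available` and `resolve_engine` are hypothetical helpers, and falling back to `regex` is an illustrative policy only.

```python
import importlib.util


def gliner_available() -> bool:
    """Return True if the optional `gliner` package can be found on the import path."""
    return importlib.util.find_spec("gliner") is not None


def resolve_engine(requested: str, has_gliner: bool) -> str:
    """Choose the engine to run (hypothetical fallback policy).

    Engines that need GLiNER fall back to the core-only `regex` engine
    when the optional dependency is missing; other engines pass through.
    """
    if requested in ("gliner", "smart") and not has_gliner:
        return "regex"
    return requested
```

A caller would combine the two as `resolve_engine("smart", gliner_available())`. Passing the availability flag in as an argument keeps the policy easy to test without installing or uninstalling the extra.
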
| 32 | +- **Engine Ecosystem**: Expanded from 3 to 5 annotation engines |
| 33 | + - `regex`: 190x faster than spaCy, structured PII detection (core only) |
| 34 | + - `gliner`: 32x faster than spaCy, modern NER with custom entities |
| 35 | + - `spacy`: Traditional NLP, comprehensive entity recognition |
| 36 | + - `smart`: Cascading approach for optimal accuracy/speed |
| 37 | + - `auto`: Legacy regex→spaCy fallback |
| 38 | + |
| 39 | +#### Performance & Quality |
| 40 | + |
| 41 | +- **Validated Performance**: Comprehensive benchmarking across all engines |
| 42 | + - GLiNER: 32x faster than spaCy with superior NER accuracy |
| 43 | + - Smart cascading: 60x average speedup with highest accuracy scores |
| 44 | + - Regex: Maintained 190x performance advantage |
| 45 | + |
| 46 | +- **Comprehensive Testing**: Added 19 new test cases for GLiNER integration |
| 47 | + - Full coverage of GLiNER annotator functionality |
| 48 | + - Graceful degradation testing for missing dependencies |
| 49 | + - Smart cascading logic validation |
| 50 | + - Cross-engine integration testing |
| 51 | + |
| 52 | +#### Documentation & Developer Experience |
| 53 | + |
| 54 | +- **Updated Documentation**: Comprehensive guides and examples |
| 55 | + - README performance comparison table with all 5 engines |
| 56 | + - Engine selection guidance with use case recommendations |
| 57 | + - GLiNER model management and CLI usage examples |
| 58 | + - Installation options for different dependency combinations |
| 59 | + |
| 60 | +- **Developer Guide**: Streamlined development documentation |
| 61 | + - Updated architecture overview with GLiNER integration |
| 62 | + - Performance requirements and testing strategies |
| 63 | + - Common development patterns and best practices |
| 64 | + |
| 65 | +#### Breaking Changes |
| 66 | + |
| 67 | +- **Engine Options**: New engine types added to TextService |
| 68 | + - Existing code using `engine="auto"` continues to work unchanged |
| 69 | + - New engines `gliner` and `smart` require `[nlp-advanced]` extra |
| 70 | + |
| 71 | +#### Dependencies |
| 72 | + |
| 73 | +- **New Optional Dependencies** (nlp-advanced extra): |
| 74 | + - `gliner>=0.2.5` |
| 75 | + - `torch>=2.1.0,<2.7` |
| 76 | + - `transformers>=4.20.0` |
| 77 | + - `huggingface-hub>=0.16.0` |
| 78 | + |
| 79 | +#### Migration Guide |
| 80 | + |
| 81 | +For users upgrading from v4.1.1: |
| 82 | +- All existing functionality remains unchanged |
| 83 | +- To use GLiNER: `pip install datafog[nlp-advanced]` |
| 84 | +- To use smart cascading: `TextService(engine="smart")` for the best accuracy/speed balance |
| 85 | +- To manage GLiNER models from the CLI: pass the `--engine gliner` flag |
| 86 | + |
3 | 87 | ## [2025-05-05]
4 | 88 |
5 | 89 | ### `datafog-python` [4.1.1]