Releases: DataFog/datafog-python
🚧 Beta Release 4.3.0b1
Beta Release Notes
Beta Release: 2025-06-05
🚀 New Features
- fix(ci): add diagnostics and plugin verification for benchmark tests
- Merge pull request #104 from DataFog/feature/sample-notebooks
- Merge branch 'dev' into feature/sample-notebooks
- Fix segmentation fault in beta-release workflow and add sample notebook
- Merge pull request #103 from DataFog/feature/sample-notebooks
- Fix segmentation fault in beta-release workflow and add sample notebook
- Merge pull request #102 from DataFog/feature/gliner-integration-v420
- Merge branch 'dev' into feature/gliner-integration-v420
- Merge branch 'feature/gliner-integration-v420' of github.com:DataFog/datafog-python into feature/gliner-integration-v420
- Merge pull request #101 from DataFog/feature/gliner-integration-v420
- Merge branch 'dev' into feature/gliner-integration-v420
- Merge pull request #100 from DataFog/feature/gliner-integration-v420
- docs: add release guidelines to Claude.md
🐛 Bug Fixes
- Merge pull request #108 from DataFog/fix/beta-workflow-changelog-v2
- Merge branch 'dev' into fix/beta-workflow-changelog-v2
- Merge pull request #107 from DataFog/fix/beta-workflow-changelog-v2
- Merge branch 'fix/performance-regression' into dev
- fix(ci): improve beta version detection to check existing git tags
- Merge branch 'fix/performance-regression' of github.com:DataFog/datafog-python into fix/performance-regression
- fix(ci): improve beta versioning logic and use GH_PAT token
- fix(ci): replace invalid --benchmark-skip flag with simple performance test
- Merge pull request #106 from DataFog/fix/performance-regression
- Merge branch 'dev' into fix/performance-regression
- Merge pull request #105 from DataFog/fix/performance-regression
- fix(ci): reset benchmark baseline to resolve false regression alerts
- fix(performance): eliminate memory debugging overhead from benchmarks
- fix(performance): eliminate redundant regex calls in structured output mode
- fix(ci): handle segfault gracefully while preserving test validation
- fix(tests): make spaCy address detection test more robust
- fix(ci): improve GLiNER validation to confirm PyTorch exclusion
- fix(ci): exclude PyTorch dependencies entirely to prevent segfault
- fix(ci): eliminate PyTorch segfaults and enhance README with GLiNER examples
- fix(ci): workaround for PyTorch segfault in CI environments
- fix(ci): split test execution to prevent memory segfault
- fix(ci): reduce coverage reporting to prevent segmentation fault
- fix(tests): resolve final GLiNER test failures
- fix(tests): update GLiNER test mocking for proper import paths
- fix(tests): resolve GLiNER dependency mocking for CI environments
🔧 Other Changes
- chore: bump version to 4.3.0 for next development cycle
- chore: clean up test changelog file after merge
- chore: set version to 4.2.0b1 for beta testing of unreleased 4.2.0
- resolve: merge conflicts with enhanced segfault detection
📥 Installation
# Core package (lightweight)
pip install datafog
# With all features
pip install datafog[all]
📊 Metrics
- Package size: ~2MB (core)
- Install time: ~10 seconds
- Tests passing: ✅
- Commits this week: 46
🚧 Beta Release 4.2.0b3
Beta Release Notes
Beta Release: 2025-05-31
🚀 New Features
- fix(ci): add diagnostics and plugin verification for benchmark tests
- Merge pull request #104 from DataFog/feature/sample-notebooks
- Merge branch 'dev' into feature/sample-notebooks
- Fix segmentation fault in beta-release workflow and add sample notebook
- Merge pull request #103 from DataFog/feature/sample-notebooks
- Fix segmentation fault in beta-release workflow and add sample notebook
- Merge pull request #102 from DataFog/feature/gliner-integration-v420
- Merge branch 'dev' into feature/gliner-integration-v420
- Merge branch 'feature/gliner-integration-v420' of github.com:DataFog/datafog-python into feature/gliner-integration-v420
- Merge pull request #101 from DataFog/feature/gliner-integration-v420
- Merge branch 'dev' into feature/gliner-integration-v420
- Merge pull request #100 from DataFog/feature/gliner-integration-v420
- docs: add release guidelines to Claude.md
- feat(nlp): add GLiNER integration with smart cascading engine
- fix(deps): add pydantic-settings to cli and all extras
- Merge pull request #92 from DataFog/feature/automated-release-pipeline
- feat(ci): configure release workflows for 4.2.0 minor version bump
- feat(ci): add comprehensive alpha→beta→stable release cycle
- feat(ci): add nightly alpha builds for Monday-Thursday
- Merge pull request #91 from DataFog/feature/implement-weekly-release-plan
- feat(release): implement weekly release plan infrastructure
🐛 Bug Fixes
- Merge pull request #107 from DataFog/fix/beta-workflow-changelog-v2
- Merge branch 'fix/performance-regression' into dev
- fix(ci): improve beta version detection to check existing git tags
- Merge branch 'fix/performance-regression' of github.com:DataFog/datafog-python into fix/performance-regression
- fix(ci): improve beta versioning logic and use GH_PAT token
- fix(ci): replace invalid --benchmark-skip flag with simple performance test
- Merge pull request #106 from DataFog/fix/performance-regression
- Merge branch 'dev' into fix/performance-regression
- Merge pull request #105 from DataFog/fix/performance-regression
- fix(ci): reset benchmark baseline to resolve false regression alerts
- fix(performance): eliminate memory debugging overhead from benchmarks
- fix(performance): eliminate redundant regex calls in structured output mode
- fix(ci): handle segfault gracefully while preserving test validation
- fix(tests): make spaCy address detection test more robust
- fix(ci): improve GLiNER validation to confirm PyTorch exclusion
- fix(ci): exclude PyTorch dependencies entirely to prevent segfault
- fix(ci): eliminate PyTorch segfaults and enhance README with GLiNER examples
- fix(ci): workaround for PyTorch segfault in CI environments
- fix(ci): split test execution to prevent memory segfault
- fix(ci): reduce coverage reporting to prevent segmentation fault
- fix(tests): resolve final GLiNER test failures
- fix(tests): update GLiNER test mocking for proper import paths
- fix(tests): resolve GLiNER dependency mocking for CI environments
- Merge pull request #99 from DataFog/fix/github-actions-workflow-fixes
- Merge branch 'dev' into fix/github-actions-workflow-fixes
- fix(deps): move pydantic-settings to core dependencies
- fix(ci): install all extras and configure pytest-asyncio in workflows
- Merge pull request #98 from DataFog/fix/github-actions-workflow-fixes
- fix(ci): resolve YAML syntax errors in GitHub Actions workflows
- Merge pull request #96 from DataFog/codex/fix-failing-github-actions-in-workflows
- fix release workflows
- Merge pull request #95 from DataFog/hotfix/readme-fix
- Merge branch 'dev' into hotfix/readme-fix
- fix(ci): remove indentation from Python code in workflow commands
- fix(text): resolve missing Span import for structured output
- fix(ci): resolve YAML syntax issues in workflow files
- fix(ci): resolve prettier pre-commit hook configuration
- fix(ci): resolve YAML syntax issues in release workflows
- fix(lint): resolve flake8 string formatting warnings
- fix(ci): restore expected job names and consolidate workflows
- fix(imports): resolve flake8 E402 import order issues
📚 Documentation
- docs: streamline Claude.md development guide for v4.2.0
- fixed readme
🔧 Other Changes
- chore: clean up test changelog file after merge
- chore: set version to 4.2.0b1 for beta testing of unreleased 4.2.0
- resolve: merge conflicts with enhanced segfault detection
- release: prepare v4.2.0 with GLiNER integration
- updated workflows
- Merge pull request #94 from DataFog/hotfix/beta-workflow-yaml-syntax
- Merge branch 'dev' into hotfix/beta-workflow-yaml-syntax
- Merge pull request #93 from DataFog/hotfix/beta-workflow-yaml-syntax
📥 Installation
# Core package (lightweight)
pip install datafog
# With all features
pip install datafog[all]
📊 Metrics
- Package size: ~2MB (core)
- Install time: ~10 seconds
- Tests passing: ✅
- Commits this week: 75
release: prepare v4.2.0 with GLiNER integration
DataFog 4.2.0 - GLiNER Integration Release
Released: 2025-05-30
🚀 Major Features
GLiNER Integration
- Modern NER Engine: Added GLiNER (Generalist Named Entity Recognition) support
- Smart Cascading: Intelligent progression from regex → GLiNER → spaCy
- 32x Performance: GLiNER provides 32x faster NER compared to spaCy baseline
- PII-Specialized Models: Support for urchade/gliner_multi_pii-v1 and other models
Engine Selection
from datafog.services.text_service import TextService
# New GLiNER engine
service = TextService(engine="gliner")
# Smart cascading (recommended)
service = TextService(engine="smart") # regex → GLiNER → spaCy
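The cascade order named above (regex → GLiNER → spaCy) can be illustrated with a minimal, self-contained sketch. This is not DataFog's implementation: the `cascade` function, the stand-in engines, the result shape, and the stop-at-first-hit rule are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical engine signature: text in, {label: [matches]} out.
Engine = Callable[[str], Dict[str, List[str]]]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_engine(text: str) -> Dict[str, List[str]]:
    """Cheap first stage: structured PII only (here, just emails)."""
    found = EMAIL.findall(text)
    return {"EMAIL": found} if found else {}


def fake_ner_engine(text: str) -> Dict[str, List[str]]:
    """Stand-in for a slower ML stage such as GLiNER or spaCy."""
    names = [word for word in text.split() if word.istitle()]
    return {"PERSON": names} if names else {}


def cascade(
    text: str, stages: Sequence[Tuple[str, Engine]]
) -> Tuple[str, Dict[str, List[str]]]:
    """Try each stage in order and return the first non-empty result."""
    for name, engine in stages:
        result = engine(text)
        if result:  # assumed stopping rule: first stage that finds anything wins
            return name, result
    return "none", {}


stages = [("regex", regex_engine), ("ner", fake_ner_engine)]
print(cascade("mail bob@example.com", stages))
print(cascade("met Alice yesterday", stages))
```

The point of ordering stages from cheapest to most expensive is that text containing only structured PII never reaches the slower model-backed stages.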
Performance Improvements
- 190x faster regex engine for structured PII (emails, phones, SSNs)
- Lightweight core: <2MB package with optional ML extras
- Memory optimization: Enhanced segfault handling and performance validation
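To make "regex engine for structured PII" concrete, here is a minimal sketch using Python's `re` module. The patterns and the `find_structured_pii` helper are illustrative assumptions; they are deliberately simple and are not DataFog's own patterns.

```python
import re
from typing import Dict, List

# Deliberately simple, US-centric patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_structured_pii(text: str) -> Dict[str, List[str]]:
    """Return all matches per label, using the precompiled patterns."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


sample = "Reach jane.doe@example.com or 555-123-4567; SSN 123-45-6789."
print(find_structured_pii(sample))
```

Compiling the patterns once at import time and scanning the text once per label avoids repeated regex work on every call.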
🐛 Bug Fixes
- Fixed CI segmentation faults in test environments
- Resolved benchmark regression detection
- Improved dependency management for optional ML features
- Enhanced test stability across platforms
🔧 Infrastructure
- Comprehensive CI/CD improvements
- Enhanced GitHub Actions workflows
- Better error handling and diagnostics
- Sample notebooks and examples
📥 Installation
# Core package (lightweight)
pip install datafog
# With GLiNER support
pip install datafog[nlp-advanced]
# Everything included
pip install datafog[all]
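Because the ML engines ship as optional extras, code that picks an engine at runtime may want to check what is installed first. The sketch below uses only the standard library. The `pick_engine` helper, its preference order, and the assumption that the extras expose importable modules named `gliner` and `spacy` are illustrative and not part of DataFog's documented API.

```python
from importlib.util import find_spec
from typing import Callable


def is_installed(module_name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


def pick_engine(available: Callable[[str], bool] = is_installed) -> str:
    """Choose an engine name based on which optional modules are present.

    The preference order here is an assumption, not DataFog behavior.
    """
    if available("gliner") and available("spacy"):
        return "smart"
    if available("gliner"):
        return "gliner"
    if available("spacy"):
        return "spacy"
    return "regex"  # the core package alone is enough for regex


print(pick_engine())
```

The returned string could then be passed as the `engine` argument shown under Engine Selection.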
📊 Performance Comparison
Engine | Speed vs spaCy | Accuracy | Use Case |
---|---|---|---|
regex | 190x faster | High (structured) | Emails, phones, SSNs |
gliner | 32x faster | Very High | Modern NER |
spacy | 1x (baseline) | Good | Traditional NLP |
smart | 60x faster | Highest | Best balance |
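A "Speed vs spaCy" figure is a ratio of wall-clock times against a baseline. The sketch below shows one way such a ratio could be measured on your own data. The `time_call` and `speedup` helpers and the stand-in workloads are hypothetical, and this is not the benchmark behind the numbers in the table.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds for one call to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Stand-in workloads; replace these with real engine calls on real text.
slow = time_call(lambda: sum(range(200_000)))
fast = time_call(lambda: sum(range(2_000)))
print(f"candidate is about {speedup(slow, fast):.0f}x faster than baseline")
```

Taking the best of several runs reduces noise from other processes; results still depend heavily on input text and hardware.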
🔗 Links
- PyPI: https://pypi.org/project/datafog/4.2.0/
- Documentation: https://docs.datafog.ai
- GitHub: https://github.com/datafog/datafog-python
v4.1.1
v4.1.0
What's Changed
- Update about.py by @sidmohan0 in #59
- feat(regex): Enhance regex patterns and tests for PII detection by @sidmohan0 in #65
- feat(text-service): Add engine selection and structured output by @sidmohan0 in #66
- Feat/benchmarks by @sidmohan0 in #67
- Run benchmarks on pushes and pull requests by @sidmohan0 in #68
- Feat/regex fallback by @sidmohan0 in #69
- Update setup.py by @sidmohan0 in #70
- runtime breakers by @sidmohan0 in #72
- Added integration test markers: by @sidmohan0 in #73
- Feat/cli smoke tests by @sidmohan0 in #74
- Feat/ocr flag by @sidmohan0 in #75
- Chore/housekeeping by @sidmohan0 in #76
- updated textservice to not catch empty strings by @sidmohan0 in #77
- Update setup.py by @sidmohan0 in #78
- fixed tagging issue by @sidmohan0 in #79
- fixed pypi by @sidmohan0 in #81
- reverting pypi to stable ver by @sidmohan0 in #82
- Cleanup/fix benchmark and ci issues by @sidmohan0 in #87
- refactor: replace speed claims with intelligent engine selection messaging by @sidmohan0 in #86
- fixed pypi.yml by @sidmohan0 in #88
- removed auto beta by @sidmohan0 in #89
Full Changelog: v4.0.0...v4.1.0
v4.0.0
What's Changed
- V4.0.0 by @sidmohan0 in #52
- Update setup.py by @sidmohan0 in #53
Full Changelog: v3.4.0...v4.0.0
v3.4.0
What's Changed
- fix link to the getting started collab notebook by @sroy9675 in #44
- remove extraneous debug prints by @sroy9675 in #45
- ff by @sidmohan0 in #46
- Feature/update example notebooks by @sidmohan0 in #47
- cleanup by @sidmohan0 in #48
- _chunk_text + tests by @sidmohan0 in #49
- Remove all .venv files and example.venv files and error_log.txt ... by @pselvana in #50
- python 3.10, 3.11, 3.12 support | model by @sidmohan0 in #51
Full Changelog: v3.3.0...v3.4.0
v3.3.0
What's Changed
- Recovered 3.2.2 by @sidmohan0 in #33
- Temp update cicd by @sidmohan0 in #35
- pre-commit passed by @sidmohan0 in #36
- updated ymls by @sidmohan0 in #37
- Add Synchronous processing by @sroy9675 in #32
- publish-pypi.yml by @sidmohan0 in #40
- Hotfix/spacy pypi issue by @sidmohan0 in #41
- updated publish-pypi to remove en-spacy-pii-fast install by @sidmohan0 in #42
Full Changelog: v3.2.1...v3.3.0
v.3.2.1
v3.2.0: Improved OCR, streamlined functions, and more
First, thanks to everyone for bearing with us as we made some notable architectural changes over the past several releases.
A big part of this work was orienting the package toward better long-term development and toward how DataFog is used today, and will likely be used in the future, within API services.
- Implemented Pytesseract: significant speed and accuracy gains in OCR text extraction compared with Donut
- Allows for better image and PDF extraction
- Enhanced test suite coverage
- Refactored definitions to support async (for API integration)
- Refactored classes/functions around ImageService, TextService, SparkService
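As an illustration of what async support can look like for API integration, the sketch below exposes a blocking annotator through `asyncio`. The function names and the toy annotator are hypothetical stand-ins and are not the methods of ImageService, TextService, or SparkService.

```python
import asyncio
from typing import List


def annotate_sync(text: str) -> List[str]:
    """Hypothetical blocking annotator: returns capitalized words."""
    return [word for word in text.split() if word.istitle()]


async def annotate_async(text: str) -> List[str]:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(annotate_sync, text)


async def annotate_many(texts: List[str]) -> List[List[str]]:
    """Annotate several documents concurrently, preserving input order."""
    return await asyncio.gather(*(annotate_async(t) for t in texts))


results = asyncio.run(annotate_many(["Alice met Bob", "no names here"]))
print(results)
```

`asyncio.to_thread` requires Python 3.9 or later, which fits the 3.10 to 3.12 support listed for v3.4.0.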