Feat/regex fallback #69

Merged
sidmohan0 merged 16 commits into dev from feat/regex-fallback
May 3, 2025
sidmohan0
Contributor

No description provided.

sidmohan0 and others added 10 commits May 1, 2025 20:07
- Improve regex patterns for IP addresses, credit cards, and phone numbers
- Refactor tests using parameterization for better maintainability
- Add comprehensive test cases for edge cases and invalid formats
- Fix validation issues with IPv6 addresses and credit card formats
- Document regex pattern logic with clear comments
feat(regex): Enhance regex patterns and tests for PII detection
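A minimal sketch of the kind of validation this commit describes. The patterns below are illustrative assumptions, not the expressions merged in this PR, and the names `IPV4_RE`, `PHONE_RE`, `CARD_RE`, `luhn_ok`, and `is_credit_card` are hypothetical. A plain case table stands in for the parameterized tests so the example runs without a test framework.

```python
import re

# Illustrative patterns only; the PR's actual expressions are not shown here.
# IPv4: four octets, each restricted to 0-255.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b"
)
# US-style phone number with an optional country code and separators.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}(?!\d)"
)
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not digits:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def is_credit_card(candidate: str) -> bool:
    """Shape check via regex, then a checksum to reject random digit runs."""
    return CARD_RE.fullmatch(candidate) is not None and luhn_ok(candidate)


# Table-driven cases: (pattern, input, expected full match).
CASES = [
    (IPV4_RE, "192.168.1.1", True),
    (IPV4_RE, "256.1.1.1", False),
    (PHONE_RE, "(555) 123-4567", True),
    (PHONE_RE, "555-1234", False),
]

for pattern, text, expected in CASES:
    assert (pattern.fullmatch(text) is not None) == expected, text
```

Pairing the card pattern with a checksum is one common way to cut false positives on long digit strings; whether the PR does this is not stated.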
- Add engine parameter to TextService allowing 'regex', 'spacy', or 'auto' modes
- Implement auto-fallback mechanism that tries regex first, falls back to spaCy
- Add structured output option returning Span objects with position information
- Create comprehensive integration tests for the new features
- Update documentation in code comments, README, and CHANGELOG
feat(text-service): Add engine selection and structured output
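The engine selection and fallback described in this commit can be sketched roughly as follows. This is an assumption-laden illustration, not the TextService implementation: `Span`, `regex_spans`, `annotate`, and the `nlp_fallback` callable (a stand-in for the spaCy path) are hypothetical names, and only an email pattern is included.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Span:
    """Structured result: entity label plus character offsets."""
    label: str
    start: int
    end: int
    text: str


# Hypothetical, deliberately simple email pattern for the regex engine.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def regex_spans(text: str) -> List[Span]:
    return [
        Span("EMAIL", m.start(), m.end(), m.group())
        for m in EMAIL_RE.finditer(text)
    ]


def annotate(
    text: str,
    engine: str = "auto",
    nlp_fallback: Optional[Callable[[str], List[Span]]] = None,
) -> List[Span]:
    """Pick an engine; in 'auto' mode try regex first, then the fallback."""
    if engine not in {"regex", "spacy", "auto"}:
        raise ValueError(f"unknown engine: {engine!r}")
    if engine == "spacy":
        if nlp_fallback is None:
            raise RuntimeError("spaCy engine requested but no NLP callable given")
        return nlp_fallback(text)
    spans = regex_spans(text)
    # 'regex' never falls back; 'auto' falls back only when regex found nothing.
    if spans or engine == "regex" or nlp_fallback is None:
        return spans
    return nlp_fallback(text)
```

For example, `annotate("Contact: jane@example.com", engine="regex")` yields a single `EMAIL` span covering characters 9 to 25, while `engine="auto"` only invokes the slower fallback when the regex pass returns nothing.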
Run weekly scheduled benchmarks
Compare results against previous runs
Alert on performance regressions (>10% slower)
Run benchmarks on pushes and pull requests
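As a rough sketch of the regression alert, a comparison step could flag any benchmark whose timing grew by more than 10% relative to the previous run. The function name `find_regressions` and the input shape (benchmark name mapped to mean seconds) are assumptions for illustration; the workflow's actual comparison logic is not shown in this PR.

```python
from typing import Dict


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    threshold: float = 0.10,
) -> Dict[str, float]:
    """Return {benchmark: relative slowdown} for runs slower than threshold."""
    regressions = {}
    for name, base_time in baseline.items():
        new_time = current.get(name)
        if new_time is None or base_time <= 0:
            continue  # benchmark missing from this run, or unusable baseline
        slowdown = (new_time - base_time) / base_time
        if slowdown > threshold:
            regressions[name] = slowdown
    return regressions


previous = {"regex_engine": 1.0, "spacy_engine": 2.0}
latest = {"regex_engine": 1.05, "spacy_engine": 2.5}
print(find_regressions(previous, latest))
```

A CI job could exit non-zero when the returned mapping is non-empty, which is one way to turn the comparison into an alert.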
sidmohan0 added this to the 4.1.0 milestone May 3, 2025
sidmohan0 self-assigned this May 3, 2025
sidmohan0 added 6 commits May 2, 2025 18:54
Created a GitHub Actions workflow file (.github/workflows/wheel_size.yml) to check wheel size
Implemented a script (scripts/check_wheel_size.py) to verify wheel size is under 8 MB
Verified locally that the wheel size is only 0.05 MB, well under our 8 MB limit
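A size check along these lines might look like the sketch below. It is not the contents of scripts/check_wheel_size.py: the helper names, the `dist` directory, and the choice of 1 MB = 1024 * 1024 bytes are all assumptions.

```python
import sys
from pathlib import Path
from typing import List, Tuple

MAX_WHEEL_MB = 8.0  # limit stated in the commit message


def wheel_size_mb(path: Path) -> float:
    """File size in MB, assuming 1 MB = 1024 * 1024 bytes."""
    return path.stat().st_size / (1024 * 1024)


def oversized_wheels(
    dist_dir: Path, limit_mb: float = MAX_WHEEL_MB
) -> List[Tuple[str, float]]:
    """List (filename, size in MB) for every wheel above the limit."""
    results = []
    for wheel in sorted(dist_dir.glob("*.whl")):
        size = wheel_size_mb(wheel)
        if size > limit_mb:
            results.append((wheel.name, size))
    return results


def main(dist_dir: str = "dist") -> int:
    """Return a process exit code: 1 if any wheel is too large, else 0."""
    too_big = oversized_wheels(Path(dist_dir))
    for name, size in too_big:
        print(f"{name}: {size:.2f} MB exceeds {MAX_WHEEL_MB} MB")
    return 1 if too_big else 0


if __name__ == "__main__":
    sys.exit(main())
```

Returning a non-zero exit code lets a GitHub Actions step fail the job when a wheel grows past the limit.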
Type Hints Completion
Fixed all type annotation issues in the TextService class
Added proper type annotations for variables and function parameters
Added type checking for returned values from functions
Improved code quality with better variable naming and defensive programming
Documentation Improvements
Added "When do I need spaCy?" guidance to the README
Listed specific scenarios where spaCy would be beneficial despite being slower
Emphasized the significant performance advantage of the regex engine
CHANGELOG Updates
Updated the changelog to reflect all the changes for version 4.1.0
Changed version from 4.1.0-dev to 4.1.0
Added entries for all the new features and improvements
…mentation:

Set the end position of the span to match the actual length of the text (len(test_text))
Removed the PHONE span from the mock since it wasn't part of the input text
Updated the test assertions to expect only one span (EMAIL) instead of two
Made the test more explicit by verifying the exact properties of the span
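The shape of the corrected test can be sketched as below, under stated assumptions: `Span`, `annotate_structured`, and the mocked engine are hypothetical stand-ins, since the real test and the method it exercises are not shown in this conversation. The mock returns a single EMAIL span whose end offset equals `len(test_text)`, and the assertions check each property explicitly.

```python
from dataclasses import dataclass
from typing import Callable, List
from unittest.mock import Mock


@dataclass(frozen=True)
class Span:
    label: str
    start: int
    end: int
    text: str


def annotate_structured(
    text: str, engine: Callable[[str], List[Span]]
) -> List[Span]:
    """Hypothetical wrapper: keep only spans that agree with the input text."""
    return [
        span
        for span in engine(text)
        if 0 <= span.start < span.end <= len(text)
        and text[span.start:span.end] == span.text
    ]


def test_structured_output_returns_single_email_span() -> None:
    test_text = "john@example.com"
    # The mock reports only what is actually in the input: one EMAIL span
    # that ends at len(test_text). No PHONE span, since the text has none.
    mock_engine = Mock(
        return_value=[Span("EMAIL", 0, len(test_text), test_text)]
    )

    spans = annotate_structured(test_text, mock_engine)

    mock_engine.assert_called_once_with(test_text)
    assert len(spans) == 1
    span = spans[0]
    assert span.label == "EMAIL"
    assert span.start == 0
    assert span.end == len(test_text)
    assert span.text == test_text
```

Deriving the end offset from `len(test_text)` rather than hard-coding it keeps the mock consistent with the input if the sample text changes.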
sidmohan0 merged commit caf8c06 into dev May 3, 2025
5 checks passed
sidmohan0 deleted the feat/regex-fallback branch May 3, 2025 05:07