feat(text-service): Add engine selection and structured output #66

Merged: 1 commit merged on May 2, 2025

sidmohan0
Contributor

Engine Selection and Structured Output for TextService

Changes

This PR implements engine selection for the TextService class, as outlined in ticket #XX. Users can choose between annotation engines, and existing code continues to work unchanged.

New Features

  • Engine Selection: Added an engine parameter to TextService that accepts:

    • "regex": Uses only RegexAnnotator (fastest, pattern-based)
    • "spacy": Uses only SpacyPIIAnnotator (more comprehensive)
    • "auto": Default mode that tries regex first and falls back to spaCy if no entities are found
  • Structured Output: Added a structured parameter to annotation methods. When it is set, they return a list of Span objects with:

    • label: Entity type (e.g., "EMAIL", "PERSON")
    • start: Character offset where entity begins
    • end: Character offset where entity ends
    • text: The actual text of the entity
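
To make the two features concrete, here is a minimal, self-contained sketch of the "auto" fallback and the Span shape described above. It is not the code from this PR: the annotate function, the EMAIL_RE pattern, and the fallback callable (standing in for SpacyPIIAnnotator) are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Span:
    """Structured result with the four fields listed above."""
    label: str  # entity type, e.g. "EMAIL"
    start: int  # character offset where the entity begins
    end: int    # character offset where the entity ends (exclusive)
    text: str   # the matched text


# Hypothetical pattern for illustration; the real RegexAnnotator patterns may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def regex_annotate(text: str) -> List[Span]:
    """Pattern-based pass: one Span per regex match."""
    return [Span("EMAIL", m.start(), m.end(), m.group()) for m in EMAIL_RE.finditer(text)]


def annotate(
    text: str,
    engine: str = "auto",
    fallback: Optional[Callable[[str], List[Span]]] = None,
) -> List[Span]:
    """Dispatch on engine; `fallback` stands in for the spaCy-based annotator."""
    if engine not in ("regex", "spacy", "auto"):
        raise ValueError(f"unknown engine: {engine!r}")
    if engine == "spacy":
        if fallback is None:
            raise ValueError("engine='spacy' needs a fallback annotator in this sketch")
        return fallback(text)
    spans = regex_annotate(text)
    # "auto": only consult the slower annotator when regex found nothing.
    if engine == "auto" and not spans and fallback is not None:
        return fallback(text)
    return spans
```

With engine="regex" the fallback is never called, which is what makes that mode the fastest of the three; "auto" only pays the cost of the second annotator on texts where the patterns find nothing.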

Implementation Details

  • Modified all TextService methods to support both legacy dictionary output and new structured output
  • Ensured proper handling of text chunks with correct position adjustments
  • Added comprehensive test coverage for all new features
  • Updated documentation in code comments, README, and CHANGELOG
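
The chunk position adjustment mentioned above can be sketched as follows: each chunk is annotated on its own, and the chunk's starting offset is added to every span so that positions refer to the full text. This is an illustration under assumptions, not the PR's implementation: annotate_in_chunks, to_legacy_dict, the fixed-size chunking, and the label-to-texts shape of the legacy dictionary are all hypothetical.

```python
import re
from typing import Dict, List, NamedTuple


class Span(NamedTuple):
    label: str
    start: int
    end: int
    text: str


# Hypothetical pattern for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def annotate_chunk(chunk: str) -> List[Span]:
    """Annotate one chunk; offsets are relative to the chunk."""
    return [Span("EMAIL", m.start(), m.end(), m.group()) for m in EMAIL_RE.finditer(chunk)]


def annotate_in_chunks(text: str, chunk_size: int) -> List[Span]:
    """Annotate fixed-size chunks and shift spans back to full-text offsets.

    Entities that straddle a chunk boundary are missed by this naive split.
    """
    spans: List[Span] = []
    for offset in range(0, len(text), chunk_size):
        chunk = text[offset:offset + chunk_size]
        for s in annotate_chunk(chunk):
            spans.append(Span(s.label, s.start + offset, s.end + offset, s.text))
    return spans


def to_legacy_dict(spans: List[Span]) -> Dict[str, List[str]]:
    """Assumed legacy shape: entity label mapped to the list of matched texts."""
    out: Dict[str, List[str]] = {}
    for s in spans:
        out.setdefault(s.label, []).append(s.text)
    return out
```

A useful invariant to check in tests is that text[span.start:span.end] == span.text for every returned span, regardless of how the text was chunked.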

Testing

All tests pass, including the new integration tests specifically created for these features. The implementation maintains backward compatibility with existing code.

- Add engine parameter to TextService allowing 'regex', 'spacy', or 'auto' modes
- Implement auto-fallback mechanism that tries regex first, falls back to spaCy
- Add structured output option returning Span objects with position information
- Create comprehensive integration tests for the new features
- Update documentation in code comments, README, and CHANGELOG
@sidmohan0 sidmohan0 self-assigned this May 2, 2025
@sidmohan0 sidmohan0 added this to the 4.1.0 milestone May 2, 2025
@sidmohan0 sidmohan0 merged commit 8dd0053 into feat/regex-fallback May 2, 2025
@sidmohan0 sidmohan0 deleted the feat/text-service-engine-selection branch May 2, 2025 23:35