Add SpaCy Processor for Enhanced NLP Support in Quivr #3467

Closed
wants to merge 1 commit into from


Sahil-2101

Description

Key Features:

  • Introduced SpaCyProcessor to handle various file formats (PDF, DOCX, TXT, CSV)
  • Supports recursive text splitting for chunked processing
  • Applies spaCy NLP pipeline for tokenization and entity recognition on file content
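
For reviewers unfamiliar with recursive splitting, here is a minimal sketch of the idea, not the code from this PR. The function name `recursive_split`, the `chunk_size` parameter, and the separator order are assumptions made for illustration; the actual SpaCyProcessor may rely on a library splitter instead.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters.

    Hypothetical helper: tries the coarsest separator first and only
    recurses with finer separators for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # A single piece is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the spaCy pipeline separately, which keeps memory use bounded on large files.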

Motivation:
Adding spaCy gives Quivr tokenization and named-entity recognition on ingested files, which should help downstream tasks that rely on text analysis across the supported document types.

Note:
This feature requires spaCy and a compatible language model (e.g., en_core_web_sm).
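
A rough sketch of how the dependency could be handled at runtime, assuming the processor loads the model by name. The helpers `load_pipeline` and `analyze` are hypothetical and not taken from this PR; falling back to a blank English pipeline (tokenization only, no entities) is one possible choice, not necessarily what the PR does.

```python
def load_pipeline(model_name="en_core_web_sm"):
    """Return a spaCy pipeline, or None when spaCy itself is not installed."""
    try:
        import spacy
    except ImportError:
        return None
    try:
        return spacy.load(model_name)
    except OSError:
        # Model package is missing: fall back to a blank English
        # pipeline, which tokenizes but does not recognize entities.
        return spacy.blank("en")


def analyze(text, nlp):
    """Run the pipeline and collect tokens and (text, label) entity pairs."""
    doc = nlp(text)
    return {
        "tokens": [token.text for token in doc],
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
    }
```

The model itself is typically installed with `python -m spacy download en_core_web_sm`.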

Checklist before requesting a review

Please delete options that are not relevant.

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented hard-to-understand areas
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged

Screenshots (if appropriate):

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 12, 2024
@Sahil-2101 Sahil-2101 closed this Nov 12, 2024