Add SpaCy Processor for Enhanced NLP Support in Quivr #3467

Sahil-2101 · 2024-11-12T03:22:38Z

Description

Key Features:

Introduced SpaCyProcessor to handle various file formats (PDF, DOCX, TXT, CSV)
Supports recursive text splitting for chunked processing
Applies spaCy NLP pipeline for tokenization and entity recognition on file content
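
The splitting and analysis steps listed above can be sketched roughly as follows. This is not the code from this PR: `recursive_split`, `analyze_chunks`, the separator order, and the default chunk size are all hypothetical names and values chosen for illustration, and file loading (PDF, DOCX, TXT, CSV) is left out. The only spaCy calls assumed are `nlp.pipe`, `doc.ents`, and the `text` / `label_` attributes.

```python
from typing import Dict, List, Sequence

# Assumed separator order (not from the PR): paragraphs, lines, sentences, words.
DEFAULT_SEPARATORS = ("\n\n", "\n", ". ", " ")


def recursive_split(
    text: str,
    chunk_size: int = 200,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Coarse separators are tried first; finer ones are used only for
    pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left to try: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # This separator does not occur; try the next, finer one.
        return recursive_split(text, chunk_size, rest)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
        else:
            current = piece
    if current.strip():
        chunks.append(current)
    return chunks


def analyze_chunks(nlp, chunks: Sequence[str]) -> List[Dict[str, list]]:
    """Run a loaded spaCy pipeline over chunks; collect tokens and entities."""
    results = []
    for doc in nlp.pipe(chunks):
        results.append({
            "tokens": [token.text for token in doc],
            "entities": [(ent.text, ent.label_) for ent in doc.ents],
        })
    return results
```

With spaCy and a model installed, usage would look like `analyze_chunks(spacy.load("en_core_web_sm"), recursive_split(text))`. Passing the pipeline in as an argument keeps the splitter usable without spaCy.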

Motivation:
Adding spaCy gives Quivr access to NLP features such as tokenization and entity recognition. These should improve downstream tasks that rely on text analysis and make the processing of varied document types more capable.

Note:
This feature requires spaCy and a compatible language model (e.g., en_core_web_sm).
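
As a hedged illustration of this requirement, the sketch below loads the model and turns a missing dependency into an actionable message. `load_pipeline` is a hypothetical helper, not part of this PR. It relies on `spacy.load` raising `OSError` when the named model is not installed.

```python
def load_pipeline(model_name: str = "en_core_web_sm"):
    """Load a spaCy pipeline, with install hints if something is missing."""
    try:
        import spacy  # imported lazily so the hint below can be shown
    except ImportError as exc:
        raise RuntimeError(
            "spaCy is not installed; try: pip install spacy"
        ) from exc
    try:
        return spacy.load(model_name)
    except OSError as exc:
        # spacy.load raises OSError when the model package cannot be found.
        raise RuntimeError(
            f"spaCy model {model_name!r} is not installed; "
            f"try: python -m spacy download {model_name}"
        ) from exc
```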

Checklist before requesting a review

Please delete options that are not relevant.

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented hard-to-understand areas
New and existing unit tests pass locally with my changes
Any dependent changes have been merged

Screenshots (if appropriate):

add SpaCy processor for NLP text analysis

32a82bd

- Introduced SpaCyProcessor to handle various file formats (PDF, DOCX, TXT, CSV)
- Supports recursive text splitting for chunked processing
- Applies spaCy NLP pipeline for tokenization and entity recognition on file content

dosubot bot added the size:L label (this PR changes 100-499 lines, ignoring generated files) on Nov 12, 2024

Sahil-2101 closed this Nov 12, 2024
