Ignoring Tables in Find_Text Option #2908

Closed
@XariZaru

Description

Is your feature request related to a problem? Please describe.
Ever since the find_tables method was added, I have been wondering whether there is a way to exclude the detected tables from the find_text function, so that we can choose whether or not tables are included in the raw text.

Describe the solution you'd like
A method that grabs text while excluding table information.

Describe alternatives you've considered
I've tried using the find_text option in PyMuPDF for the raw text and then taking the parsed tables from the Adobe Extract API, since that API seems to capture multi-cell tables better. The issue is that the Adobe Extract API extracts raw text very poorly.
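
For illustration, one possible workaround using calls that already exist could look like the sketch below. It is not part of the original request and not an official PyMuPDF feature: it assumes the text comes from `page.get_text("blocks")`, that table locations come from the `bbox` of each table returned by `page.find_tables()`, and that dropping every text block overlapping a table is acceptable. The helper names `rects_overlap`, `text_outside_tables`, and `page_text_without_tables` are hypothetical.

```python
def rects_overlap(a, b):
    """Return True if rectangles a and b, given as (x0, y0, x1, y1), share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def text_outside_tables(blocks, table_bboxes):
    """Join the text of all text blocks that do not overlap a table bounding box.

    blocks: tuples shaped like the output of page.get_text("blocks"),
            i.e. (x0, y0, x1, y1, text, block_no, block_type).
    table_bboxes: iterable of (x0, y0, x1, y1) tuples.
    """
    kept = []
    for block in blocks:
        bbox, text, block_type = block[:4], block[4], block[6]
        if block_type != 0:  # block_type 0 is text; skip image blocks
            continue
        if any(rects_overlap(bbox, table) for table in table_bboxes):
            continue  # assumption: any overlap means the block belongs to a table
        kept.append(text)
    return "".join(kept)


def page_text_without_tables(page):
    """Hypothetical wrapper: expects a PyMuPDF page object."""
    table_bboxes = [tuple(table.bbox) for table in page.find_tables().tables]
    return text_outside_tables(page.get_text("blocks"), table_bboxes)
```

With a document opened through PyMuPDF, this would be called as `page_text_without_tables(doc[0])`. Because the filter works on whole blocks, a paragraph that only partly overlaps a table's bounding box is dropped entirely; filtering at the line or word level would be finer-grained at the cost of more code.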
