Embedded links inside the table are not extracted #42

Open
narsandu opened this issue Jun 13, 2024 · 1 comment
Labels
enhancement (New feature or request), postponed

Comments

@narsandu

When extracting data from a PDF table with embedded links, only the text is captured, not the actual links.

@narsandu changed the title from "Embedded links in the pdf table are not extracted" to "Embedded links inside the table are not extracted" on Jun 13, 2024
@JorjMcKie
Contributor

This is not a bug!
It is a feature that may be implemented at some later point.
It would have to be implemented in the table module in PyMuPDF. This makes it complicated because the actual link text and the display text would both have to be taken into account. Probably, a reasonable decision would be to fall back to HTML syntax for doing this ...

There is a similar request, #21, for doing the same with images; you may want to take a look.
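
Until something like this exists in the table module, one possible workaround is to match link rectangles against table cell rectangles yourself. The sketch below is not part of PyMuPDF's table code. The helper names (`rects_overlap`, `links_per_cell`, `cell_as_html`, `table_links`) are hypothetical, and the HTML rendering is only one guess at what "falling back to HTML syntax" could look like. It assumes `Page.get_links()` returns dicts with a `"from"` rectangle and, for URI links, a `"uri"` key, and that `Table.cells` holds `(x0, y0, x1, y1)` boxes.

```python
from html import escape


def rects_overlap(a, b):
    """Return True if two (x0, y0, x1, y1) rectangles share a non-empty area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def links_per_cell(cells, links):
    """Map each cell index to the URIs of links whose hot area overlaps the cell.

    cells: list of (x0, y0, x1, y1) boxes; None entries are skipped defensively.
    links: list of dicts with a "from" rectangle and, for URI links, a "uri" key.
    """
    result = {}
    for i, cell in enumerate(cells):
        if cell is None:
            continue
        uris = [
            link["uri"]
            for link in links
            if "uri" in link and rects_overlap(cell, tuple(link["from"]))
        ]
        if uris:
            result[i] = uris
    return result


def cell_as_html(text, uris):
    """Hypothetical HTML fallback: wrap the display text in an anchor for the first URI."""
    if not uris:
        return escape(text)
    return f'<a href="{escape(uris[0], quote=True)}">{escape(text)}</a>'


def table_links(pdf_path, page_number=0):
    """Collect link URIs per table cell for every table on one page (needs PyMuPDF)."""
    import pymupdf  # older releases expose the same API as "fitz"

    doc = pymupdf.open(pdf_path)
    page = doc[page_number]
    links = page.get_links()
    return [links_per_cell(table.cells, links) for table in page.find_tables().tables]
```

Matching by rectangle overlap keeps the display text (from the table extraction) and the link target (from the page's link list) separate, so both remain available when the cell is rendered. A link that spans several cells would be reported for each of them.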

@JorjMcKie added the enhancement and postponed labels on Jun 13, 2024