fix: accept relative URIs in PdfHyperlink without validation failure #520

Merged
PeterStaar-IBM merged 1 commit into docling-project:main from Ultizan:fix/pdf-hyperlink-relative-uri on Feb 23, 2026
@Ultizan Ultizan commented Feb 18, 2026

PDF hyperlinks may contain relative paths, internal bookmarks, or fragment-only references that are not valid absolute URLs. The strict AnyUrl validation on PdfHyperlink.uri caused the entire page preprocess stage to fail when such URIs were encountered, resulting in empty documents and lost content.
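
The failure mode described above can be reproduced with a minimal sketch. `StrictLink` and the sample URIs are hypothetical stand-ins, not the actual docling model or real test data; the sketch assumes pydantic v2 and only shows that a bare `AnyUrl` field rejects non-absolute URIs.

```python
from pydantic import AnyUrl, BaseModel, ValidationError


class StrictLink(BaseModel):
    """Hypothetical stand-in for a model whose uri field is a bare AnyUrl."""

    uri: AnyUrl


def rejected(candidates):
    """Return the candidates that strict AnyUrl validation refuses."""
    failures = []
    for candidate in candidates:
        try:
            StrictLink(uri=candidate)
        except ValidationError:
            # Relative paths and fragment-only references have no scheme,
            # so they cannot be parsed as absolute URLs.
            failures.append(candidate)
    return failures


# Illustrative inputs only: one absolute URL and three non-absolute URIs.
samples = [
    "https://example.com/doc.pdf",
    "../appendix.pdf",
    "#section-2",
    "chapter3.html",
]
print(rejected(samples))
```

If the exception from a single link is not caught, it propagates to whatever is building the page, which is consistent with the report that one bad hyperlink caused the whole stage to fail.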

Change uri type to Union[AnyUrl, str] with a field_validator that attempts AnyUrl parsing first (preserving structured metadata like scheme/host/path) and falls back to str for non-absolute URIs.
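
A rough sketch of the approach described above, assuming pydantic v2. `LenientLink` and the module-level `_ANY_URL` adapter are hypothetical names used for illustration; the merged change may be structured differently.

```python
from typing import Union

from pydantic import AnyUrl, BaseModel, TypeAdapter, ValidationError, field_validator

# Reusable adapter so the validator can attempt AnyUrl parsing on its own.
_ANY_URL = TypeAdapter(AnyUrl)


class LenientLink(BaseModel):
    """Hypothetical model: structured URL when possible, plain str otherwise."""

    uri: Union[AnyUrl, str]

    @field_validator("uri", mode="before")
    @classmethod
    def _parse_uri(cls, value):
        # Try the structured type first so scheme/host/path stay accessible.
        if isinstance(value, str):
            try:
                return _ANY_URL.validate_python(value)
            except ValidationError:
                # Not an absolute URL: keep the raw string instead of failing.
                return value
        return value


absolute = LenientLink(uri="https://example.com/a/b.pdf")
relative = LenientLink(uri="#section-2")
print(type(absolute.uri).__name__, absolute.uri.scheme, absolute.uri.host)
print(type(relative.uri).__name__, relative.uri)
```

Running the validator in `mode="before"` keeps the ordering explicit: the `AnyUrl` attempt always happens first, regardless of how the union itself would rank a plain string input.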

Signed-off-by: Ultizan <ultizan@gmail.com>
@github-actions

DCO Check Passed

Thanks @Ultizan, all your commits are properly signed off. 🎉

@mergify

mergify Bot commented Feb 18, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

@dosubot

dosubot Bot commented Feb 18, 2026

Related Documentation

Checked 17 published document(s) in 1 knowledge base(s). No updates required.

@codecov

codecov Bot commented Feb 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@dolfim-ibm dolfim-ibm left a comment

lgtm

@PeterStaar-IBM PeterStaar-IBM left a comment

lgtm!

@PeterStaar-IBM PeterStaar-IBM merged commit 6032c7c into docling-project:main Feb 23, 2026
11 checks passed
Matteo-Omenetti pushed a commit that referenced this pull request Mar 11, 2026
…520)
