
fix: export pdf, check existing conversions #26

Merged (1 commit) on Apr 29, 2025

ceberam (Contributor) commented Apr 29, 2025

This PR applies several fixes to the docling-s3in-s3out pipeline:

  • Improves the use of temporary files by creating a temporary folder at the top.
  • Fixes the storage of the original PDF files copied from the source S3 to the target S3.
  • Fixes the check for already converted documents in the target S3.
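
For readers who have not opened the diff, the three fixes can be illustrated with a minimal sketch. This is not the code from the PR: the function names, the `json/` and `pdf/` key prefixes, and the `download`, `convert`, and `upload` callables are all hypothetical, and matching files by stem is an assumption about how source and target keys relate.

```python
import tempfile
from pathlib import Path, PurePosixPath


def pending_source_keys(source_keys, target_keys, target_prefix="json/"):
    """Return source keys that have no converted counterpart in the target.

    Hypothetical helper: assumes converted outputs live under `target_prefix`
    and share the file stem of the source PDF.
    """
    converted_stems = {
        PurePosixPath(key).stem
        for key in target_keys
        if key.startswith(target_prefix)
    }
    return [
        key for key in source_keys
        if PurePosixPath(key).stem not in converted_stems
    ]


def process_batch(source_keys, download, convert, upload):
    """Convert a batch inside one temporary folder created at the top.

    `download(key, local_path)`, `convert(local_pdf) -> local_json` and
    `upload(local_path, key)` are placeholders for the real S3 and
    conversion calls, which are not shown in this conversation.
    """
    uploaded = []
    # One temporary folder for the whole batch; removed on exit.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        for key in source_keys:
            local_pdf = tmp_dir / PurePosixPath(key).name
            download(key, local_pdf)
            local_json = convert(local_pdf)
            # Store the conversion result and the original PDF in the target.
            json_key = f"json/{local_json.name}"
            pdf_key = f"pdf/{local_pdf.name}"
            upload(local_json, json_key)
            upload(local_pdf, pdf_key)
            uploaded.extend([json_key, pdf_key])
    return uploaded
```

In this sketch, the check only counts keys under the conversion prefix, so a copied original such as `pdf/a.pdf` does not by itself mark `a.pdf` as converted.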

It partially covers #25, since one feature from that PR (Parquet files) still needs changes.

Resolves #20
Resolves #22

@ceberam ceberam requested review from dolfim-ibm and vku-ibm April 29, 2025 13:11

mergify bot commented Apr 29, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
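
The title rule above can be tried locally with a short sketch, assuming the `~=` operator behaves like a regular-expression search. The helper name `is_conventional` is illustrative and not part of Mergify.

```python
import re

# Pattern copied from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


# The title of this PR satisfies the rule.
print(is_conventional("fix: export pdf, check existing conversions"))
```

The pattern is case-sensitive and allows an optional scope in parentheses and an optional `!` before the colon, so a title such as `feat(s3)!: ...` would also pass while `Fix: ...` would not.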

Signed-off-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
Co-authored-by: Cesar Berrospi Ramis <75900930+ceberam@users.noreply.github.com>
@ceberam ceberam merged commit 3e55ce9 into main Apr 29, 2025
6 checks passed
@ceberam ceberam deleted the fix/export-pdf branch April 29, 2025 14:18

codecov bot commented Apr 29, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Development

Successfully merging this pull request may close these issues.

  • DoclingConvert in s3_helper does not copy the original pdf file
  • Method check_target_has_source_converted not working as designed
4 participants