Vector Image Drawings are not replicated correctly #1518
Comments
Here is another test with a different pdf page |
There is a new pre-version wheel here if you would like to test the changes. |
@JorjMcKie Thanks for the prompt response. Is there any way we could ignore these exploitations, like the shadings, so the result would be a bit more presentable, unlike the sample screenshot in my second example above? |
Btw, I just finished testing the pre-version 1.19.5 of PyMuPDF. It is working as expected. Thanks a lot :) |
Not really, because they are interwoven with the rest. |
The other example you sent me (A500...) is misleading: if you remove text and images, the page looks exactly as produced by redrawing the paths. |
@JorjMcKie the end goal is actually to create SVG image(s) (based on the targeted Rect) without the raster images. I will try using redaction to remove these images. Is there any way to identify whether there are any "unsupported" exploitations? That way, if unsupported exploitations are detected, I can opt for the redactions instead of redrawing the vector drawings. |
Yes, I actually realized this when I checked for images and text. What I thought was a background fill was actually a raster image. Apologies for that. |
Bah, no problem at all.
Ok, then step 1 of your approach would always be removing text and raster images:
>>> page.add_redact_annot(page.rect)
'Redact' annotation on page 0 of A500.pdf
>>> page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_REMOVE)
True
>>> page.clean_contents()
Then identify the rectangle you need and set the page's CropBox to it.
# identify rectangle of interest
>>> r=page.cropbox
>>> r += (300, 300, -300, -300) # in our case just go away from borders somewhat
>>> out=open("reduced.svg", "w") # save the SVG file here
>>> page.set_cropbox(r) # reduce page to rectangle of interest
>>> page.cropbox # looks like this
Rect(300.0, 300.0, 1218.0, 714.0)
>>> out.write(page.get_svg_image()) # save SVG image
1018254
>>> out.close()
>>>
|
In contrast to pixmaps, SVG images do not support a clip parameter. |
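The REPL steps above can be gathered into one function. This is only a sketch under assumptions: the names `shrink_rect` and `export_vector_svg`, the default margin, and the file paths are hypothetical and not part of the thread; the PyMuPDF calls are the ones shown above. The thread used PyMuPDF 1.19.x; later releases may treat vector graphics differently during redaction, so it is worth checking the `apply_redactions` documentation for your version.

```python
def shrink_rect(rect, margin):
    """Return (x0, y0, x1, y1) moved inward by `margin` on every side."""
    x0, y0, x1, y1 = rect
    return (x0 + margin, y0 + margin, x1 - margin, y1 - margin)


def export_vector_svg(pdf_path, svg_path, page_number=0, margin=300):
    """Strip text and raster images from one page, crop it, write it as SVG.

    Hypothetical helper; mirrors the interactive session in the thread.
    """
    import fitz  # PyMuPDF, imported lazily so shrink_rect works without it

    doc = fitz.open(pdf_path)
    page = doc[page_number]

    # Step 1: redact the whole page and remove images entirely.
    # Assumption: behaves as in 1.19.x, where vector drawings survive.
    page.add_redact_annot(page.rect)
    page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_REMOVE)
    page.clean_contents()

    # Step 2: reduce the CropBox to the rectangle of interest.
    page.set_cropbox(fitz.Rect(shrink_rect(tuple(page.cropbox), margin)))

    # Step 3: get_svg_image() has no clip parameter, so the CropBox
    # set above is what limits the output area.
    with open(svg_path, "w", encoding="utf-8") as out:
        out.write(page.get_svg_image())
    doc.close()
```

A call such as `export_vector_svg("A500.pdf", "reduced.svg")` would correspond to the session above; the margin of 300 only matches that particular page and should be replaced by the real rectangle of interest.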
Thanks for the update. I tried redacting the images and it looks a lot better than the previous approach. Is there any way that PyMuPDF can detect when such features are ignored, like the clippings and shadings that you mentioned?
|
This is difficult to answer in a definite way. There are various methods that extract related information:
The last one may be interesting for you. For the example:
>>> bboxlog = page.get_bboxlog()
>>> pprint(bboxlog[:10])
[('fill-path',
(759.0924072265625, 621.6221313476562, 1421.7763671875, 936.5750732421875)),
('fill-path',
(893.4117431640625,
541.5783081054688,
1436.5706787109375,
936.5743408203125)),
('fill-path',
(759.0921020507812,
77.68048095703125,
1436.571044921875,
461.53448486328125)),
...
>>> box_types = set([b[0] for b in bboxlog])
>>> box_types
{'fill-image', 'stroke-path', 'fill-path', 'fill-shade'}
>>>
Each item is a tuple |
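To see at a glance which operation types a page uses, the log can be tallied by type. A minimal sketch, assuming only what the output above shows: each entry starts with a type string followed by a bounding box. The function name `count_box_types` and the shortened sample data are made up for illustration.

```python
from collections import Counter


def count_box_types(bboxlog):
    """Count bboxlog entries per type string (the first tuple element)."""
    return Counter(item[0] for item in bboxlog)


# Shortened, hand-written sample in the same shape as page.get_bboxlog() output.
sample_log = [
    ("fill-path", (759.09, 621.62, 1421.78, 936.58)),
    ("fill-path", (893.41, 541.58, 1436.57, 936.57)),
    ("stroke-path", (759.09, 77.68, 1436.57, 461.53)),
    ("fill-shade", (100.0, 100.0, 200.0, 200.0)),
]

counts = count_box_types(sample_log)
for box_type, n in sorted(counts.items()):
    print(f"{box_type}: {n}")
```

With a real page, `count_box_types(page.get_bboxlog())` would give the same kind of summary as the `box_types` set above, plus how often each type occurs.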
@JorjMcKie thanks for the help. In the end, I went with something like this:
After comparing some of the PDF samples I had, I noticed that the only ones not reproducing correctly are those with a "fill-shade" entry, so I used that as the common denominator for deciding whether to redraw the vectors or to redact the images. |
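The poster's own code did not survive in this copy of the thread. A decision of the kind described could be sketched as follows; this is a guess at the shape of the logic, not the original code. The names `UNSUPPORTED_TYPES` and `choose_strategy` are hypothetical, and treating only "fill-shade" as unsupported is the poster's empirical finding for their samples rather than a documented rule.

```python
# Assumption: only shadings were observed to break the redraw approach.
UNSUPPORTED_TYPES = frozenset({"fill-shade"})


def choose_strategy(bboxlog, unsupported=UNSUPPORTED_TYPES):
    """Return "redact" if the log contains an unsupported type, else "redraw".

    `bboxlog` is expected in the shape returned by page.get_bboxlog():
    a list of tuples whose first element is the type string.
    """
    found = {item[0] for item in bboxlog}
    return "redact" if found & set(unsupported) else "redraw"


plain_page = [("fill-path", (0, 0, 10, 10)), ("stroke-path", (0, 0, 5, 5))]
shaded_page = plain_page + [("fill-shade", (2, 2, 8, 8))]

print(choose_strategy(plain_page))
print(choose_strategy(shaded_page))
```

Keeping the unsupported types in a set makes it easy to add further types later if other samples turn out to redraw incorrectly.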
Fixed in v1.19.5 currently being uploaded. |
Please provide all mandatory information!
Describe the bug (mandatory)
The object "TCA" in the pdf is not being replicated correctly.
To Reproduce (mandatory)
Using the examples in the PyMuPDF documentation, I attempted to replicate the vector drawings.
Expected behavior (optional)
All Vector drawings are replicated
Screenshots (optional)
Sample Pdf Input:
Sample Pdf Output:
pdf files used for testing:
sample_pdf.zip
Your configuration (mandatory)
VERSION="20.04.3 LTS (Focal Fossa)"
--ID=ubuntu
--VERSION_ID="20.04"
Python 3.8.10
PyMuPDF 1.18.17
Installed via wheel