pymupdf.Page.replace_image() will save an image stream file 2 times #4328
Replies: 5 comments
-
I don't understand what you are actually doing: please provide your PDF and your script. One intermediate comment though:
-
I want to replace the original images in a PDF file with smaller ones. My program extracts each image in the file, reduces its size with a compression function, and then replaces the original image. However, the resulting file is larger than expected. For example, if I simply delete all the pictures, the resulting file is 1 MB. If I instead replace the original pictures with compressed ones totalling 2 MB, I get a 5 MB file, whereas I would expect about 3 MB. This suggests that each picture's data is stored twice in the PDF file. Thank you for your reply.
-
I see no issue here, but rather a request for help. Moving this to "Discussions".
-
You have given no information about what that image compression function does. The only thing you can do is accept a loss of image quality: for example, convert to grayscale or reduce the resolution.
-
I wrote a test program to demonstrate what I described; you can run it to try it yourself. If I go the other way, deleting the image first and then inserting the compressed image, the file size is correct.
The result I get is this:
Or the other way around: every time you replace an image with the exact same image, you will notice that the file size keeps growing.
The result I get is this:
When i is greater than four, my computer cannot finish running the program. I think the doc.xref_copy(new_xref, xref) statement in the replace_image() function causes this: a stream for the new image is added to the PDF and is then copied back to the original image's xref.
If I add page.delete_image(new_xref) after doc.xref_copy(new_xref, xref), the bug is fixed. I hope you can try running my code; you only need to change the path of the test PDF file. Thanks.
-
Description of the bug
new_xref = page.insert_image(page.rect, filename=filename, stream=stream, pixmap=pixmap)
doc.xref_copy(new_xref, xref)  # copy over new to old
The statement
doc.xref_copy(new_xref, xref)
copies the stream of the new image to the old xref, but it does not delete the stream of the new image. As a result, the old xref and new_xref hold the same data, and that data is stored twice in the PDF file. I want to replace the original picture with a smaller one. However, I found that the file size after replacement is larger than the size obtained by deleting the image outright plus the size of the replacement image. The excess is exactly equal to the size of the replacement image file, so I suspect the image is stored twice. I looked at the source code and came to the conclusion above.
Adding
page.delete_image(new_xref)
after it fixes this, but I think there is a better solution.
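The size arithmetic behind this report can be sketched with a toy model. This is not PyMuPDF code: a plain dict stands in for the PDF object table, and `file_size` and `replace_by_copy` are hypothetical helpers. The 1 MB base and 2 MB replacement come from the figures reported elsewhere in this thread; the 6 MB original image size is an assumed placeholder.

```python
# Toy model only: a dict maps xref numbers to stream bytes.
# None of these helpers are PyMuPDF APIs.
MB = 1024 * 1024


def file_size(objects, base_size):
    """Size of the PDF without images plus every stored image stream."""
    return base_size + sum(len(stream) for stream in objects.values())


def replace_by_copy(objects, old_xref, new_stream, drop_new=False):
    """Model 'insert the new image, then copy it over the old xref'."""
    new_xref = max(objects) + 1
    objects[new_xref] = new_stream         # step 1: insert the new image
    objects[old_xref] = objects[new_xref]  # step 2: copy new over old
    if drop_new:
        del objects[new_xref]              # step 3 (proposed): drop the extra copy
    return objects


base = 1 * MB                # file size with all images deleted (from the thread)
original = bytes(6 * MB)     # assumed size of the original images
compressed = bytes(2 * MB)   # total size of the replacement images (from the thread)

without_cleanup = replace_by_copy({10: original}, 10, compressed)
with_cleanup = replace_by_copy({10: original}, 10, compressed, drop_new=True)

print(file_size(without_cleanup, base) // MB, "MB without cleanup")
print(file_size(with_cleanup, base) // MB, "MB with cleanup")
```

In this model the copy step alone leaves two 2 MB streams in the table, which matches the observed 5 MB; dropping the extra copy gives the expected 3 MB.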
How to reproduce the bug
Use pymupdf.Page.replace_image to replace an image with a smaller one, then check how the size of the PDF file changes.
Sorry, my English is not good.
PyMuPDF version
1.25.2
Operating system
Windows
Python version
3.9