Pdf pages with transparent images return black image in html #657
Comments
Please try this conversion using the CLI tool of MuPDF,
OK, I will try this and update here.
I tried it already. I can confirm that it is an upstream bug.
>>> doc.getPageImageList(0)
[(5, 6, 1265, 1303, 8, 'DeviceRGB', '', 'Image5', 'FlateDecode')] # an image (5) with a mask(6)
>>> img1=doc.extractImage(5) # read the base image
>>> img1["ext"]
'png'
>>> img2=doc.extractImage(6) # read the mask image
>>> pix=fitz.Pixmap(img1["image"]) # make pixmap of base image
>>> pix # already has a transparency channel:
Pixmap(DeviceRGB, IRect(0, 0, 1265, 1303), 1)
>>> pix.writeImage("not-transparent.png") # but the alpha values are all opaque
>>> mask = fitz.Pixmap(img2["image"]) # pixmap of mask image
>>> mask
Pixmap(DeviceGray, IRect(0, 0, 1265, 1303), 0)
>>> pix.setAlpha(mask.samples) # take its samples as alpha
>>> pix.writeImage("but-now.png") # this is the original image!
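To make the effect of the setAlpha step in the session above more concrete, here is a minimal pure-Python sketch of the same idea: each pixel's alpha byte is overwritten with the matching byte from the gray mask. `replace_alpha` is a hypothetical helper written for illustration, not a PyMuPDF function, and it assumes 8-bit interleaved RGBA samples plus a mask with exactly one byte per pixel.

```python
def replace_alpha(rgba_samples: bytes, mask_samples: bytes) -> bytes:
    """Return a copy of RGBA samples whose alpha bytes come from a gray mask.

    Hypothetical helper; assumes 4 bytes per pixel (R, G, B, A) and
    one mask byte per pixel, in the same pixel order.
    """
    if len(rgba_samples) != 4 * len(mask_samples):
        raise ValueError("mask must hold exactly one byte per RGBA pixel")
    out = bytearray(rgba_samples)
    out[3::4] = mask_samples  # every 4th byte, starting at index 3, is alpha
    return bytes(out)


# Two fully opaque pixels; the mask makes the first one fully transparent
# and leaves the second one partially transparent.
pixels = bytes([10, 20, 30, 255, 40, 50, 60, 255])
mask = bytes([0, 128])
result = replace_alpha(pixels, mask)
```

The colour bytes are left untouched; only the alpha channel changes, which is why the base image looks opaque until the mask is applied.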
@JorjMcKie Thank you! I was not able to install mutool due to system restrictions. Could you please report this to Ghostscript?
`pixmap = page.getPixmap(alpha = False)
pixmap.writePNG("page-%i.png" % page.number) # returns transparent image of the page`
Also, as a workaround I am planning to write a method to generate the HTML. Is there a better quick fix? Thanks,
Your page image is as it should be, just as a PDF viewer shows it, too. But I thought you absolutely wanted a page copy that shows correctly in a browser? Only that is the problem. If you know how to integrate an SVG image in your HTML code, you can use the SVG image of the PDF page like so:
svg = page.getSVGimage()
with open("page-%i.svg" % page.number, "w") as out:  # file is closed automatically
    out.write(svg)
This image is correctly rendered.
I am not working with PDF pages as images; getting the pages as HTML is my goal. So yes, I want the page shown correctly in the browser as HTML. So I believe converting/integrating an SVG image of the page is not an option. Thanks!
But you can show an SVG in a browser! Waiting for MuPDF to fix the bug will not get you to your goal any time within the next months.
If you
<!DOCTYPE html>
<html>
<body>
<div id="page-n" style="position:relative;width:595pt;height:841pt;background-color:white">
<img style="position:absolute;top:0pt;left:0pt;width:595pt;height:841pt" src="page-n.svg">
</div>
</body>
</html>
... then the page will hopefully show correctly in a browser.
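As a rough sketch of how such a wrapper could be generated per page, the snippet below builds the same markup from a file name and a page size in points. `svg_page_html` and its parameters are hypothetical names chosen for illustration; with PyMuPDF the width and height would presumably be taken from the page rectangle (for example `page.rect.width` and `page.rect.height`).

```python
from html import escape


def svg_page_html(svg_name: str, width_pt: float, height_pt: float,
                  page_id: str = "page-n") -> str:
    """Build an HTML wrapper that shows one SVG page on a white background.

    Hypothetical helper mirroring the hand-written markup in this thread.
    """
    size = "width:%gpt;height:%gpt" % (width_pt, height_pt)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<body>\n"
        f'<div id="{escape(page_id)}" '
        f'style="position:relative;{size};background-color:white">\n'
        f'<img style="position:absolute;top:0pt;left:0pt;{size}" '
        f'src="{escape(svg_name)}">\n'
        "</div>\n"
        "</body>\n"
        "</html>\n"
    )


# Same values as in the hand-written example (an A4-sized page).
html_text = svg_page_html("page-n.svg", 595, 841)
```

The white background on the surrounding div matters here: transparent areas of the SVG then show white instead of whatever the browser or page background happens to be.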
This HTML code is also sufficient:
<!DOCTYPE html>
<html>
<body>
<div id="page-n" style="background-color:white">
<img src="page-n.svg">
</div>
</body>
</html>
Browsers in general also support the compressed SVGZ format (some more specification in the HTML code is needed). So you may consider writing your SVG images GZIP-compressed. This will reduce their file size by more than 50%.
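A minimal sketch of the compression step, using only Python's standard `gzip` module. `svg_to_svgz` is a hypothetical helper name, and the sample SVG is made-up test data rather than output from the PDF in this issue, so the size reduction it shows says nothing about real pages.

```python
import gzip


def svg_to_svgz(svg_text: str) -> bytes:
    """Return the GZIP-compressed (SVGZ) bytes for an SVG string."""
    return gzip.compress(svg_text.encode("utf-8"))


# Made-up, highly repetitive SVG used only to exercise the helper.
sample_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    + '<rect width="10" height="10"/>' * 200
    + "</svg>"
)
svgz = svg_to_svgz(sample_svg)
```

The returned bytes would be written to a file such as `page-n.svgz` opened in binary mode (`"wb"`). When the file is served over HTTP, the server typically also has to declare the GZIP encoding for browsers to display it; how to do that depends on the server and is not covered here.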
@JorjMcKie Thanks much for the update! I will have to look into this. There are some manipulations in the PDF, like creating annotations and highlighting text, before converting the pages to HTML. I need to check whether these are retained as well. Also, the text needs to be selectable. I will verify that too.
Selectable text will not work with this approach. The rest will.
@rianspeed - any reaction from MuPDF yet?
Description:
The PDF document's pages contain a transparent PNG image. When converting this document to HTML, the transparency is lost and the transparent part becomes black. This makes any text on top of it unreadable (if the font is black).
The reason is that the transparent PNG is being converted to JPEG, hence losing the alpha channel.
Code to reproduce:
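`import fitz
doc = fitz.open(document)
with open("sample_html.html", 'w', encoding="utf-8") as outfile:
    for page in doc:  # loop body is an assumption; it is missing from the report as posted
        outfile.write(page.getText("html"))
doc.close()`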
Configuration:
Version date: 2020-06-20 07:00:22.
Built for Python 3.7 on win32 (64-bit).
Attaching sample pdf and html output.
Test_doc_for_pdf_transparent_image.pdf
sample_html.zip