regression since itext 2.1.7: indexed images not detected as indexed, sometimes inflating size #1289

@DRoppelt

Describe the bug

GIFs that are encoded as indexed images inflate the size of the PDF they are embedded into by a significant factor compared to itext 2.1.7.

We found that Image.getInstance(byte[]) does not detect such images as indexed, and that somewhere later in rendering a larger image is written than if it had been treated as indexed.

In the sample, a PDF with an embedded indexed GIF (originally 22kB) comes out at 18kB with itext 2.1.7 and at 48kB with openpdf (an increase of more than 100%).

| File | itext 2.1.7 | openpdf 2.0.3 |
|---|---|---|
| gimp-indexed.png (53kB) (8bit color) | 54kB | 48kB |
| gimp-rgb.png (128kB) (24bit color) | 126kB | 48kB |
| gimp-indexed.gif (22kB) (2bit color) | 18kB | 48kB |

As a side note, it is great to see that openpdf achieves huge gains on regular RGB PNGs (126kB -> 48kB).
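
The size changes in the table can be recomputed from the byte counts in the logs under "To Reproduce". A minimal sketch of that arithmetic; `SizeChange` and `percentChange` are hypothetical names used only for illustration, and the byte counts are copied from the logs:

```java
final class SizeChange {

    /** Relative change from before to after, in percent (positive means the file grew). */
    static double percentChange(long before, long after) {
        return (after - before) * 100.0 / before;
    }

    public static void main(String[] args) {
        // PDF sizes in bytes: itext 2.1.7 first, openpdf 2.0.3 second.
        System.out.printf("gimp-indexed.png: %+.1f%%%n", percentChange(54792, 48177));
        System.out.printf("gimp-rgb.png:     %+.1f%%%n", percentChange(126842, 48177));
        System.out.printf("gimp-indexed.gif: %+.1f%%%n", percentChange(18263, 48177));
    }
}
```

Only the GIF case grows; both PNG cases shrink.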

To Reproduce

I attached a sample maven project openpdf-indexed-images-inflated.zip

  1. using itext 2.1.7
[main] INFO org.example.ImageTest - START
[main] INFO org.example.ImageTest - gimp-indexed.png (53kb (53625b)) writing into out.png.indexed.pdf
[main] INFO org.example.ImageTest - Writing gimp-indexed.png into out.png.indexed.pdf (size is 53625b)
[main] INFO org.example.ImageTest - img colorspace is indexed
[main] INFO org.example.ImageTest - doc is 54kb (54792b)
[main] INFO org.example.ImageTest - START
[main] INFO org.example.ImageTest - gimp-rgb.png (125kb (125783b)) writing into out.png.rgb.pdf
[main] INFO org.example.ImageTest - Writing gimp-rgb.png into out.png.rgb.pdf (size is 125783b)
[main] INFO org.example.ImageTest - img colorspace is rgb
[main] INFO org.example.ImageTest - doc is 126kb (126842b)
[main] INFO org.example.ImageTest - START
[main] INFO org.example.ImageTest - gimp-indexed.gif (22kb (22431b)) writing into out.gif.indexed.pdf
[main] INFO org.example.ImageTest - Writing gimp-indexed.gif into out.gif.indexed.pdf (size is 22431b)
[main] INFO org.example.ImageTest - img colorspace is indexed
[main] INFO org.example.ImageTest - doc is 18kb (18263b)
  2. using openpdf 2.0.3

Note that img.getColorspace() == 3 in every case, i.e. the image is always detected as rgb, never as indexed.

[main] INFO org.example.ImageTest - START
[main] INFO org.example.ImageTest - gimp-indexed.png (53kb (53625b)) writing into out.png.indexed.pdf
[main] INFO org.example.ImageTest - Writing gimp-indexed.png into out.png.indexed.pdf (size is 53625b)
[main] INFO org.example.ImageTest - img colorspace is rgb
[main] INFO org.example.ImageTest - doc is 48kb (48177b)
[main] INFO org.example.ImageTest - START
[main] INFO org.example.ImageTest - gimp-rgb.png (125kb (125783b)) writing into out.png.rgb.pdf
[main] INFO org.example.ImageTest - Writing gimp-rgb.png into out.png.rgb.pdf (size is 125783b)
[main] INFO org.example.ImageTest - img colorspace is rgb
[main] INFO org.example.ImageTest - doc is 48kb (48177b)
[main] INFO org.example.ImageTest - START
[main] INFO org.example.ImageTest - gimp-indexed.gif (22kb (22431b)) writing into out.gif.indexed.pdf
[main] INFO org.example.ImageTest - Writing gimp-indexed.gif into out.gif.indexed.pdf (size is 22431b)
[main] INFO org.example.ImageTest - img colorspace is rgb
[main] INFO org.example.ImageTest - doc is 48kb (48177b)
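
import com.lowagie.text.Document;
import com.lowagie.text.Image;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfName;
import com.lowagie.text.pdf.PdfString;
import com.lowagie.text.pdf.PdfWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ImageTest {
    private static final Logger log = LoggerFactory.getLogger(ImageTest.class);

    // Each row: input image, file type (currently unused), output PDF.
    @ParameterizedTest
    @CsvSource({
            "gimp-indexed.png,png,out.png.indexed.pdf",
            "gimp-rgb.png,png,out.png.rgb.pdf",
            "gimp-indexed.gif,gif,out.gif.indexed.pdf"
    })
    public void testApp(final String picIn, final String fileType, final String fileOut) throws Exception {
        var fileIn = Files.readAllBytes(Paths.get(picIn));
        log.info("START");
        log.info("{} ({}kb ({}b)) writing into {}", picIn, fileIn.length / 1000, fileIn.length, fileOut);

        Document document = new Document();
        var fos = new FileOutputStream(fileOut);
        final PdfWriter instance = PdfWriter.getInstance(document, fos);

        document.open();
        // Record the library version in the PDF so the outputs can be told apart.
        instance.getInfo().put(PdfName.CREATOR, new PdfString(Document.getVersion()));
        document.add(new Paragraph("Hello World"));

        log.info("Writing {} into {} (size is {}b)", picIn, fileOut, fileIn.length);
        var img = Image.getInstance(fileIn);
        img.scaleToFit(300, 300);
        // This test treats a colorspace value of 3 as rgb and 1 as indexed.
        int colorspace = img.getColorspace();
        log.info("img colorspace is {}", colorspace == 3 ? "rgb" : colorspace == 1 ? "indexed" : "unknown");
        document.add(img);
        // Close the document so the PDF is complete before its size is measured.
        document.close();

        File file = new File(fileOut);
        log.info("doc is {}kb ({}b)", file.length() / 1000, file.length());
    }
}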

Expected behavior

Indexed images should be detected as indexed, and the resulting PDF should be closer to the 18kB produced by itext than to the 48kB currently produced by openpdf. At the very least, it should be barely larger than the original GIF at 22kB.
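
For comparison, whether an input is palette-based can be checked independently of either PDF library. The following is only a sketch using the JDK's javax.imageio decoder, not openpdf's own detection logic; `IndexedCheck`, `isIndexed`, `sampleIndexed` and `sampleRgb` are hypothetical names, and the sample images are generated in memory rather than taken from the attached project:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.awt.image.IndexColorModel;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

final class IndexedCheck {

    /** Returns true if the JDK decoder reports a palette-based color model. */
    static boolean isIndexed(byte[] imageBytes) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(imageBytes));
        if (img == null) {
            throw new IOException("no ImageIO reader found for this data");
        }
        return img.getColorModel() instanceof IndexColorModel;
    }

    /** Encodes a small 4-color palette image in the given format ("gif" or "png"). */
    static byte[] sampleIndexed(String format) throws IOException {
        byte[] r = {(byte) 255, 0, 0, (byte) 255};
        byte[] g = {0, (byte) 255, 0, (byte) 255};
        byte[] b = {0, 0, (byte) 255, (byte) 255};
        IndexColorModel palette = new IndexColorModel(8, 4, r, g, b);
        BufferedImage img = new BufferedImage(16, 16, BufferedImage.TYPE_BYTE_INDEXED, palette);
        for (int y = 0; y < 16; y++) {
            for (int x = 0; x < 16; x++) {
                // Each pixel stores a palette index, not a color.
                img.getRaster().setSample(x, y, 0, (x + y) % 4);
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, format, out);
        return out.toByteArray();
    }

    /** Encodes a small true-color PNG for contrast. */
    static byte[] sampleRgb() throws IOException {
        BufferedImage img = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        img.setRGB(3, 3, 0x336699);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("indexed gif -> " + isIndexed(sampleIndexed("gif")));
        System.out.println("indexed png -> " + isIndexed(sampleIndexed("png")));
        System.out.println("rgb png     -> " + isIndexed(sampleRgb()));
    }
}
```

Running the same `isIndexed` check on the bytes passed to Image.getInstance(byte[]) would show which inputs one would expect to be reported as indexed.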

Screenshots

NA

System

  • OS: MacOS, Windows, Linux Based Ubuntu containers
  • Used font: NA
  • OpenPDF version: 2.0.3

Your real name

Dennis Roppelt

Additional context

We have migrated from a flyingsaucer version before 9.4.0 to the latest, 9.11.6. Since 9.4.0 of their project, the flyingsaucer team has moved away from itext 2.1.7 to openpdf 2.0.3. We use flyingsaucer and found by luck that some PDFs had inflated by 30-50% in size; investigating that eventually led us here. The reproducible sample inflates by over 100%, but in more realistic documents it is around 30%.

When I talk about "indexed images" I mean the technique described at https://en.wikipedia.org/wiki/Indexed_color

In computing, indexed color is a technique to manage digital images' colors in a limited fashion, in order to save computer memory and file storage, while speeding up display refresh and file transfers. It is a form of vector quantization compression.
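
To make the saving concrete, here is a rough sketch of the uncompressed pixel data for the same picture stored as RGB versus as palette indices. `IndexedSize` and its methods are hypothetical names, the dimensions are made up, and real files and PDFs add compression on top, so actual sizes will differ:

```java
final class IndexedSize {

    /** Uncompressed RGB: three bytes per pixel. */
    static long rgbBytes(int width, int height) {
        return (long) width * height * 3;
    }

    /** Smallest of 1, 2, 4 or 8 bits that can address every palette entry. */
    static int bitsPerPixel(int paletteSize) {
        if (paletteSize < 1 || paletteSize > 256) {
            throw new IllegalArgumentException("palette size must be 1..256");
        }
        if (paletteSize <= 2) return 1;
        if (paletteSize <= 4) return 2;
        if (paletteSize <= 16) return 4;
        return 8;
    }

    /** Uncompressed indexed data: byte-aligned rows of indices plus a 3-byte-per-entry palette. */
    static long indexedBytes(int width, int height, int paletteSize) {
        long rowBytes = ((long) width * bitsPerPixel(paletteSize) + 7) / 8;
        return rowBytes * height + (long) paletteSize * 3;
    }

    public static void main(String[] args) {
        int w = 100, h = 100; // illustrative dimensions only
        System.out.println("rgb:               " + rgbBytes(w, h) + " bytes");
        System.out.println("indexed, 4 colors: " + indexedBytes(w, h, 4) + " bytes");
        System.out.println("indexed, 256:      " + indexedBytes(w, h, 256) + " bytes");
    }
}
```

If an indexed image is expanded to RGB before being written, the data to compress is several times larger, which fits the inflation seen for the GIF.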

How I produced the images

Images are exported from Gimp

A4 drawing size

Draw some rectangles in different colors

Indexed: Image -> Mode -> Indexed... ->

  • tick "use web-optimized..."
  • tick "remove unused and duplicate..."

RGB: Image -> Mode -> RGB

Export PNGs: both at compression level = 1
