Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSS-3207: review #179

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from
Open

DSS-3207: review #179

wants to merge 13 commits into from

Conversation

bsanchezb
Copy link
Contributor

This branch is a continuation of PR #177 with a PDF memory usage setting introduction.

Andrea Esposito and others added 12 commits September 19, 2024 13:55
# Conflicts:
#	dss-pades-pdfbox/src/main/java/eu/europa/esig/dss/pdf/pdfbox/PdfBoxDocumentReader.java
#	dss-pades-pdfbox/src/main/java/eu/europa/esig/dss/pdf/pdfbox/PdfBoxSignatureService.java
#	dss-pades-pdfbox/src/test/java/eu/europa/esig/dss/pades/signature/visible/AbstractTestVisualComparator.java
#	dss-pades-pdfbox/src/test/java/eu/europa/esig/dss/pades/signature/visible/DefaultVsNativeDrawerComparatorTest.java
Copy link

@roy20021 roy20021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @bsanchezb ,

I am very happy of the final result, thank you for taking care of it!

The extension of the PdfMemoryUsageSetting.Mode is reasonable.
I have left just a couple of trivial observations in the code.

Looking forward to see this improvement into the next release.

Regards,
Andrea

String.format("Encrypted document : %s", e.getMessage()));
MemoryUsageSetting memoryUsageSetting = PdfBoxUtils.getMemoryUsageSetting(pdfMemoryUsageSetting);
try {
if (PdfMemoryUsageSetting.Mode.MEMORY_FULL != pdfMemoryUsageSetting.getMode() && dssDocument instanceof FileDocument) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MEMORY_FULL could be used with a FileDocument (meaning that willingness is to fully load into memory the file) and I understand that you achieve that also with the stream and RandomAccessReadBuffer later in the code.
I just see it more complex and less linear, why not leave to the library? (that does the same anyway)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is not really the same. See the comparison between "Memory Full" and "Memory Buffered" in #177. There is a consistent difference in execution, mainly on validation and LTA-level extension.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I haven't caught that well.

I am surprised, the code is exactly the same for pdfbox:
loadPDF(File file, String password, StreamCacheCreateFunction streamCacheCreateFunction)
--> Loader.loadPDF(file, password, null, null, streamCacheCreateFunction)
--> https://github.com/apache/pdfbox/blob/357a2c14762f3e08f4ef48852d79f541ccaca6af/pdfbox/src/main/java/org/apache/pdfbox/Loader.java#L223

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it uses the IOUtils.createMemoryOnlyStreamCache(), but creates a RandomAccessReadBufferedFile(file) instead of converting the document to binaries directly. I think it makes the difference. Possible that the file in filesystem is being accessed multiple number of times? Especially provided that there is a huge difference in openPdf implementation, I think it would be beneficial to keep the same logic in both implementations. Moreover, keeping the old behavior we reduce the possibility of regression, in case we miss something or PdfBox unexpectedly changes the handling inside.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, I agree with you. It's safer to keep the old behaviour and reduce possibility of regressions.


/**
* Represents the PDF document loading setting.
* NOTE: The setting is applicable only when an instance of {@code eu.europa.esig.dss.model.FileDocument}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not true, the setting is applied also with other types of Document

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. It is indeed applicable in other cases too (mainly for PdfBox implementation),

// This condition uses a MappedByteBuffer to process the file in memory
return new PdfReader(filenameSource, passwordProtection);
case FILE:
return new PdfReader(new RandomAccessFileOrArray(filenameSource, false, true), passwordProtection);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the RandomAccessFileOrArray is declared as Closable, neither the openpdf is actually closing it.. instead shall we close it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I will check if closing it here won't brake the processing. Otherwise, I will do a similar logic with PdfBoxDocumentReader and RandomAccessRead handling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a look in the code of OpenPdf and it looks like the RandomAccessFileOrArray is actually being closed inside of PdfReader.close() method. See that RandomAccessFileOrArray is stored inside PRTokeniser object (tokens variable) https://github.com/LibrePDF/OpenPDF/blob/1.3.43/openpdf/src/main/java/com/lowagie/text/pdf/PdfReader.java#L284, which is closed in PdfReader.close() https://github.com/LibrePDF/OpenPDF/blob/1.3.43/openpdf/src/main/java/com/lowagie/text/pdf/PdfReader.java#L3408 and https://github.com/LibrePDF/OpenPDF/blob/1.3.43/openpdf/src/main/java/com/lowagie/text/pdf/PRTokeniser.java#L182.

So I think in order to not complexify the code, I will keep it as it is.

Moreover, I will go even further and will align the PdfBox implementation accordingly, as RandomAccessReadBuffer is actually being closed as well inside the PDDocument. See https://github.com/apache/pdfbox/blob/3.0.3/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDDocument.java#L1272.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants