GH-79 nio object parser #251

Merged
merged 59 commits into from
May 8, 2023
Commits
e71c161 | Mar 8, 2022 | GH-79 a sort of, kind of, working prototype
f363451 | Mar 10, 2022 | GH-79 a sort of, kind of, working a slightly better
cbfe61c | Jun 3, 2022 | GH-79 small wins, still lots of work to do.
e663f91 | Jun 29, 2022 | Merge branch 'master' into GH-79.nio.object.parser
478f230 | Nov 10, 2022 | GH-79 fix unrelated annotation parsing error.
f9639a2 | Nov 10, 2022 | GH-79 fix parser offset issue.
37e83dd | Nov 10, 2022 | GH-79 update logger
4120230 | Nov 10, 2022 | GH-79 fix decimal parsing
e019773 | Nov 10, 2022 | GH-79 some debug helpers
c7d0a1d | Nov 11, 2022 | GH-79 fix encryption reference passthrough when parsing arrays
300513b | Nov 12, 2022 | GH-79 fix nasty CrossReferenceTable error where subgroups were not pa…
4a8aeaa | Nov 14, 2022 | Merge branch 'master' into GH-79.nio.object.parser
f239c64 | Nov 14, 2022 | GH-79 more encryption reference issues.
bf7002e | Nov 17, 2022 | GH-79 fix DictionaryEntries type changes in the lexer.
0dd8f54 | Nov 19, 2022 | GH-79 touch up white space handling for streams
406fd56 | Nov 19, 2022 | GH-79 fix incorrect starting object number when parsing xref table
9a365c5 | Nov 23, 2022 | GH-79 little code clean up.
efe4d77 | Dec 2, 2022 | GH-79 address parser issue where know types were not returned as obje…
117a802 | Jan 13, 2023 | Merge branch 'master' into GH-79.nio.object.parser
7c4de19 | Jan 13, 2023 | Merge branch 'master' into GH-79.nio.object.parser
cc9e916 | Jan 17, 2023 | GH-79 fix issue trying to load a font data stream as a font object.
f6cdbb3 | Jan 17, 2023 | GH-79 fix issue where zeroed stream wasn't taking into account the en…
85312d0 | Jan 17, 2023 | GH-79 make sure we don't clear a valid font bbox.
814e55c | Jan 22, 2023 | GH-79 further refinement of the approach.
e3dcf68 | Jan 23, 2023 | GH-79 cleaned up stream and table cross-reference common code as well…
70f8c93 | Jan 24, 2023 | GH-79 further clean up and brought back incremental updates for table…
d2bcdc7 | Jan 25, 2023 | GH-79 attempted cleanup/removal of throwable from code base.
a2b39a5 | Jan 25, 2023 | GH-80 fix some missing 2 byte hex values from not being rendering cor…
3c23ad7 | Jan 25, 2023 | GH-79 remove redundant new BufferedOutputStream wrapper.
34c8282 | Jan 25, 2023 | GH-79 clean up extra new line creation.
fad1274 | Jan 25, 2023 | GH-79 todo cleanup
fb53c96 | Jan 27, 2023 | GH-79 refactor a recursive call that was getting away and not really …
e42ff2b | Jan 29, 2023 | GH-79 standardize buffer sizes for and minimize parser creation
8257457 | Jan 30, 2023 | GH-79 fix missing generation when adding a used entry.
46ed19e | Feb 1, 2023 | GH-79 fix layer visibility state issue by pulling from catalog when p…
d572719 | Feb 2, 2023 | GH-79 more exception handling and logging touchups.
9829529 | Feb 3, 2023 | GH-79 more logging touch ups.
6b64e7d | Feb 3, 2023 | GH-79 revert PdfSecurityException change
4abab2d | Feb 8, 2023 | GH-79 touch up digital signatures to use the new nio file access.
00bdb64 | Feb 14, 2023 | GH-79 fix nasty offset issue when reindexing and looking for xref pos…
1a283d3 | Feb 15, 2023 | GH-79 move annotation to soft references so we less likely to lose re…
a61d70c | Feb 15, 2023 | Merge branch 'master' into GH-79.nio.object.parser
233a2e2 | Feb 16, 2023 | GH-79 revert back to 4 threads.
a643c74 | Feb 16, 2023 | GH-79 attempt to fix attachments, not there yet
2ff1c0b | Feb 17, 2023 | GH-79 migrated to slice over buffer get operations.
d317f31 | Feb 18, 2023 | GH-79 reworked Parser.getPObject to handle attachments that can span …
6f6c0da | Feb 18, 2023 | GH-79 touch up artifacts on icon.
8850669 | Feb 18, 2023 | GH-79 touch up artifacts on icon.
81aced7 | Feb 18, 2023 | GH-79 fix issue when loading a catalogue followed by a regular docume…
e136ee9 | Feb 22, 2023 | GH-79 code clean up
727afdf | Feb 22, 2023 | GH-79 code clean up
7178314 | Feb 23, 2023 | GH-79 more code clean up
1952e59 | Feb 24, 2023 | GH-79 code clean up
6186daa | Feb 24, 2023 | GH-79 code clean up
10f8261 | Feb 24, 2023 | GH-79 code clean up
e19723a | Feb 27, 2023 | GH-79 code clean up
09fe6a2 | Mar 9, 2023 | GH-79 misc. log combing fixes.
43625d8 | Mar 9, 2023 | GH-79 address concurrent access/modification during page init when a …
6450ca1 | Mar 9, 2023 | Merge branch 'master' into GH-79.nio.object.parser
GH-79 migrated to slice over buffer get operations.
Patrick Corless committed Feb 17, 2023
commit 2ff1c0b3906f2688aebec3fc09c59a6f5a327760
@@ -161,10 +161,7 @@ public ByteBuffer getDecodedStreamByteBuffer() {

     public ByteBuffer getDecodedStreamByteBuffer(int presiz) {
         byte[] decodeBytes = getDecodedStreamBytes(presiz);
-        ByteBuffer decodedByteBuffer = ByteBuffer.allocateDirect(decodeBytes.length);
-        decodedByteBuffer.put(decodeBytes);
-        decodedByteBuffer.position(0);
-        return decodedByteBuffer;
+        return ByteBuffer.wrap(decodeBytes);
     }

     /**
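This hunk replaces a copy into a new direct buffer with `ByteBuffer.wrap`. Here is a minimal sketch of what that changes, using only `java.nio.ByteBuffer`. The class and method names are hypothetical, not part of the project. The wrapped buffer is a heap buffer backed by `decodeBytes` itself, so no copy is made and `isDirect()` becomes false. The old path copied the bytes into off-heap memory.

```java
import java.nio.ByteBuffer;

// Hypothetical demo comparing the old copy path with ByteBuffer.wrap.
class WrapVsCopy {

    // Old approach: copy the decoded bytes into a new direct (off-heap) buffer.
    static ByteBuffer copyToDirect(byte[] decodeBytes) {
        ByteBuffer direct = ByteBuffer.allocateDirect(decodeBytes.length);
        direct.put(decodeBytes);
        direct.position(0); // rewind; the limit is already the full length
        return direct;
    }

    // New approach: no copy, the heap buffer is backed by decodeBytes itself.
    static ByteBuffer wrap(byte[] decodeBytes) {
        return ByteBuffer.wrap(decodeBytes);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        ByteBuffer copied = copyToDirect(data);
        ByteBuffer wrapped = wrap(data);

        data[0] = 42; // mutate the source array after both buffers exist

        System.out.println(copied.isDirect() + " " + copied.get(0));   // true 1
        System.out.println(wrapped.isDirect() + " " + wrapped.get(0)); // false 42
    }
}
```

Because the wrapped buffer shares the array, later writes to `decodeBytes` show through. That is harmless as long as the decoded array is not reused or cached elsewhere.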
@@ -21,11 +21,10 @@ public class Header {
     private double version;

     public ByteBuffer parseHeader(ByteBuffer byteBuffer) {
-        ByteBuffer headerBuffer = ByteBuffer.allocateDirect(Math.min(byteBuffer.limit(), 8 * 1024));
-        while (headerBuffer.hasRemaining()) {
-            headerBuffer.put(byteBuffer.get());
-        }
-        headerBuffer.flip();
+        byteBuffer.limit(Math.min(byteBuffer.limit(), 8 * 1024));
+        ByteBuffer headerBuffer = byteBuffer.slice();
+        byteBuffer.limit(byteBuffer.capacity());
+
         int matchPosition = 0;
         int matchLength = PDF_VERSION_MARKER.length - 1;
         while (headerBuffer.hasRemaining()) {
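The new `parseHeader` narrows `byteBuffer`'s limit, takes a `slice()`, and then resets the limit. The slice is a view with its own position and limit, starting at index 0. The sketch below shows the same idea as a standalone helper; `HeaderWindow` and `sliceWindow` are hypothetical names. The diff resets the limit to `capacity()`, but this helper saves and restores whatever limit the caller had, on the assumption that a caller may have set a limit smaller than the capacity. It also measures the window from the current position, which matches the diff when the position is 0.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a no-copy view over at most maxBytes bytes starting at the
// source's current position. The source's position is unchanged and its limit is
// restored to whatever the caller had set.
class HeaderWindow {

    static ByteBuffer sliceWindow(ByteBuffer source, int maxBytes) {
        int oldLimit = source.limit();
        int end = source.position() + Math.min(source.remaining(), maxBytes);
        source.limit(end);
        ByteBuffer window = source.slice(); // position 0, limit end - position
        source.limit(oldLimit);             // restore rather than reset to capacity()
        return window;
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.wrap("%PDF-1.7\n1 0 obj".getBytes(StandardCharsets.US_ASCII));
        System.out.println(StandardCharsets.US_ASCII.decode(sliceWindow(source, 8))); // %PDF-1.7
        System.out.println(sliceWindow(source, 8 * 1024).remaining());                // 16
        System.out.println(source.position() + " " + source.limit());                  // 0 16
    }
}
```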
@@ -56,12 +56,11 @@ public void parseXrefOffset(ByteBuffer byteBuffer) throws CrossReferenceStateExc

     private void parseXrefOffset(ByteBuffer byteBuffer, int bufferSize) throws CrossReferenceStateException {
         // find xref offset.
-        ByteBuffer footerBuffer = ByteBuffer.allocateDirect(bufferSize);
-        byteBuffer.position(byteBuffer.limit() - footerBuffer.limit());
-        while (footerBuffer.hasRemaining()) {
-            footerBuffer.put(byteBuffer.get());
-        }
-        footerBuffer.flip();
+        byteBuffer.position(byteBuffer.limit() - bufferSize);
+        byteBuffer.limit(byteBuffer.capacity());
+        ByteBuffer footerBuffer = byteBuffer.slice();
+        byteBuffer.limit(byteBuffer.capacity());
+
         // find end of file marker and startxref so we can parse the xref offset.
         int offsetEnd = ByteBufferUtil.findReverseString(footerBuffer, footerBuffer.limit(), PDF_EOF_MARKER);
         if (offsetEnd == footerBuffer.limit()) {
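The footer is now a slice over the last `bufferSize` bytes instead of a copy. Below is a hedged sketch of what scanning such a tail can look like. It slices the tail, searches backwards for the last `startxref` keyword, and parses the decimal offset that follows it. The diff's own `ByteBufferUtil.findReverseString` is not reproduced because its exact contract is not shown here. `FooterScan`, `lastIndexOf`, and `readStartXref` are hypothetical, and unlike the project code this sketch does not look for `%%EOF` first.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the project's implementation: slice the last tailSize
// bytes and parse the decimal offset that follows the final "startxref" keyword.
class FooterScan {

    // Index of the last occurrence of marker in buf (absolute reads), or -1.
    static int lastIndexOf(ByteBuffer buf, byte[] marker) {
        for (int i = buf.limit() - marker.length; i >= 0; i--) {
            int j = 0;
            while (j < marker.length && buf.get(i + j) == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                return i;
            }
        }
        return -1;
    }

    static long readStartXref(ByteBuffer file, int tailSize) {
        int oldPosition = file.position();
        file.position(Math.max(0, file.limit() - tailSize));
        ByteBuffer footer = file.slice(); // view over the tail, no copy
        file.position(oldPosition);       // leave the caller's cursor where it was

        byte[] keyword = "startxref".getBytes(StandardCharsets.US_ASCII);
        int at = lastIndexOf(footer, keyword);
        if (at < 0) {
            throw new IllegalStateException("startxref not found in the last " + tailSize + " bytes");
        }
        int i = at + keyword.length;
        // Skip the whitespace (space, CR, LF) between the keyword and the number.
        while (i < footer.limit() && (footer.get(i) == ' ' || footer.get(i) == '\r' || footer.get(i) == '\n')) {
            i++;
        }
        long offset = 0;
        while (i < footer.limit() && footer.get(i) >= '0' && footer.get(i) <= '9') {
            offset = offset * 10 + (footer.get(i) - '0');
            i++;
        }
        return offset;
    }

    public static void main(String[] args) {
        String pdf = "%PDF-1.4\n...objects...\nstartxref\n1234\n%%EOF\n";
        ByteBuffer file = ByteBuffer.wrap(pdf.getBytes(StandardCharsets.US_ASCII));
        System.out.println(readStartXref(file, 32)); // 1234
    }
}
```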
@@ -10,15 +10,15 @@
  */
 public class ByteBufferUtil {

-    public static ByteBuffer copyObjectStreamSlice(ByteBuffer objectByteBuffer, int objectOffsetStart, int objectOffsetEnd) {
+    public static ByteBuffer sliceObjectStream(ByteBuffer objectByteBuffer, int objectOffsetStart, int objectOffsetEnd) {
         int streamLength = objectOffsetEnd - objectOffsetStart;
         int oldLimit = objectByteBuffer.limit();
-        ByteBuffer streamByteBuffer = ByteBuffer.allocateDirect(streamLength);
+        int boundLimit = Math.min(objectOffsetStart + streamLength, objectByteBuffer.capacity());
+
         objectByteBuffer.position(objectOffsetStart);
-        objectByteBuffer.limit(objectOffsetStart + streamLength);
-        streamByteBuffer.put(objectByteBuffer);
+        objectByteBuffer.limit(boundLimit);
+        ByteBuffer streamByteBuffer = objectByteBuffer.slice();
         objectByteBuffer.limit(oldLimit);
-        streamByteBuffer.flip();
         return streamByteBuffer;
     }
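`sliceObjectStream` returns a view over `[objectOffsetStart, objectOffsetEnd)`, clamped to the buffer's capacity, rather than a copy. The sketch below mirrors that shape under a hypothetical name, `sliceRange`. It also shows why the copy-based version's trailing `flip()` cannot simply be kept. `flip()` sets the limit to the current position, and a fresh slice is already at position 0, so flipping it leaves an empty buffer.

```java
import java.nio.ByteBuffer;

// Sketch with the same shape as sliceObjectStream; sliceRange is a hypothetical name.
class SliceDemo {

    static ByteBuffer sliceRange(ByteBuffer source, int start, int end) {
        int oldLimit = source.limit();
        int boundLimit = Math.min(end, source.capacity()); // never past the buffer
        source.position(start);
        source.limit(boundLimit);
        ByteBuffer view = source.slice(); // position 0, limit boundLimit - start
        source.limit(oldLimit);           // the source's position is left at start
        return view;
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.wrap(new byte[]{10, 20, 30, 40, 50});
        ByteBuffer view = sliceRange(source, 1, 4);
        System.out.println(view.remaining() + " " + view.get(0)); // 3 20

        // A fresh slice is already at position 0, so flip() sets its limit to 0.
        ByteBuffer flipped = sliceRange(source, 1, 4);
        flipped.flip();
        System.out.println(flipped.remaining()); // 0

        // The view shares storage with the source: writes show through.
        view.put(0, (byte) 99);
        System.out.println(source.get(1)); // 99
    }
}
```

Because the view shares storage with its source, it also shares the source's writability: a slice of a read-only buffer is itself read-only.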
@@ -46,7 +46,7 @@ public Parser(Library library) {
     public PObject getPObject(ByteBuffer byteBuffer, int objectOffsetStart)
             throws IOException, ObjectStateException {

-        int objectOffsetEnd = 0;
+        int objectOffsetEnd;
         int streamOffsetStart;
         int streamOffsetEnd;
         ByteBuffer streamByteBuffer;
@@ -61,9 +61,11 @@ public PObject getPObject(ByteBuffer byteBuffer, int objectOffsetStart)
                 objectOffsetEnd = byteBuffer.position() - XREF_MARKER.length;
             }
             // scan looking for the stream object end
-            // copy the bytes to a new buffer so we can work on the bytes without thread position issues.
-            streamByteBuffer = ByteBufferUtil.copyObjectStreamSlice(byteBuffer, objectOffsetStart, objectOffsetEnd);
+            // copy the bytes to a new buffer, so we can work on the bytes without thread position issues.
+            streamByteBuffer = ByteBufferUtil.sliceObjectStream(byteBuffer, objectOffsetStart, objectOffsetEnd);
         }
+        // todo mark the stream position, no need for a buffer yet, already have one
+
         // grab the pieces of the object
         Lexer lexer = new Lexer(library);
         lexer.setByteBuffer(streamByteBuffer);
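The retained comment still talks about copying, but `sliceObjectStream` no longer copies. What a slice does provide is a private position and limit. A `Lexer` advancing through it therefore does not move the shared buffer's cursor, although the bytes themselves are still shared. A small sketch follows; the class and its `readToken` helper are hypothetical stand-ins for a lexer step.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: a slice has its own position and limit but shares bytes with its source.
class IndependentCursor {

    // Hypothetical stand-in for a lexer step: read bytes up to the next space.
    static String readToken(ByteBuffer buf) {
        StringBuilder token = new StringBuilder();
        while (buf.hasRemaining()) {
            char c = (char) (buf.get() & 0xFF); // treat bytes as Latin-1 characters
            if (c == ' ') {
                break; // the space is consumed, not returned
            }
            token.append(c);
        }
        return token.toString();
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.wrap("12 0 obj".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer view = shared.slice();

        System.out.println(readToken(view));                           // 12
        System.out.println(view.position() + " " + shared.position()); // 3 0

        shared.put(0, (byte) '7');              // a write through the source...
        System.out.println((char) view.get(0)); // ...is visible in the view: 7
    }
}
```

Taking the slice still mutates the source's position and limit for a moment. That step therefore remains a critical section whenever the source buffer is shared between threads.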
@@ -99,9 +101,17 @@ public PObject getPObject(ByteBuffer byteBuffer, int objectOffsetStart)
         Object streamOrEndObj = lexer.nextToken();
         if (streamOrEndObj instanceof Integer && ((Integer) streamOrEndObj) == OperandNames.OP_stream) {
             lexer.skipWhiteSpace();
+
+            // create a new buffer to encapsulate the stream data using the length
+
+            // look for garbage as before
+
+            // check if we have an end_stream marker as expected, correct offset if needed.
+
+            // stream offset
             streamOffsetStart = streamByteBuffer.position();
             int streamLength = library.getInt((DictionaryEntries) objectData, Dictionary.LENGTH_KEY);

             // doublc check a streamLength = zero, some encoders are lazy and there is actually data.
             if (streamLength == 0 && (streamByteBuffer.limit() - END_STREAM_MARKER.length) - streamOffsetStart > 0) {
                 streamLength = (streamByteBuffer.limit() - END_STREAM_MARKER.length) - streamOffsetStart;
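The zero-length fallback is plain arithmetic. When `/Length` is 0 but bytes remain between the stream start and the end of the buffer (minus the end-stream marker), that span is used instead. Below is a sketch with the inputs made explicit. `resolveLength` is hypothetical. The 9-byte marker length in the example assumes `END_STREAM_MARKER` is exactly `endstream`, which this diff does not confirm. The result is an upper bound that may include an end-of-line before the marker.

```java
// Hypothetical helper mirroring the zero-length fallback in getPObject:
// if /Length is 0 but bytes remain before the end-stream marker, use that span.
class StreamLengthFallback {

    static int resolveLength(int declaredLength, int bufferLimit, int endStreamMarkerLength, int streamOffsetStart) {
        int available = (bufferLimit - endStreamMarkerLength) - streamOffsetStart;
        if (declaredLength == 0 && available > 0) {
            return available;
        }
        return declaredLength;
    }

    public static void main(String[] args) {
        // Assumed: a 9-byte "endstream" marker, data starting at 40 in a 100-byte slice.
        System.out.println(resolveLength(0, 100, 9, 40));  // 51  (100 - 9 - 40)
        System.out.println(resolveLength(12, 100, 9, 40)); // 12  (declared length kept)
    }
}
```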
@@ -126,9 +136,7 @@ public PObject getPObject(ByteBuffer byteBuffer, int objectOffsetStart)
             // trim the buffer to the stream start end.
             streamByteBuffer.position(streamOffsetStart);
             streamByteBuffer.limit(streamOffsetEnd);
-            streamByteBuffer = streamByteBuffer.compact();
-            streamByteBuffer.position(0);
-            streamByteBuffer.limit(streamOffsetEnd - streamOffsetStart);
+            streamByteBuffer = streamByteBuffer.slice();
         } else {
             streamByteBuffer = null;
         }
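Trimming switches from `compact()` to `slice()`, and the two are not interchangeable. `compact()` moves the bytes between position and limit to index 0 of the same buffer and its backing storage. `slice()` creates a view and moves nothing. If `streamByteBuffer` is now a view onto the shared file buffer, as `sliceObjectStream` suggests, `compact()` would either overwrite shared bytes or, for a read-only buffer, throw `ReadOnlyBufferException`. The sketch below contrasts the two; the names are hypothetical.

```java
import java.nio.ByteBuffer;

// Contrast compact() (moves bytes inside the same buffer) with slice() (a view).
class TrimDemo {

    // Old style: shift [start, end) to index 0 of the SAME buffer and its storage.
    static ByteBuffer trimByCompact(ByteBuffer buf, int start, int end) {
        buf.position(start);
        buf.limit(end);
        ByteBuffer result = buf.compact(); // returns buf itself, position = end - start
        result.position(0);
        result.limit(end - start);
        return result;
    }

    // New style: a view over [start, end); the underlying bytes are not moved.
    static ByteBuffer trimBySlice(ByteBuffer buf, int start, int end) {
        buf.position(start);
        buf.limit(end);
        return buf.slice();
    }

    public static void main(String[] args) {
        byte[] backing = {0, 0, 7, 8, 9, 0};
        ByteBuffer a = trimByCompact(ByteBuffer.wrap(backing), 2, 5);
        System.out.println(a.remaining() + " " + backing[0]); // 3 7  (backing array rewritten)

        byte[] backing2 = {0, 0, 7, 8, 9, 0};
        ByteBuffer b = trimBySlice(ByteBuffer.wrap(backing2), 2, 5);
        System.out.println(b.remaining() + " " + b.get(0) + " " + backing2[0]); // 3 7 0
    }
}
```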
@@ -157,17 +165,16 @@ public CrossReference getCrossReference(ByteBuffer byteBuffer, int starXref)
             // sometimes the offset is off just by a few bytes
             byteBuffer.position(starXref - 10);
             xrefPositionStart = byteBuffer.position();
+
             // make sure we have a xref declaration
-            int bytesLeft = byteBuffer.limit() - byteBuffer.position();
-            lookAheadBuffer = ByteBuffer.allocateDirect(Math.min(bytesLeft, 48));
-            while (lookAheadBuffer.hasRemaining()) {
-                lookAheadBuffer.put(byteBuffer.get());
-            }
-        }
-        lookAheadBuffer.flip();
-        boolean foundXrefMarker = ByteBufferUtil.findString(lookAheadBuffer, XREF_MARKER);
-        Lexer objectLexer = new Lexer(library);
-        synchronized (library.getMappedFileByteBufferLock()) {
+            int bytesLeft = Math.min(byteBuffer.limit() - byteBuffer.position(), 48);
+            byteBuffer.limit(byteBuffer.position() + bytesLeft);
+            lookAheadBuffer = byteBuffer.slice();
+            byteBuffer.limit(byteBuffer.capacity());
+
+            boolean foundXrefMarker = ByteBufferUtil.findString(lookAheadBuffer, XREF_MARKER);
+            Lexer objectLexer = new Lexer(library);
+
             // see if we found xref marking and thus an < 1.5 formatted xref table.
             if (foundXrefMarker) {
                 // update the xref position as we will have removed any white space.
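The look-ahead for the `xref` keyword now slices at most 48 bytes from the shared buffer instead of copying them. Below is a sketch of the pattern under stated assumptions:

- `BUFFER_LOCK` stands in for `library.getMappedFileByteBufferLock()`, and the helper names are hypothetical.
- The start offset is clamped at 0, which the diff does not do.
- The caller's limit is restored rather than reset to `capacity()`.
- Only the step that moves the shared buffer's position and limit is inside the lock. The diff itself appears to keep the search and the rest of the parse inside its `synchronized` block.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the look-ahead pattern; names are hypothetical.
class XrefLookAhead {

    // Stand-in for library.getMappedFileByteBufferLock(); every user of the
    // shared buffer must take the same lock for this to be safe.
    private static final Object BUFFER_LOCK = new Object();

    static boolean hasXrefNear(ByteBuffer shared, int startXref) {
        ByteBuffer lookAhead;
        synchronized (BUFFER_LOCK) {
            int oldLimit = shared.limit();
            // Back up a few bytes because the recorded offset is sometimes slightly off.
            shared.position(Math.max(0, startXref - 10));
            int bytesLeft = Math.min(shared.remaining(), 48);
            shared.limit(shared.position() + bytesLeft);
            lookAhead = shared.slice();
            shared.limit(oldLimit);
        }
        // The slice has its own position and limit, so searching it does not
        // disturb the shared buffer's cursor.
        return contains(lookAhead, "xref".getBytes(StandardCharsets.US_ASCII));
    }

    // True if marker occurs anywhere in buf (absolute reads only).
    static boolean contains(ByteBuffer buf, byte[] marker) {
        for (int i = 0; i + marker.length <= buf.limit(); i++) {
            int j = 0;
            while (j < marker.length && buf.get(i + j) == marker[j]) {
                j++;
            }
            if (j == marker.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String pdf = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\nxref\n0 1\n0000000000 65535 f \n";
        ByteBuffer file = ByteBuffer.wrap(pdf.getBytes(StandardCharsets.US_ASCII));
        int xrefOffset = pdf.indexOf("xref");
        System.out.println(hasXrefNear(file, xrefOffset + 3)); // true, despite the 3-byte error
    }
}
```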
@@ -197,7 +204,7 @@ private CrossReference parseCrossReferenceTable(DictionaryEntries dictionaryEntr
         // mark the xref start, so it can be used to write future /prev entries.
         int xrefStartPos = start;
         // allocate to a new buffer as the data is well-defined.
-        ByteBuffer xrefTableBuffer = ByteBufferUtil.copyObjectStreamSlice(byteBuffer, start, end);
+        ByteBuffer xrefTableBuffer = ByteBufferUtil.sliceObjectStream(byteBuffer, start, end);
         CrossReferenceTable crossReferenceTable = new CrossReferenceTable(library, dictionaryEntries, xrefStartPos);
         objectLexer.setByteBuffer(xrefTableBuffer);
         // parse the sub groupings
@@ -2903,7 +2903,7 @@ public void commonNewDocumentHandling(String fileDescription) {
         // values are SinglePage, OnceColumn, TwoColumnLeft, TwoColumnRight,
         // TwoPageLeft, TwoPageRight.
         Object tmp = catalog.getObject(Catalog.PAGELAYOUT_KEY);
-        if (tmp != null && tmp instanceof Name) {
+        if (tmp instanceof Name) {
             String pageLayout = ((Name) tmp).getName();
             int viewType = DocumentViewControllerImpl.ONE_PAGE_VIEW;
             if (pageLayout.equalsIgnoreCase("OneColumn")) {
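Dropping `tmp != null` is safe because `instanceof` evaluates to `false` when its left operand is `null`. Here is a tiny sketch; the nested `Name` class is a hypothetical stand-in for the library's `Name` type.

```java
// Why `tmp != null && tmp instanceof Name` can drop the null check:
// `instanceof` already evaluates to false when the left operand is null.
class InstanceofNull {

    // Hypothetical stand-in for the catalog's Name type.
    static final class Name {
        private final String name;

        Name(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    static String pageLayoutOrDefault(Object tmp, String fallback) {
        if (tmp instanceof Name) { // false for null, so no separate null check
            return ((Name) tmp).getName();
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(pageLayoutOrDefault(null, "SinglePage"));                  // SinglePage
        System.out.println(pageLayoutOrDefault(new Name("OneColumn"), "SinglePage")); // OneColumn
    }
}
```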