This page is a summary of PDF tricks, either based on data encodings, JavaScript, or PDF structure, with hand-written proof-of-concepts.
<wiki:toc max_depth="3" />
DISCLAIMER The files are clean, and hand written by me, for better understanding and perfect clarity. However, because they use various tricks also present in malwares, they might be detected by your antivirus, or crash your usual program. They all work under Adobe Acrobat though, which is the point here.
In short: they're odd, but they work, and they're clean.
for many more JavaScript tricks, check the JavaScript Garden.
any data (or string) can be encoded in various ways.
A string can be stored as hex.
Example of Hello World!
string, with each character encoded by its value in hexadecimal (download):
48656c6c6f20576f726c6421
another classic one, each character is encoded in octal.
Example of Hello World!
string, with each characters encoded by its value in octal (download):
\110\145\154\154\157\40\127\157\162\154\144\41\
so far so good, nothing too weird. However, it wouldn't be Adobe if something wasn't unexpected.
While it's common to separate hex numbers with space (00 01 02...
), in a PDF, any kind of whitespace can be used (newline, tab, ...), and even between the 2 niblles of the same hex number.
Example of Hello World!
string, encoded as hexadecimal, with extra whitespace between the nibbles (download):
4
8
6
5
6 c 6c 6f 20
576f726c6421
ASCII can be stored as ascii (hopefully), but newlines can be inserted too...
Example of Hello World!
string, stored as ASCII, with extra new lines characters (download):
[...]
}\
\
\
\
\
\
\
H\
e\
l\
\
\
\
\
\
l\
o\
\
[...]
theoretically %PDF-1.[4-6]
, the signature can be truncated to ignore the last digit...
Example of truncated header signature (download):
%PDF-1.
...but it can be actually even shorter, provided you insert a null character.
Example of null-based header trick (download):
%PDF-\0
A PDF can contain a Names tree, in which each element is sequentially evaluated. This way, a script can be split in various parts, without any obvious trigger such as OpenAction.
Example of a JavaScript script, split into 2 parts in the Names tree (download):
[...]
/Names
[...]
/Names[
(1) <<
[...]
msg = "Hello";
[...]
(2) <<
[...]
msg = msg + " World!";
app.alert(msg);
[...]
on Acrobat X, a valid PDF doesn't need an object.
Example of a valid trailer-only, 36 bytes PDF, under Acrobat X (download):
%PDF-\0trailer<</Root<</Pages<<>>>>>>
this trick doesn't work on previous version of Acrobat. For them, an object - even empty, and with no index - has to be present.
Example of a valid, 48 bytes PDF, for Acrobat <=9 (download):
%PDF-\0obj<<>>trailer<</Root<</Pages<<>>>>>>
Unexpected tags are just ignored. this goes the same for tags with incorrect case.
Example of a trailer with incorrectly cased tags (download):
trailer
<<
/Root
<</tYpE/caTaLOG/Pages 1 0 R>>
>>
PDF is a format that supports incremental updates, so information can be written beyond the %EOF.
Example of a PDF where objects and trailer are beyond the EOF (download):
[...]
%%EOF
% dummy object to make this PDF valid
[...]
15 0 obj
[...]
% the trailer is not allowed to be before all other objects
trailer
<<
/Root<</Pages 1 0 R>>
>>
1 0 obj
<</Kids[<</Parent 1 0 R/Contents[2 0 R]>>]
[...]
2 0 obj
<<>>
stream
BT/default 20 Tf 1 0 0 1 1 715 Tm(this text and the PDF objects are stored beyond the %%EOF tag)Tj ET
[...]
PDF are parsed bottom-up by default, except if the first object (even a dummy non-referenced object) contains the /Linearized tag (with a dummy value)
Example of a PDF parsed either bottom-up or top-down (if linearized) (download):
[...]
% inserting/removing a dummy first object will change the parsing direction
12 0 obj <<<>>
[...]
31415 0 obj
<< /Linearized -42 >>
endobj
[...]
2 0 obj
<<>>
stream
BT/default 35 Tf 1 0 0 1 1 715 Tm(this PDF has been parsed top-down)Tj ET
endstream
endobj
[...]
20 0 obj
<<>>
stream
BT/default 35 Tf 1 0 0 1 1 715 Tm(this PDF has been parsed bottom-up)Tj ET
endstream
endobj
[...]
% if this trailer is taken in account, it will display 'Top-down'
trailer
<<
/Root
<<
/Pages 1 0 R
>>
>>
[...]
% if this trailer is taken in account, it will display 'Bottom-up'
trailer
<<
/Root
<<
/Pages 10 0 R
>>
>>
Here are a few other encoding tricks, that use javascript (not necessarily specific to Acrobat's)
the simplest tricks of all for strings in Javascript, split into parts, then concatenate back.
Example of a Hello World!
string concatenated before display (download):
[...]
B="Hell";
on="o ";
jour="World!";
app.alert(B + on + jour);
[...]
simple obfuscation, add some extra character/switch then replace/restore them before use.
Example of Hello World!
string, obfuscated with extra characters (download):
"zHzzezzlzlzozzzzz zWozrzldz!").replace(/z/g,"")
another standard javascript obfuscation, replace characters with escaped ones.
Example of Hello World! string, encoded with javascript escaping (download):
unescape("%48%65%6C%6C%6F%20%57%6F%72%6C%64%21")
like CAFEBABE can be read as a word or as a hex number, any uppercase word can be written as a base32 (32 or less, depending on the last (in the alphabet) used character) numbers.
Example of a HELLO WORLD!
string, with each word encoded as a number, in different bases (download):
(6873049).toString(25) + " " + (38842069).toString(33) + "!"
A classic: some code can be built in a string, then executed via evaluation.
Example of a string of code being evaluated (download):
[...]
eval('app.alert("Hello World!");');
[...]
A function such as alert can be called, not only directly, but also as a substring of its parent object, app in our example.
Example of a JavaScript function called by a string reference from its parent (download):
[...]
app["alert"]("Hello World!");
[...]
the previous example can be extended with a fake array with fake entries, which makes the actual executed function harder to spot.
Example of a code string evaluated via a fake array reference (download):
[...]
e = ("fake1")[("fake2", "eval")]
e('app.alert("Hello World!");');
[...]
A JavaScript function can access its own code, and use it for anything. Thus, any modification might prevent the function to work correctly: typically, such functions use their code as a key for some decryption.
Example of a decryption function, using its own code as decryption key (download):
[...]
function decrypt(cipher)
[...]
key = arguments.callee.toString();
[...]
return plaintext;
}
[...]
app.alert(decrypt(unescape(".%10%02%0F%1BI8%01R%08%01B")));
[...]
these tricks are hiding some information in one of the various component of the PDF, and retrieving that information via JavaScript.
a PDF's trailer can contain an Info dictionary, whose elements' contents can be retrieved via JavaScript via the info object's properties.
Example of a string being stored as element of a trailer's Info dictionary (download):
[...]
/Info <</Author(Hello) /Title( World) /Producer( !)>>
[...]
app.alert(info.author + info.title + info.producer);
[...]
Similarly, a PDF page can contain an annotation, which content can be retrieved by getAnnots on the specific page.
Example of a string being stored as a page's annotation's subject (download):
[...]
/Annots
[...]
/Subj (Hello World!)
[...]
d = app.doc;
d.syncAnnotScan();
a = d.getAnnots({ nPage: 0 });
app.alert(a[0].subject);
[...]
An !AcroForm Widget can contain some data and value, which can be retrieved via the getField function.
Example of a string stored in an AcroForm Widget (download):
[...]
/AcroForm
[...]
/Subtype/Widget
/T(mydata) % this is the name
/V(Hello World!) % this is the value
[...]
app.alert(this.getField('mydata').value);
[...]
A form can contain a template, with an event on its field, that contains a script.
Example of a JavaScript stored in the field event of a form (download):
[...]
1 0 obj <<>>
[...]
<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
[...]
<template>
<subform name="_">
<pageSet/>
<field id="Hello World!">
<event activity="initialize">
<script contentType='application/x-javascript'>
app.alert(this.id);
[...]
trailer <<
/Root <<
/AcroForm <<
[...]
/XFA 1 0 R
[...]
On top of standard JavaScript, it's possible, via Acrobat specificities, to determine if we're in the real program or in an emulator.
some Adobe specific global variables are initialized with default values, so checking them might stop an emulator.
Example of Acrobat JavaScript code that checks initial values as an anti-emulator (download):
[...]
if ((app)
&& (event.target.zoomType.toString() == 'FitPage'))
[...]
global variables show a different behavior as opposed to normal ones: their type is no updated, even if you explicitely assign them new values.
Example of Acrobat JavaScript code that sets global variables, and checks the result (download):
[...]
hidden=0; // global
hidden_=0; // not global
if ((hidden_ === 0) && (hidden !== 0))
app.alert("nothing unusual detected.");
[...]
see earlier in the page for PDFs as small as possible, but containing nothing.
to actually contain a page, an object, defining its kid as its parent, should be referenced in trailer/Root/Pages.
Example of a minimalist PDF with a page (download):
%PDF-\01 0 obj<</Kids[<</Parent 1 0 R>>]>>trailer<</Root<</Pages 1 0 R>>>>
To define a text, the page should have empty resources, and a content, made of the text itself.
Example of a minimalist PDF with page + text, under Acrobat <=9 (download):
%PDF-\01 0 obj<</Kids[<</Parent 1 0 R/Contents[2 0 R]>>]/Resources<<>>>>2 0 obj<<>>stream
BT/default 99 Tf 1 0 0 1 1 715 Tm(Hello World!)Tj ET
endstream
trailer<</Root<</Pages 1 0 R>>>>
Note this object doesn't have to have an endobj tag. On the contrary, it's required under Acrobat X.
Example of a minimalist PDF with page + text, under Acrobat X (download):
%PDF-\01 0 obj<</Kids[<</Parent 1 0 R/Contents[2 0 R]>>]/Resources<<>>>>2 0 obj<<>>stream
BT/default 99 Tf 1 0 0 1 1 715 Tm(Hello World!)Tj ET
endstream
endobj
trailer<</Root<</Pages 1 0 R>>>>
to be able to use JavaScript, a PDF just needs to be empty (see above), with an OpenAction in the trailer/root.
Example of a minimalist PDF using JavaScript, under Acrobat <=9 (download):
%PDF-\0obj<<>>trailer<</Root<</Pages<<>>/OpenAction<</S/JavaScript/JS(app.alert('Hello World!');)>>>>>>
Example of a minimalist PDF using JavaScript, under Acrobat X (download):
%PDF-\0trailer<</Root<</Pages<<>>/OpenAction<</S/JavaScript/JS(app.alert('Hello World!');)>>>>>>
if a FlatEncoded stream is truncated or corrupted, it will be taken as is until the error - and still interpreted (typically, such a stream would be totally discarded).
An easy way to get the original corrupted stream to decompress like Reader would, is to actually get Reader to decompress it - without executing it - by pruning it into a template PDF with attachment. This way, Adobe Reader will actually be the one decompressing the stream, but not execute it.
WARNING: it will not work if the PDF is encrypted - it should be decrypted first.
using several degrees of PDF malformations on one-page 'Hello world' documents, you can test your PDF tools 'robustness':
- naive is a naive PDF, extremely standard.
- generic doesn't contain stream lengths, xref, and EOF, which are the most standard malformations.
- slightly linux specific
- MuPDF contains NO PDF signature, incomplete Tf parameters, and fewer elements, yet is fully supported by many readers (Evince, ubuntu preview)
- sumatra is similar to MuPDF, except it contains no endobj, and fewer objects.
- adobe reader specific
- reader-X contains a truncated signature, specific to Adobe. Also, no font definition is required.
- reader is the same, but it also miss an endobj on the stream object.
- chrome specific
- truncated
%PDF
signature - (disabled) recursive object
- bogus trailer
- trailer in a bogus object, in a comment
| software | naive | generic | mupdf | sumatra | reader | readerX | |-| | | | | | | | PDF.JS | Ok | fail | fail | fail | fail | fail | | | | | | | | | | Chrome | Ok | Ok | fail | fail | fail | fail | | Mountain Lion | Ok | Ok | fail | fail | fail | fail | | Safari | Ok | Ok | fail | fail | fail | fail | | Goodreader | warning then Ok | warning then Ok | fail | fail | fail | fail | | | | | | | | | | Adobe Reader 7 | Ok | Warning then Ok | fail | fail | Ok | Warning then Ok | | Adobe Reader 8 | Ok | Ok | fail | fail | Ok | Ok | | Adobe Reader 9 | Ok | Ok | fail | fail | fail | Ok | | Adobe Reader X | Ok | Ok | fail | fail | fail | Ok | | | | | | | | | | Evince | Ok | Ok | Ok | fail | fail | fail | | muPDF | Ok | Ok | no text | Ok | fail | fail | | Sumatra | Ok | Ok | Ok | Ok | fail | fail | | Ubuntu preview | Ok | Ok | Ok | fail | fail | fail | | | | | | | | |
a working PDF, giving different results in Sumatra, Chrome reader and Adobe Reader, with oddities specific to each viewer.
A study on how to reveal or hide secrets in PDF documents.
Presented at RaumZeitLabor on the 2014/05/17 (PoCs)
<wiki:gadget url="https://corkami.googlecode.com/svn/wiki/gadgets/pdfsecrets_slideshare.xml" width=595 height=497 border=0/>
<wiki:video url="http://www.youtube.com/watch?v=JQrBgVRgqtc"/>