Commit 238264f

Initial commit - version 0.1.0-beta.1
1 parent c852e49

10 files changed: +11006 / -1 lines

.gitignore

Lines changed: 43 additions & 0 deletions

```
# See http://help.github.com/ignore-files/ for more about ignoring files.

# compiled output
compiled
dist

# dependencies
node_modules

# IDEs and editors
.idea
.project
.classpath
.c9/
*.launch
.settings/
*.sublime-workspace
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json

# misc
.sass-cache
connect.lock
coverage
libpeerconnection.log
npm-debug.log
testem.log
typings
.firebaserc
firebase.json
firebase-debug.log

# e2e
e2e/*.js
e2e/*.map

# system files
.DS_Store
Desktop.ini
Thumbs.db
```

.npmignore

Lines changed: 30 additions & 0 deletions

```
# Development
compiled
node_modules
*.log

# Config
.angular-cli.json
.editorconfig
.firebaserc
.gitignore
.npmignore
firebase.json
tslint.json
typedoc.json

# IDE
.idea
.project
.settings
.idea/*
*.iml
*.swp

# misc
npm-debug.log

# System Files
.DS_Store
Desktop.ini
Thumbs.db
```

README.md

Lines changed: 198 additions & 1 deletion

```diff
@@ -1 +1,198 @@
-# pdfAssembler
+# PDF Assembler
```

[![npm version](https://img.shields.io/npm/v/pdfassembler.svg?style=plastic)](https://www.npmjs.com/package/pdfassembler) [![npm downloads](https://img.shields.io/npm/dm/pdfassembler.svg?style=plastic)](https://www.npmjs.com/package/pdfassembler) [![GitHub MIT License](https://img.shields.io/github/license/dschnelldavis/pdfassembler.svg?style=social)](https://github.com/dschnelldavis/pdfassembler)
[![Dependencies](https://david-dm.org/dschnelldavis/pdfassembler.svg)](https://david-dm.org/dschnelldavis/pdfassembler) [![devDependencies](https://david-dm.org/dschnelldavis/pdfassembler/dev-status.svg)](https://david-dm.org/dschnelldavis/pdfassembler?type=dev)

The missing piece to edit PDF files directly in the browser.

PDF Assembler disassembles PDF files into editable JavaScript objects, then assembles them back into PDF files, ready to save, download, or open.

## Overview

Actually, PDF Assembler itself does only one thing: it assembles PDF files (hence the name). However, it uses Mozilla's terrific [pdf.js](https://mozilla.github.io/pdf.js/) library to disassemble PDFs into JavaScript objects. Those objects can then be modified, after which PDF Assembler can re-assemble them into PDFs to display, save, or download.

### Scope and future development

PDF is a complex format (the [ISO standard describing it](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf) is 756 pages long). PDF Assembler makes working with PDFs (somewhat) simpler by separating the physical structure of a PDF from its logical structure. In the future, PDF Assembler will likely offer better defaults for generating PDFs, such as cross-reference streams and compressed objects, as well as more options, such as linearizing or encrypting the output PDF. However, editing features (like adding or editing pages, or even centering or wrapping text) are outside the scope of this library.

### Alternatives

If you want a library to simplify creating PDFs, in a browser or on a server, you can use [jsPDF](https://github.com/MrRio/jsPDF) or [PDFKit](https://github.com/devongovett/pdfkit).

If you want to simplify editing existing PDFs on a server, you can use the command line tools [QPDF](http://qpdf.sourceforge.net/) or [PDFTk](https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/), the Java tools [PDFBox](https://pdfbox.apache.org/) or [iText](https://github.com/ymasory/iText-4.2.0), or the Node module [Hummus](https://github.com/galkahana/HummusJS/wiki).

If you want to simplify editing existing PDFs in a browser, I haven't found that library yet. This library helps, but it still requires a good understanding of how the logical structure of a PDF works.

If you want to learn more about the logical structure of PDFs, I recommend O'Reilly's [PDF Explained](http://shop.oreilly.com/product/0636920021483.do). If you use this library, pdf.js and PDF Assembler will take care of reading and writing the raw bytes of the PDF, so you can skip to Chapter 4, "Document Structure":

![Figure 4-1 from PDF Explained: logical structure of a typical document](https://www.safaribooksonline.com/library/view/pdf-explained/9781449321581/httpatomoreillycomsourceoreillyimages952073.png)
Figure 4-1 shows the logical structure of a typical document. (PDF Explained, Chapter 4, page 39)

## How it works - the PDF structure object

PDF Assembler accepts or creates a PDF structure object, which is a specially formatted JavaScript object that represents the logical structure of a PDF document as simply as possible, by mapping each type of PDF data to its closest JavaScript counterpart:

| PDF data type | JavaScript data type                 |
|---------------|--------------------------------------|
| dictionary    | object                               |
| array         | array                                |
| number        | number                               |
| name          | string, starting with "/"            |
| string        | string, surrounded with "()" or "<>" |
| boolean       | boolean                              |
| null          | null                                 |
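
As an illustration of this mapping, the sketch below serializes a structure object into PDF syntax by following the table. `toPdfSyntax` is a hypothetical helper, not PDF Assembler's implementation: it assumes names and strings already carry their "/" prefix or "()" / "<>" delimiters, and it does not handle streams, indirect references, or indentation.

```javascript
// Hypothetical serializer illustrating the PDF <-> JavaScript type mapping.
// Not PDF Assembler's code: streams and indirect references are not handled.
function toPdfSyntax(value) {
  if (value === null) {
    return 'null'; // PDF null
  }
  if (Array.isArray(value)) {
    // PDF array: [ item item ... ]
    return '[ ' + value.map(toPdfSyntax).join(' ') + ' ]';
  }
  switch (typeof value) {
    case 'number':
    case 'boolean':
      return String(value); // written the same way in PDF and JavaScript
    case 'string':
      // Names ("/Type") and strings ("(text)" or "<hex>") are already
      // delimited in the structure object, so they are written verbatim.
      return value;
    case 'object':
      // PDF dictionary: << /Key value ... >>
      return '<< ' + Object.keys(value)
        .map((key) => key + ' ' + toPdfSyntax(value[key]))
        .join(' ') + ' >>';
    default:
      throw new TypeError('Unsupported value: ' + String(value));
  }
}

console.log(toPdfSyntax({
  '/Type': '/Font',
  '/BaseFont': '/Helvetica',
  '/Subtype': '/Type1'
}));
// << /Type /Font /BaseFont /Helvetica /Subtype /Type1 >>
```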

### "Hello world" example

Here's the structure object for a simple "Hello world" PDF:

```javascript
const helloWorldPdf = {
  '/Root': {
    '/Type': '/Catalog',
    '/Pages': {
      '/Type': '/Pages',
      '/Count': 1,
      '/Kids': [ {
        '/Type': '/Page',
        '/MediaBox': [ 0, 0, 612, 792 ],
        '/Contents': [ {
          'stream': '1 0 0 1 72 708 cm BT /Helv 12 Tf (Hello world!) Tj ET'
        } ],
        '/Resources': {
          '/Font': {
            '/Helv': {
              '/Type': '/Font',
              '/BaseFont': '/Helvetica',
              '/Subtype': '/Type1'
            }
          }
        }
      } ]
    }
  }
};
```
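
The `stream` value is a PDF content stream: `1 0 0 1 72 708 cm` translates the origin to (72, 708) in points on the 612 by 792 point page, and `BT ... ET` encloses a text object that selects the `/Helv` font at 12 points (`Tf`) and shows a string (`Tj`). If you build such streams from arbitrary text, backslashes and parentheses have to be escaped inside the `( )` string delimiters. The helpers below are a hypothetical sketch, not part of PDF Assembler, and assume plain ASCII text.

```javascript
// Hypothetical helpers for building a one-line text content stream.
// Escapes the characters that are special inside a PDF literal string: \ ( )
function escapePdfString(text) {
  return text.replace(/[\\()]/g, (ch) => '\\' + ch);
}

// Moves the origin to (x, y), then draws the escaped text in the given font.
function textStream(text, x, y, fontName = '/Helv', fontSize = 12) {
  return `1 0 0 1 ${x} ${y} cm BT ${fontName} ${fontSize} Tf (${escapePdfString(text)}) Tj ET`;
}

console.log(textStream('Hello world!', 72, 708));
// 1 0 0 1 72 708 cm BT /Helv 12 Tf (Hello world!) Tj ET
console.log(textStream('Total (approx.)', 72, 690));
// 1 0 0 1 72 690 cm BT /Helv 12 Tf (Total \(approx.\)) Tj ET
```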

In this object, the main document catalog dictionary is '/Root' (and if there were a document information dictionary, it would be '/Info', because '/Root' and '/Info' are the names used to refer to these objects in the PDF trailer dictionary).

There are a few small differences from a true PDF structure. For example, streams are _inside_ their dictionary objects to keep them together, even though in the final PDF each stream is written immediately after its dictionary.

Also, structure objects do not need to include stream '/Length' or page '/Parent' entries, because those entries are calculated and added automatically when the PDF is assembled. (Adding them won't hurt anything, but there is no reason to, as they will just be overwritten.)
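
For illustration only, here is roughly what filling in those two entries involves, sketched on a structure object shaped like the "Hello world" example. `addComputedEntries` is a hypothetical helper, not PDF Assembler's code; it assumes a flat `/Kids` array and uncompressed, string-valued streams written out as UTF-8 (when a stream is compressed, its `/Length` is the byte count of the compressed data instead).

```javascript
// Hypothetical sketch of the entries the assembler fills in automatically.
// Assumes a flat /Kids array of pages and uncompressed, string-valued streams.
function addComputedEntries(pdf) {
  const pagesNode = pdf['/Root']['/Pages'];
  const encoder = new TextEncoder(); // UTF-8 encoder, available in browsers and Node
  for (const page of pagesNode['/Kids']) {
    page['/Parent'] = pagesNode; // each page points back to the node that lists it
    for (const content of page['/Contents'] || []) {
      // /Length counts bytes, not characters, so encode before measuring
      content['/Length'] = encoder.encode(content.stream).length;
    }
  }
  return pdf;
}

const example = {
  '/Root': {
    '/Type': '/Catalog',
    '/Pages': {
      '/Type': '/Pages',
      '/Count': 1,
      '/Kids': [{
        '/Type': '/Page',
        '/MediaBox': [0, 0, 612, 792],
        '/Contents': [{ stream: '1 0 0 1 72 708 cm BT /Helv 12 Tf (Hello world!) Tj ET' }]
      }]
    }
  }
};

addComputedEntries(example);
console.log(example['/Root']['/Pages']['/Kids'][0]['/Contents'][0]['/Length']);
```

For an ASCII-only stream like this one, the byte count equals the string length; encoding first only makes a difference for non-ASCII text.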
83+
84+
### Re-using shared dictionary items
85+
86+
If you want to use the same dictionary object in multiple places in a PDF, simply set the second location equal to the first, to create a reference from one part of the PDF structure object to another. (PDF Assembler will automatically recognize this, and sort out the details of creating an indirect object and adding PDF object references in the appropriate places.)
87+
88+
For example, here is how to add a second page to the above PDF, and re-use the resources from the first page:
89+
```javascript
90+
// add new page
91+
helloWorldPdf['/Root']['/Pages']['/Kids'].push({
92+
'/Type': '/Page',
93+
'/MediaBox': [ 0, 0, 612, 792 ],
94+
'/Contents': [ {
95+
'stream': '1 0 0 1 72 708 cm BT /Helv 12 Tf (This is page two!) Tj ET'
96+
} ]
97+
});
98+
99+
// assign page 2 (/Kids array item 1) to re-use
100+
// the resources from page 1 (/Kids array item 0)
101+
helloWorldPdf['/Root']['/Pages']['/Kids'][1]['/Resources'] =
102+
helloWorldPdf['/Root']['/Pages']['/Kids'][0]['/Resources'];
103+
```
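
This works because JavaScript assignment copies a reference, so both pages now hold the very same `/Resources` object. One way to detect that kind of sharing, sketched here as an assumption about the general technique rather than PDF Assembler's actual code, is to walk the structure and flag any object reached more than once; each flagged object would then be written once as an indirect object and referenced from every place it appears. `findSharedObjects` is a hypothetical name.

```javascript
// Hypothetical sketch: find objects (dictionaries or arrays) reachable from
// more than one place in a structure object, using object identity (===).
function findSharedObjects(root) {
  const seen = new Set();   // objects visited at least once
  const shared = new Set(); // objects reached again by another path
  (function visit(value) {
    if (value === null || typeof value !== 'object') {
      return; // numbers, strings, booleans, and null are written inline
    }
    if (seen.has(value)) {
      shared.add(value); // second path to the same object: needs a reference
      return;            // don't descend again (this also stops cycles)
    }
    seen.add(value);
    for (const child of Object.values(value)) {
      visit(child);
    }
  })(root);
  return shared;
}

const resources = {
  '/Font': { '/Helv': { '/Type': '/Font', '/BaseFont': '/Helvetica', '/Subtype': '/Type1' } }
};
const twoPagePdf = {
  '/Root': {
    '/Type': '/Catalog',
    '/Pages': {
      '/Type': '/Pages',
      '/Kids': [
        { '/Type': '/Page', '/Resources': resources },
        { '/Type': '/Page', '/Resources': resources } // same object, not a copy
      ]
    }
  }
};

const shared = findSharedObjects(twoPagePdf);
console.log(shared.size, shared.has(resources)); // 1 true
```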

### Grouping page trees

By default, PDF Assembler takes care of grouping pages for you. When you import a document, it automatically flattens the page tree into one long array, and then re-groups the pages when assembling the final PDF. Optionally, you can change the group size (the default is 16) or disable grouping. But in general, you can forget about grouping and just let PDF Assembler take care of it.
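
To picture what grouping and flattening mean, here is a sketch that bundles a flat array of pages into a tree of `/Pages` nodes with at most `groupSize` kids each, and walks such a tree back into a flat array. `groupPages` and `flattenPages` are hypothetical helpers, not PDF Assembler functions, and the library's actual grouping strategy may differ. Each `/Pages` node carries a `/Count` of the pages beneath it, as the PDF page tree requires; the `id` property on the sample pages exists only for the demo.

```javascript
// Hypothetical sketch: flat array of /Page objects -> root /Pages node,
// where no node holds more than groupSize kids.
function groupPages(pages, groupSize = 16) {
  if (groupSize < 2) {
    throw new RangeError('groupSize must be at least 2'); // 1 would never shrink
  }
  const countOf = (node) => (node['/Type'] === '/Pages' ? node['/Count'] : 1);
  const makeNode = (kids) => ({
    '/Type': '/Pages',
    '/Kids': kids,
    '/Count': kids.reduce((sum, kid) => sum + countOf(kid), 0) // leaf pages below
  });

  let level = pages;
  // Keep bundling until the remaining nodes fit under a single root.
  while (level.length > groupSize) {
    const next = [];
    for (let i = 0; i < level.length; i += groupSize) {
      next.push(makeNode(level.slice(i, i + groupSize)));
    }
    level = next;
  }
  return makeNode(level);
}

// Inverse walk: /Pages tree -> flat array of /Page objects, in document order.
function flattenPages(node) {
  if (node['/Type'] !== '/Pages') {
    return [node];
  }
  return node['/Kids'].flatMap((kid) => flattenPages(kid));
}

const pages = Array.from({ length: 40 }, (_, i) => ({ '/Type': '/Page', id: i + 1 }));
const tree = groupPages(pages);
console.log(tree['/Count'], tree['/Kids'].length); // 40 3
console.log(flattenPages(tree).length); // 40
```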

## Installing and using PDF Assembler

### Installing from NPM

So, if you're not scared off yet, and still want to use PDF Assembler in your project, it's pretty simple:

```shell
npm install pdfassembler
```

Next, import PDF Assembler in your project, like so:

```javascript
const { PDFAssembler } = require('pdfassembler');
```

or, in ES6:

```javascript
import { PDFAssembler } from 'pdfassembler';
```

### Loading a PDF

To use PDF Assembler, you must create a new PDFAssembler instance and initialize it, either with your own PDF structure object:

```javascript
// helloWorldPdf = the PDF structure object defined above
const newPdf = new PDFAssembler(helloWorldPdf);
```

Or by importing a binary PDF file:

```javascript
// binaryPDF = a Blob, File, ArrayBuffer, or TypedArray containing a PDF file
const newPdf = new PDFAssembler(binaryPDF);
```

### Editing the PDF object

After you've created a new PDFAssembler instance, you can request a promise for the PDF structure object, and then edit it. (Some of PDF Assembler's actions are asynchronous, so it's necessary to use a promise to make sure the PDF is fully loaded before you edit it.)

For example, here is how to edit a PDF to remove all but the first page:

```javascript
newPdf
  .pdfObject()
  .then(function(pdf) {
    // keep only the first entry of the flattened /Kids array
    pdf['/Root']['/Pages']['/Kids'] = pdf['/Root']['/Pages']['/Kids'].slice(0, 1);
  });
```
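
Because an imported page tree is flattened into a single `/Kids` array, the same approach extends to any selection or reordering of pages. The sketch below uses a hypothetical `selectPages` helper (not a PDF Assembler function) on a stand-in object, so it runs without loading a real PDF; the `label` keys exist only for the demo.

```javascript
// Hypothetical helper: keep (and reorder) pages by zero-based index,
// assuming the flattened /Kids array described above.
function selectPages(pdf, indexes) {
  const kids = pdf['/Root']['/Pages']['/Kids'];
  pdf['/Root']['/Pages']['/Kids'] = indexes.map((i) => {
    if (!Number.isInteger(i) || i < 0 || i >= kids.length) {
      throw new RangeError('No page at index ' + i);
    }
    return kids[i];
  });
  return pdf;
}

// Stand-in for the object delivered by newPdf.pdfObject().
const stub = { '/Root': { '/Pages': { '/Kids': [
  { '/Type': '/Page', label: 'one' },
  { '/Type': '/Page', label: 'two' },
  { '/Type': '/Page', label: 'three' }
] } } };

selectPages(stub, [2, 0]); // keep page 3, then page 1
console.log(stub['/Root']['/Pages']['/Kids'].map((p) => p.label)); // [ 'three', 'one' ]
```

With a real document, you would call it inside the promise, for example `newPdf.pdfObject().then((pdf) => selectPages(pdf, [2, 0]));`.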

### Problems with outlines and internal references

PDF Assembler does a good job managing page contents: it automatically discards unused contents from deleted pages, while still retaining any contents used on other pages. However, if a PDF contains an outline or internal references that refer to a deleted page, those will cause errors in the assembled PDF file. (The PDF may still open and display, but the PDF reader will probably show an error message.) As a somewhat crude (and hopefully temporary) solution, PDF Assembler provides a method that removes all non-printable data from the root catalog:

```javascript
newPdf.removeRootEntries();
```

The trade-off is that after running `removeRootEntries()`, your assembled PDF is less likely to have errors, and may also be smaller, but it will not have any outline or other non-printing information from the original PDF.
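
To make the trade-off concrete, here is a hand-rolled approximation on a structure object: it keeps only the catalog entries needed to reach the pages and deletes the rest, such as `/Outlines` or `/PageMode`. This is an assumption-labeled sketch, not the actual `removeRootEntries()`; the keep-list (`/Type` and `/Pages`) is chosen for illustration, and the real method may keep or remove different entries.

```javascript
// Hypothetical approximation of stripping non-printing data from the catalog.
// The keep-list is an illustrative assumption, not PDF Assembler's actual list.
function stripCatalog(pdf, keep = ['/Type', '/Pages']) {
  const catalog = pdf['/Root'];
  for (const key of Object.keys(catalog)) { // Object.keys returns a snapshot
    if (!keep.includes(key)) {
      delete catalog[key]; // e.g. /Outlines, /PageMode
    }
  }
  return pdf;
}

const doc = {
  '/Root': {
    '/Type': '/Catalog',
    '/Pages': { '/Type': '/Pages', '/Kids': [] },
    '/Outlines': { '/Type': '/Outlines', '/Count': 0 },
    '/PageMode': '/UseOutlines'
  }
};

stripCatalog(doc);
console.log(Object.keys(doc['/Root'])); // [ '/Type', '/Pages' ]
```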

### Assembling a new PDF file from the PDF structure object

After editing, call `assemblePdf()` with a name for your new PDF, and PDF Assembler will assemble your PDF structure object and return a promise for a [File](https://developer.mozilla.org/en-US/docs/Web/API/File) containing your PDF, ready to download, save, or use however you want.

For example, here's how to assemble a PDF and use [file-saver](https://www.npmjs.com/package/file-saver) to save it:

```javascript
const fileSaver = require('file-saver');
// ...
newPdf
  .assemblePdf('assembled-output-file.pdf')
  .then(function(pdfFile) {
    fileSaver.saveAs(pdfFile, 'assembled-output-file.pdf');
  });
```
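
Under the hood, an assembled PDF is a header, a sequence of numbered objects, a cross-reference table recording each object's byte offset, and a trailer that points to the catalog and the table. The sketch below lays out that structure by hand; `layoutPdf` is hypothetical, not PDF Assembler's code, and it assumes object bodies that are already serialized, uncompressed, and ASCII-only (so string length equals byte length), using a classic `xref` table rather than a cross-reference stream.

```javascript
// Hypothetical sketch of PDF file layout: header, numbered objects,
// cross-reference table, trailer. Object n is bodies[n - 1].
function layoutPdf(bodies, rootObjectNumber) {
  let out = '%PDF-1.7\n';
  const offsets = [];
  bodies.forEach((body, i) => {
    offsets.push(out.length); // byte offset; valid only because the text is ASCII
    out += `${i + 1} 0 obj\n${body}\nendobj\n`;
  });

  const xrefOffset = out.length;
  out += `xref\n0 ${bodies.length + 1}\n`;
  out += '0000000000 65535 f \n'; // entry 0 is always the head of the free list
  for (const offset of offsets) {
    // Each entry is exactly 20 bytes: 10-digit offset, 5-digit generation, type, 2-byte EOL
    out += `${String(offset).padStart(10, '0')} 00000 n \n`;
  }
  out += `trailer\n<< /Size ${bodies.length + 1} /Root ${rootObjectNumber} 0 R >>\n`;
  out += `startxref\n${xrefOffset}\n%%EOF\n`;
  return out;
}

const stream = '1 0 0 1 72 708 cm BT /Helv 12 Tf (Hello world!) Tj ET';
const pdfText = layoutPdf([
  '<< /Type /Catalog /Pages 2 0 R >>',
  '<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>',
  '<< /Type /Page /Parent 2 0 R /MediaBox [ 0 0 612 792 ] /Contents 4 0 R ' +
    '/Resources << /Font << /Helv 5 0 R >> >> >>',
  `<< /Length ${stream.length} >>\nstream\n${stream}\nendstream`,
  '<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>'
], 1);
console.log(pdfText);
```

Note that the hand-written version has to supply `/Parent`, `/Length`, and every `n 0 R` reference itself, which are exactly the details the structure object lets you leave out.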

### PDF Assembler options

PDF Assembler has a few options that change its behavior. All options can be set any time after you have created a new PDFAssembler instance and before you assemble your final PDF, like so:

```javascript
newPdf.compress = false;
newPdf.indent = true;
```

| option        | default | description |
|---------------|---------|-------------|
| indent        | false   | Indents output to make it easier to read if you open the PDF in a text editor to look at the structure. Accepts a String or Number, similar to the space parameter in [JSON.stringify](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify). |
| compress      | true    | Compresses streams. |
| groupPages    | true    | Groups pages. |
| pageGroupSize | 16      | Maximum size of a page group. |
