Commit eec0695

Replace Cairo PDF device with native PDF device (v0.9.3)
The native PDF device becomes the default -d pdf output, generating PDF content streams directly from the display list instead of rendering through Cairo. This preserves original color spaces (CMYK, Gray, RGB) rather than forcing RGB conversion. Moved all native_pdf modules into devices/pdf/, removed the Cairo-based pdf.py and pdf_injector.py, updated imports, control.py finalization, device config (600 DPI, no ColorModel), and visual_test.py. Updated all documentation to reflect the new architecture.
1 parent 599ac5a commit eec0695
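
The color-space preservation described in the commit message can be illustrated with the standard PDF fill color operators (`g` for gray, `rg` for RGB, `k` for CMYK). The following is a minimal sketch, not the code from this commit: the function name `pdf_fill_color_op` and its signature are hypothetical, and only the operator names come from the PDF specification.

```python
# Hypothetical helper, not taken from the PostForge source.
# Maps a device color space to the PDF non-stroking color operator
# and the number of components that operator expects.
_FILL_OPERATORS = {
    "DeviceGray": ("g", 1),
    "DeviceRGB": ("rg", 3),
    "DeviceCMYK": ("k", 4),
}


def _fmt(value: float) -> str:
    """Format a component in [0, 1] compactly, e.g. 0.5 -> '0.5', 1.0 -> '1'."""
    clamped = min(1.0, max(0.0, float(value)))
    return f"{clamped:.4f}".rstrip("0").rstrip(".")


def pdf_fill_color_op(space: str, components: tuple[float, ...]) -> str:
    """Return a content-stream fragment that sets the fill color
    without converting it to another color space."""
    if space not in _FILL_OPERATORS:
        raise ValueError(f"unsupported device color space: {space}")
    operator, count = _FILL_OPERATORS[space]
    if len(components) != count:
        raise ValueError(f"{space} expects {count} components")
    return " ".join(_fmt(c) for c in components) + f" {operator}"


if __name__ == "__main__":
    print(pdf_fill_color_op("DeviceCMYK", (0, 1, 0.5, 0)))
    print(pdf_fill_color_op("DeviceGray", (0.25,)))
```

A renderer that always targets RGB would have to convert the CMYK value first; emitting `k` directly keeps the original ink values in the output file.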

25 files changed: +355 -2492 lines changed

README.md

Lines changed: 6 additions & 5 deletions
@@ -3,7 +3,7 @@
33
<p align="center"><strong>A modern, open-source PostScript interpreter written in Python.</strong></p>
44

55
<p align="center">
6-
<a href="https://github.com/AndyCappDev/postforge/releases"><img src="https://img.shields.io/badge/Version-0.9.2-green.svg" alt="Version 0.9.0"></a>
6+
<a href="https://github.com/AndyCappDev/postforge/releases"><img src="https://img.shields.io/badge/Version-0.9.3-green.svg" alt="Version 0.9.0"></a>
77
<a href="LICENSE.txt"><img src="https://img.shields.io/badge/License-AGPL--3.0-blue.svg" alt="License: AGPL-3.0"></a>
88
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/Python-3.13%2B-blue.svg" alt="Python 3.13+"></a>
99
</p>
@@ -52,10 +52,11 @@ See the [sample gallery](docs/samples.md) for larger rendered examples.
5252
exploration, debugging, and experimentation
5353
- **Cython-Accelerated Execution** — Optional Cython-compiled execution loop
5454
providing 15–40% speedup depending on workload
55-
- **Multiple Output Formats** — PNG, PDF, SVG, and TIFF output via Cairo
56-
graphics backend, plus an interactive Qt display window with a PostScript
57-
command prompt; TIFF supports multi-page and CMYK output for prepress
58-
workflows; extensible architecture makes it straightforward to add new devices
55+
- **Multiple Output Formats** — PNG, PDF, SVG, and TIFF output, plus an
56+
interactive Qt display window with a PostScript command prompt; the PDF
57+
device generates content streams directly, preserving CMYK/Gray/RGB color
58+
spaces; TIFF supports multi-page and CMYK output for prepress workflows;
59+
extensible architecture makes it straightforward to add new devices
5960
- **PDF Font Embedding** — Type 1 font reconstruction and subsetting,
6061
TrueType/CID font extraction with CIDToGIDMap and ToUnicode support
6162
- **EPS Support** — Automatic page cropping to EPS content dimensions with

docs/design/detailed-gap-analysis.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@
4545
| Types 2-5 | **Accepted, not processed** | Dictionary validation passes but no halftone-specific rendering behavior. Falls through to device defaults. |
4646
| Types 6, 10, 16 | **Accepted, not processed** | Same as above |
4747

48-
**Practical impact:** Low. PostForge outputs to Cairo-backed devices (PNG, PDF, SVG) which handle their own halftoning.
48+
**Practical impact:** Low. PostForge's raster devices (PNG, SVG, TIFF) use Cairo which handles its own halftoning, and the PDF device generates vector output directly.
4949

5050
### Transfer Functions
5151

docs/developer/adding-output-devices.md

Lines changed: 26 additions & 26 deletions
@@ -311,8 +311,8 @@ def render_display_list(
311311

312312
- `page_height` is needed for the PostScript-to-Cairo coordinate system flip
313313
(PostScript origin is bottom-left, Cairo is top-left)
314-
- `deferred_text_objs` is used by the PDF device to collect text objects for
315-
later font embedding. Pass `None` for bitmap devices.
314+
- `deferred_text_objs` can be used to collect text objects for later
315+
processing. Pass `None` for bitmap devices.
316316

317317
### Option B: Process the Display List Directly
318318

@@ -433,24 +433,23 @@ you can store arbitrary Python objects in it (class instances, lists,
433433
open file handles, etc.) using a bytes key, and retrieve them on subsequent
434434
`showpage` calls.
435435

436-
The PDF device uses this to maintain a `PDFDocumentState` that holds the Cairo
437-
PDF surface, font tracker, deferred text objects, and page counter across all
438-
pages:
436+
The PDF device uses this to maintain a `PDFDocumentState` that holds the
437+
font tracker, accumulated page data, Type 3 font state, and page counter
438+
across all pages:
439439

440440
```python
441441
PDF_STATE_KEY = b'_PDFDocumentState'
442442

443443
def showpage(ctxt, pd):
444-
pdf_state = pd.get(PDF_STATE_KEY)
445-
if pdf_state is None:
444+
state = pd.get(PDF_STATE_KEY)
445+
if state is None:
446446
# First page — initialize and store in page device dict
447-
pdf_state = PDFDocumentState(file_path, width, height)
448-
pd[PDF_STATE_KEY] = pdf_state
447+
state = PDFDocumentState(file_path)
448+
pd[PDF_STATE_KEY] = state
449449

450-
# Render current page using persistent state
451-
pdf_state.start_new_page(...)
452-
render_display_list(ctxt, pdf_state.context, ...)
453-
pdf_state.finish_page()
450+
# Generate content stream from display list and store page data
451+
content_stream, ... = generate_content_stream(ctxt.display_list, ...)
452+
state.pages.append(PageData(content_stream, width_pts, height_pts))
454453
```
455454

456455
This works because the page device dictionary is just a Python `dict` — you can
@@ -471,15 +470,15 @@ dict entirely are `setpagedevice` (which rebuilds it from scratch) and
471470
### Job Finalization (PDF)
472471

473472
Multi-page devices may need a finalization step after the last page. The PDF
474-
device uses a `finalize_document` function that closes the Cairo surface and
475-
injects embedded fonts. This is called from the job control code in
473+
device uses a `finalize` function that assembles all accumulated pages into
474+
the final PDF with embedded fonts. This is called from the job control code in
476475
`postforge/operators/control.py`:
477476

478477
```python
479478
# In control.py job cleanup:
480-
from ..devices.pdf.pdf import PDF_STATE_KEY, finalize_document
479+
from ..devices.pdf.pdf import PDF_STATE_KEY, finalize
481480
if PDF_STATE_KEY in pd:
482-
finalize_document(pd)
481+
finalize(pd)
483482
```
484483

485484
If your device needs finalization, follow the same pattern: export a
@@ -493,8 +492,8 @@ The PDF device uses `/TextRenderingMode /TextObjs` to receive structured text
493492
data instead of rendered glyph paths. It then:
494493

495494
1. Tracks font usage across all pages via `FontTracker`
496-
2. Collects deferred `TextObj` elements that need font embedding
497-
3. At finalization, reconstructs Type 1 fonts and injects them into the PDF
495+
2. Generates PDF text operators (BT/ET blocks, TJ arrays with kern values)
496+
3. At finalization, embeds fonts (Type 1, CID, CFF, Type 42, Type 3) into the PDF
498497

499498
This pattern is only needed for devices that require structured text
500499
information (e.g., for searchability or font embedding).
@@ -526,13 +525,14 @@ Key features: anti-alias mode support, configurable output path.
526525
### PDF (`postforge/devices/pdf/`)
527526

528527
Complex multi-page device. Maintains a `PDFDocumentState` across pages.
529-
Uses Cairo `PDFSurface` for graphics, then post-processes with pypdf for
530-
font embedding. Applies a scaling transform to convert from device coordinates
531-
(at `HWResolution`) to PDF points (72 DPI).
532-
533-
Key features: persistent state, font tracking and embedding (Type 1 and
534-
CID/TrueType), deferred text rendering, document finalization, stream
535-
compression.
528+
Generates PDF content streams directly from the display list (does not use
529+
Cairo), preserving original color spaces (CMYK, Gray, RGB). Content stream
530+
generation is split into focused submodules (stroke_ops, text_ops, type3_ops,
531+
image_ops, shading_ops). The final PDF is assembled at document end via pypdf.
532+
533+
Key features: persistent state, color space preservation, font tracking and
534+
embedding (Type 1, CID/TrueType, CFF, Type 42, Type 3), text batching with
535+
TJ arrays, document finalization.
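
The "text batching with TJ arrays" feature mentioned above can be pictured with a short sketch. It assumes only what the PDF specification defines: `BT`/`ET` delimit a text object, `Tf` selects a font, `Td` positions the text, and `TJ` shows an array of strings interleaved with adjustments in thousandths of a text-space unit. The helper names (`escape_pdf_string`, `build_tj`, `text_block`) and the `/F1` font resource name are hypothetical and are not taken from `text_ops.py`.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def _num(value: float) -> str:
    """Format a number without trailing zeros, e.g. 80.0 -> '80'."""
    return f"{value:.2f}".rstrip("0").rstrip(".")


def build_tj(runs: list[tuple[str, float]]) -> str:
    """Build a TJ operator from (text, adjustment) pairs.

    The adjustment follows its text and is subtracted from the current
    horizontal position, so a positive value tightens the spacing.
    """
    parts = []
    for text, adjustment in runs:
        parts.append(f"({escape_pdf_string(text)})")
        if adjustment:
            parts.append(_num(adjustment))
    return "[" + " ".join(parts) + "] TJ"


def text_block(font: str, size: float, x: float, y: float,
               runs: list[tuple[str, float]]) -> str:
    """Wrap one TJ array in a BT/ET text object."""
    return "\n".join([
        "BT",
        f"/{font} {_num(size)} Tf",
        f"{_num(x)} {_num(y)} Td",
        build_tj(runs),
        "ET",
    ])


if __name__ == "__main__":
    print(text_block("F1", 12, 72, 720, [("AV", 80), ("A (test)", 0)]))
```

Batching several kerned runs into one `TJ` array keeps the content stream compact compared with emitting a separate positioning operator for every glyph.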
536536

537537
### SVG (`postforge/devices/svg/`)
538538

docs/developer/architecture-overview.md

Lines changed: 62 additions & 44 deletions
@@ -39,14 +39,14 @@ PostScript Source ──► Tokenizer ──► Execution ──► │ Display
3939
like `fill`, `stroke`, and `show` append elements here.
4040

4141
5. **Output Device** — When `showpage` fires, the accumulated display list is
42-
handed to a device for rendering. Devices live in `postforge/devices/` and
43-
the current implementation typically delegates to a shared Cairo rendering
44-
backend, but this is not a requirement — an output device can use whatever
45-
rendering method it wants to process the display list. After rendering,
46-
`showpage` erases the display list and reinitializes the graphics state for
47-
the next page. `copypage` follows the same rendering path but preserves
48-
both the display list and graphics state, allowing further drawing on top
49-
of the existing page contents.
42+
handed to a device for rendering. Devices live in `postforge/devices/`;
43+
raster devices (PNG, TIFF, Qt) and SVG use a shared Cairo rendering
44+
backend, while the PDF device generates content streams directly from the
45+
display list. An output device can use whatever rendering method it wants.
46+
After rendering, `showpage` erases the display list and reinitializes the
47+
graphics state for the next page. `copypage` follows the same rendering
48+
path but preserves both the display list and graphics state, allowing
49+
further drawing on top of the existing page contents.
5050

5151

5252
## The Execution Engine
@@ -357,7 +357,7 @@ The display list is a flat Python list containing instances of these classes
357357
| `Stroke` | `stroke` | Stroked path with line properties and CTM |
358358
| `PatternFill` | `fill` with pattern color space | Pattern-tiled fill |
359359
| `ImageElement` | `image`, `imagemask`, `colorimage` | Raster image data |
360-
| `TextObj` | `show` (in TextObjs mode) | Text for native PDF output |
360+
| `TextObj` | `show` (in TextObjs mode) | Structured text for PDF output |
361361
| `ClipElement` | `clip`, `eoclip`, `initclip` | Clipping path update |
362362
| `GlyphRef` | show (cache hit) | Reference to cached glyph bitmap |
363363
| `GlyphStart`/`GlyphEnd` | show (cache miss) | Glyph bitmap capture markers |
@@ -421,19 +421,23 @@ conversion formulas (NTSC weighting for gray, etc.).
421421
### Color Conversion at Rendering Time
422422

423423
Color conversion is *lazy*: `setcolor` stores the color in the graphics
424-
state, but the conversion to device color (RGB for the Cairo renderer) happens
425-
only when a painting operator builds a display list element:
424+
state, but the conversion to device color happens only when a painting operator
425+
builds a display list element:
426426

427427
1. A painting operator (`fill`, `stroke`, etc.) calls
428428
`ColorSpaceEngine.convert_to_device_color()`.
429429
2. The engine dispatches based on color space family — device spaces pass
430430
through (with cross-conversion if needed), CIE-based spaces run through
431431
their decode/matrix/XYZ pipeline, and ICCBased spaces apply an lcms2
432432
transform.
433-
3. The resulting RGB values are stored in the display list element (`Fill`,
434-
`Stroke`, etc.).
435-
4. The rendering device receives pre-converted RGB and passes it straight
436-
to Cairo.
433+
3. The resulting color values are stored in the display list element (`Fill`,
434+
`Stroke`, etc.). When a `/ColorModel` is set in the page device (e.g.,
435+
`/DeviceRGB` for Cairo-based raster devices), colors are converted to that
436+
model. When no `/ColorModel` is set (the PDF device), original device color
437+
spaces (CMYK, Gray, RGB) are preserved.
438+
4. The rendering device consumes these colors — Cairo-based devices receive
439+
RGB, while the PDF device emits the appropriate PDF color operators for
440+
whatever color space was used.
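
Steps 3 and 4 above can be sketched as a single decision: convert to the page device's `/ColorModel` when one is set, otherwise pass the original device color through. This is an illustrative sketch only. The function `resolve_device_color` is hypothetical and is not the signature of `ColorSpaceEngine.convert_to_device_color()`; the CMYK-to-RGB formula is the simple one from the PostScript Language Reference, and only a `DeviceRGB` target is handled.

```python
from typing import Optional

Color = tuple[str, tuple[float, ...]]


def _to_rgb(space: str, comps: tuple[float, ...]) -> tuple[float, ...]:
    """Convert a device color to RGB using the basic PostScript formulas."""
    if space == "DeviceRGB":
        return comps
    if space == "DeviceGray":
        (gray,) = comps
        return (gray, gray, gray)
    if space == "DeviceCMYK":
        c, m, y, k = comps
        # red = 1 - min(1, cyan + black), and likewise for green and blue
        return tuple(1.0 - min(1.0, ink + k) for ink in (c, m, y))
    raise ValueError(f"unsupported device color space: {space}")


def resolve_device_color(space: str, comps: tuple[float, ...],
                         color_model: Optional[str] = None) -> Color:
    """Return the color to store in a display list element.

    color_model mirrors the optional /ColorModel page device entry:
    None means "preserve the original space" (the PDF device case).
    """
    if color_model is None or color_model == space:
        return (space, comps)
    if color_model == "DeviceRGB":
        return ("DeviceRGB", _to_rgb(space, comps))
    raise ValueError(f"conversion to {color_model} is not covered by this sketch")


if __name__ == "__main__":
    magenta = ("DeviceCMYK", (0.0, 1.0, 0.0, 0.0))
    print(resolve_device_color(*magenta))               # preserved as CMYK
    print(resolve_device_color(*magenta, "DeviceRGB"))  # converted for Cairo
```

With no color model the CMYK tuple is stored untouched, which is what lets a PDF writer later choose the matching color operator.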
437441

438442
### ICC Color Management Tiers
439443

@@ -472,10 +476,10 @@ Output devices render the display list into a final format. Each device
472476
consists of two parts that work together: a PostScript configuration file
473477
in `postforge/resources/OutputDevice/` (e.g., `png.ps`) that defines the page device
474478
dictionary, and a Python module in `postforge/devices/` that implements a
475-
`showpage(ctxt, pd)` function to perform the actual rendering. The built-in
476-
devices use a shared Cairo rendering backend, but this is a convenience, not
477-
a requirement — a custom device can use any rendering approach it wants
478-
without involving Cairo at all.
479+
`showpage(ctxt, pd)` function to perform the actual rendering. The raster
480+
devices (PNG, TIFF, Qt) use a shared Cairo rendering backend, while the PDF
481+
device generates PDF content streams directly from the display list. A custom
482+
device can use any rendering approach it wants.
479483

480484
### Device Architecture
481485

@@ -486,32 +490,43 @@ without involving Cairo at all.
486490
│ │
487491
│ cairo_renderer.py - dispatch cairo_patterns.py - patterns │
488492
│ cairo_images.py - images cairo_shading.py - shading │
489-
└─────┬────────────┬────────────┬─────────────┬────────────┬─────┘
490-
│ │ │ │ │
491-
▼ ▼ ▼ ▼ ▼
492-
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
493-
│ PNG │ │ PDF │ │ SVG │ │ TIFF │ │ Qt │
494-
│ device │ │ device │ │ device │ │ device │ │ device │
495-
└──────────┘ └────┬─────┘ └──────────┘ └──────────┘ └──────────┘
496-
497-
┌────────┴────────┐
498-
│ Font embedding │
499-
│ (font_embedder, │
500-
│ cid_font_ │
501-
│ embedder, │
502-
│ pdf_injector) │
503-
└─────────────────┘
493+
└─────────┬──────────────┬──────────────┬──────────────┬─────────┘
494+
│ │ │ │
495+
▼ ▼ ▼ ▼
496+
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
497+
│ PNG │ │ SVG │ │ TIFF │ │ Qt │
498+
│ device │ │ device │ │ device │ │ device │
499+
└──────────┘ └──────────┘ └──────────┘ └──────────┘
500+
501+
┌─────────────────────────────────────────┐
502+
│ PDF device │
503+
│ (devices/pdf/) │
504+
│ │
505+
│ pdf.py ──► content_stream.py │
506+
│ ├─ stroke_ops.py │
507+
│ ├─ text_ops.py │
508+
│ ├─ type3_ops.py │
509+
│ ├─ image_ops.py │
510+
│ └─ shading_ops.py │
511+
│ ──► pdf_builder.py │
512+
│ ├─ font_embedder.py │
513+
│ ├─ cid_font_embedder.py │
514+
│ ├─ cff_font_embedder.py │
515+
│ └─ font_tracker.py │
516+
└─────────────────────────────────────────┘
504517
```
505518

506519
**PNG** (`postforge/devices/png/png.py`) — Creates a Cairo ImageSurface, calls
507520
`render_display_list()`, writes a `.png` file. The simplest device and a good
508521
starting point for understanding the rendering pipeline.
509522

510-
**PDF** (`postforge/devices/pdf/`) — Renders to a Cairo PDFSurface, then
511-
post-processes the PDF with pypdf to inject embedded fonts. Text in PDF mode
512-
uses `TextObj` elements that are written as native PDF text operators, producing
523+
**PDF** (`postforge/devices/pdf/`) — Generates PDF content streams directly
524+
from the display list (does not use Cairo). Preserves original color spaces
525+
(CMYK, Gray, RGB) instead of converting everything to RGB. Text uses `TextObj`
526+
elements written as PDF text operators with TJ arrays and kern values, producing
513527
searchable/selectable text. Font embedding handles Type 1 reconstruction,
514-
CID/TrueType extraction, and subsetting.
528+
CID/TrueType extraction, CFF, Type 42, Type 3, and subsetting. The final PDF
529+
is assembled at document end via pypdf.
515530

516531
**SVG** (`postforge/devices/svg/svg.py`) — Renders to a Cairo SVGSurface,
517532
then post-processes the SVG to convert text from outlines to selectable `<text>`
@@ -541,8 +556,10 @@ dictionary is loaded and merged into the graphics state's `page_device`.
541556
### Shared Cairo Renderer
542557

543558
`render_display_list()` in `postforge/devices/common/cairo_renderer.py` is the
544-
main dispatch loop. It iterates over display list elements and delegates to
545-
type-specific rendering functions:
559+
main dispatch loop used by the raster and vector-surface devices (PNG, SVG,
560+
TIFF, Qt). The PDF device does not use this renderer — it generates PDF content
561+
streams directly. The Cairo renderer iterates over display list elements and
562+
delegates to type-specific rendering functions:
546563

547564
- Path construction → Cairo `move_to`, `line_to`, `curve_to`, `close_path`
548565
- Fill/Stroke → Cairo `fill` / `stroke` with color and line properties
@@ -554,10 +571,11 @@ type-specific rendering functions:
554571

555572
**Stroke method**: For bitmap devices (PNG, Qt), strokes are converted to filled
556573
paths by the interpreter before they reach the display list. This works around
557-
bugs in Cairo's stroke rasterization, particularly with dashed lines. The PDF
558-
device uses Cairo's native stroke rendering instead. This behavior is controlled
559-
per-device by the `/StrokeMethod` entry in the page device dictionary (set in
560-
each device's `.ps` configuration file).
574+
bugs in Cairo's stroke rasterization, particularly with dashed lines. Vector
575+
devices (PDF, SVG) use native stroke rendering instead — PDF emits stroke
576+
operators directly, while SVG uses Cairo's vector surface. This behavior is
577+
controlled per-device by the `/StrokeMethod` entry in the page device dictionary
578+
(set in each device's `.ps` configuration file).
561579

562580

563581
## Resource System

postforge/devices/common/cairo_renderer.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
88
Shared Cairo Rendering Module
99
1010
This module provides the main display list dispatcher and text/glyph rendering
11-
logic used by multiple output devices (PNG, PDF, SVG, Qt).
11+
logic used by the Cairo-based output devices (PNG, SVG, TIFF, Qt).
1212
1313
Architecture:
1414
- render_display_list() is the main entry point for device implementations

postforge/devices/native_pdf/__init__.py

Lines changed: 0 additions & 8 deletions
This file was deleted.
