Merged
48 changes: 33 additions & 15 deletions MarkdownToPdf/Converters/InlineConverters/InlineConverter.cs
@@ -349,7 +349,7 @@ private void AddInlineImage(LinkInline lnk)
{
path += Path.DirectorySeparatorChar;
}
- path += lnk.Url;
+ path += System.Uri.UnescapeDataString(lnk.Url);

var img = OutputParagraph.AddImage(path);
if (img == null) return;
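
Review note: the switch to `System.Uri.UnescapeDataString` matters for image links whose targets are percent-encoded, such as the `MarkdownToPdf%20diagram.jpeg` reference in the summary file added by this PR. Below is a minimal sketch of the behavior using only the .NET base class library. The `ResolveImagePath` helper and the `docs` base directory are made up for illustration and are not part of the converter.

```csharp
using System;
using System.IO;

// Percent-encoded link target as it appears in the Markdown source.
string url = "MarkdownToPdf%20diagram.jpeg";

Console.WriteLine(Uri.UnescapeDataString(url));   // MarkdownToPdf diagram.jpeg
Console.WriteLine(PathSketch.ResolveImagePath("docs", url));

static class PathSketch
{
    // Hypothetical helper mirroring the patched lines: append a directory
    // separator if needed, then append the decoded link target.
    public static string ResolveImagePath(string baseDir, string url)
    {
        var path = baseDir;
        if (!path.EndsWith(Path.DirectorySeparatorChar))
        {
            path += Path.DirectorySeparatorChar;
        }
        return path + Uri.UnescapeDataString(url);
    }
}
```

One trade-off to be aware of: `UnescapeDataString` decodes every valid `%XX` sequence and leaves `+` untouched, so a file whose name literally contains text like `%20` would no longer resolve after this change.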
@@ -387,6 +387,38 @@ private void AddInlineImage(LinkInline lnk)
Parent.Owner.OnWarningIssued(this, "Dimension", e.Message + $", line {lnk.Line}");
}

+ // Auto-scale large images to fit the page width
+ try
+ {
+     using (var image = SixLabors.ImageSharp.Image.Load(path))
+     {
+         double hRes, vRes;
+         if (img.Resolution != 0)
+         {
+             hRes = img.Resolution;
+             vRes = img.Resolution;
+         }
+         else
+         {
+             hRes = image.Metadata.HorizontalResolution > 0 ? image.Metadata.HorizontalResolution : 96.0;
+             vRes = image.Metadata.VerticalResolution > 0 ? image.Metadata.VerticalResolution : 96.0;
+         }
+
+         var currentWidth = img.Width.IsEmpty ? (double)image.Width * 72.0 / hRes : img.Width.Point;
+
+         if (currentWidth > Parent.Width)
+         {
+             img.Width = Unit.FromPoint(Parent.Width);
+             var aspect = ((double)image.Width / hRes) / ((double)image.Height / vRes);
+             img.Height = Unit.FromPoint(Parent.Width / aspect);
+         }
+     }
+ }
+ catch (Exception ex)
+ {
+     Parent.Owner.OnWarningIssued(this, "Image", "Error checking/scaling image size: " + ex.Message);
+ }
+
if (MarkdigTreeHelper.IsOnlyBlockElement(lnk))
{
var align = Attributes.ContainsKey("align") ? Attributes["align"] : "";
@@ -429,20 +461,6 @@ private void AddInlineImage(LinkInline lnk)
}
}
}

- //using (System.Drawing.Image bitmapImage = System.Drawing.Image.FromFile(path))
- //{
- // if (!img.Width.IsEmpty)
- // {
- // var aspect = (double)bitmapImage.Width / bitmapImage.HorizontalResolution / ((double)bitmapImage.Height / bitmapImage.VerticalResolution);
-
- // OutputParagraph.Format?.SpaceAfter -= img.Width / aspect;
- // }
- // else
- // {
- // OutputParagraph.Format?.SpaceAfter -= Unit.FromInch((double)bitmapImage.Height / bitmapImage.VerticalResolution);
- // }
- //}
}
catch (OutOfMemoryException) { Parent.Owner.OnWarningIssued(this, "Image", "Read error: " + lnk.Url); }
catch (FileNotFoundException) { Parent.Owner.OnWarningIssued(this, "Image", "FileNotFound: " + lnk.Url); }
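
Review note: the arithmetic in the new auto-scale block can be checked in isolation. The sketch below restates it as a pure function over plain numbers, assuming sizes in points (1 pt = 1/72 inch) and resolutions in dots per inch. `ScaleSketch`, `FitToWidth`, and the 450 pt text width are hypothetical names and values chosen for illustration. Unlike the patch, which only sets the height when it scales the image down, the sketch always returns a height.

```csharp
using System;

// 1920x1080 px at 96 dpi is 1440 pt wide; the assumed text area is 450 pt wide.
// The fitted height is 450 * 1080 / 1920 = 253.125 pt.
var (widthPt, heightPt) = ScaleSketch.FitToWidth(1920, 1080, 96.0, 96.0, null, 450.0);
Console.WriteLine($"{widthPt} x {heightPt} pt");

static class ScaleSketch
{
    // Hypothetical pure-function restatement of the auto-scale rule.
    public static (double WidthPt, double HeightPt) FitToWidth(
        int pixelWidth, int pixelHeight,
        double hRes, double vRes,
        double? explicitWidthPt, double availableWidthPt)
    {
        // Same fallback as the patch: treat missing resolution metadata as 96 dpi.
        if (hRes <= 0) hRes = 96.0;
        if (vRes <= 0) vRes = 96.0;

        // Physical aspect ratio: width in inches over height in inches.
        double aspect = (pixelWidth / hRes) / (pixelHeight / vRes);

        // An explicit width takes precedence over the width derived from pixels.
        double currentWidthPt = explicitWidthPt ?? pixelWidth * 72.0 / hRes;

        // Clamp to the available width, then derive the height from the aspect ratio.
        double fittedWidthPt = Math.Min(currentWidthPt, availableWidthPt);
        return (fittedWidthPt, fittedWidthPt / aspect);
    }
}
```

Two points may be worth checking in the real code. ImageSharp metadata can report resolution in units other than inches, in which case using the values directly as dpi would skew the computed width. Also, the broad `catch (Exception ex)` means load failures inside this block surface as a scaling warning instead of reaching the more specific handlers further down.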
Binary file added MarkdownToPdf/MarkdownToPdf diagram.jpeg
78 changes: 78 additions & 0 deletions MarkdownToPdf/PROJECT_SUMMARY.md
@@ -0,0 +1,78 @@
# Project Summary: MarkdownToPdf

## Overview
**MarkdownToPdf** is a .NET library that converts Markdown documents to PDF. It runs a pipeline that parses the Markdown text, maps the resulting structure to a document model, applies CSS-like styling, and renders the final output.

- **Author:** Geert-Jan Thomas (VectorAi)
- **Version:** 1.0.0
- **Framework:** .NET 10.0

## Architecture & Workflow
The project follows a linear pipeline architecture, transforming data through distinct stages:

1. **Input**: Raw Markdown text.
2. **Parsing (Markdig)**: The text is parsed into an Abstract Syntax Tree (AST).
3. **Conversion (Internal)**: The AST is traversed by a custom converter system that maps Markdown elements to an intermediate document model.
4. **Styling**: A CSS-like styling engine applies visual properties (fonts, margins, borders) to the document model based on element types and attributes.
5. **Rendering (MigraDoc)**: The fully styled document model is rendered into a PDF file.

![MarkdownToPdf Diagram](MarkdownToPdf%20diagram.jpeg)

## Core Components Breakdown

### 1. Markdown Parsing (Markdig)
The **Markdig** library (v0.44.0) is the system's **lexer and parser**: every input document passes through it before any conversion happens.
* **Role**: It processes raw Markdown text and generates a structured **Abstract Syntax Tree (AST)**.
* **AST Integration**: The AST serves as the "source of truth" for the document structure. The project's `MarkdownToPdf` class receives this AST and initiates the conversion process.
* **Deep Integration**: The project includes specialized utilities (`Utils/MarkdigTreeHelper.cs`, `Utils/MarkdigExtensions.cs`) to efficiently traverse and query the Markdig AST, handling both Block and Inline syntax elements.
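
As a minimal sketch of what querying the Markdig AST looks like, the snippet below parses a short document and lists its headings and image links. It uses only Markdig's public API (`Markdown.Parse`, `Descendants`, `HeadingBlock`, `LinkInline`) and does not reproduce the project's own `MarkdigTreeHelper` utilities. The sample Markdown text is made up for illustration.

```csharp
using System;
using System.Linq;
using Markdig;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

string markdown = "# Title\n\nSome *text* and ![diagram](MarkdownToPdf%20diagram.jpeg).\n";

// Markdown.Parse returns the root MarkdownDocument, i.e. the AST.
MarkdownDocument ast = Markdown.Parse(markdown);

// Descendants<T>() walks the tree and yields only block nodes of type T.
foreach (var heading in ast.Descendants<HeadingBlock>())
{
    Console.WriteLine($"Heading level {heading.Level} at line {heading.Line}");
}

// The untyped Descendants() yields blocks and inlines alike, so image
// links can be picked out with OfType plus the IsImage flag.
foreach (var link in ast.Descendants().OfType<LinkInline>().Where(l => l.IsImage))
{
    Console.WriteLine($"Image target: {link.Url}");
}
```

Note that `link.Url` is reported as written in the source, percent-encoding included.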

### 2. PDF Generation (PDFsharp & MigraDoc)
The **PDFsharp-MigraDoc** library (v6.2.3) provides the document object model (DOM) and the rendering engine. It functions in two distinct layers:
* **MigraDoc (Layout Engine)**:
    * **Role**: Acts as the high-level document builder. The project converts Markdown elements into MigraDoc objects (e.g., `Document`, `Section`, `Paragraph`, `Table`).
    * **Abstraction**: The `MigrDoc/` directory (e.g., `MigraDocBlockContainer`) wraps these native objects. This allows converters to interact with a unified "Container" interface, abstracting away the complexity of whether they are adding content to a page, a table cell, or a header.
    * **Orchestration**: `MarkdownToPdf.cs` manages the root `MigraDocument`, handling page setup, headers/footers, and section creation.
* **PDFsharp (Renderer)**:
    * **Role**: The low-level engine that takes the abstract MigraDoc DOM and performs the physical rendering.
    * **Process**: The `Render()` method in `MarkdownToPdf.cs` utilizes `PdfDocumentRenderer` to calculate layout (line breaks, pagination) and draw the content.
    * **Output**: It generates the final binary PDF stream or file via `pdfRenderer.PdfDocument.Save()`.
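
The two layers can be seen in a few lines of plain MigraDoc code. This is a minimal sketch of the library's usual build-then-render flow, not the project's `Render()` method. The text content and the `sketch.pdf` file name are placeholders, and on platforms without system fonts PDFsharp may additionally need a font resolver to be configured.

```csharp
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;

// Layout layer: build a small MigraDoc document object model by hand.
var document = new Document();
Section section = document.AddSection();

Paragraph heading = section.AddParagraph("Hello from MigraDoc");
heading.Format.Font.Size = 16;
heading.Format.Font.Bold = true;

section.AddParagraph("Body text that the renderer lays out and paginates.");

// Rendering layer: PdfDocumentRenderer computes the layout and draws the pages.
var pdfRenderer = new PdfDocumentRenderer { Document = document };
pdfRenderer.RenderDocument();

// Write the finished PDF to disk (placeholder file name).
pdfRenderer.PdfDocument.Save("sketch.pdf");
```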

### 3. Conversion Engine
Located in `Converters/`, this system acts as the bridge between the Markdig AST and the MigraDoc DOM.
* **`RootBlockConvertor`**: The entry point that starts the recursive traversal of the AST.
* **Specialized Converters**: Individual classes handle specific AST nodes:
    * **Block Converters**: `ParagraphBlockConverter`, `TableBlockConverter`, `ListConverter`, etc.
    * **Inline Converters**: `InlineConverter` handles text formatting like bold, italic, and links.
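
The recursive, type-based dispatch described above can be sketched with Markdig's block types alone. The `DispatchSketch` class below is hypothetical and only prints a description of each node. The project's converters emit MigraDoc objects instead, and their actual class layout is not shown here.

```csharp
using System;
using System.Collections.Generic;
using Markdig;
using Markdig.Syntax;

var ast = Markdown.Parse("# Title\n\nFirst paragraph.\n\n- item one\n- item two\n");
foreach (var line in DispatchSketch.Describe(ast))
{
    Console.WriteLine(line);
}

static class DispatchSketch
{
    // Hypothetical stand-in for the converter system: choose a handler by
    // node type and recurse into container blocks such as lists.
    public static IEnumerable<string> Describe(ContainerBlock container, int depth = 0)
    {
        var indent = new string(' ', depth * 2);
        foreach (Block block in container)
        {
            switch (block)
            {
                case HeadingBlock heading:
                    yield return $"{indent}heading (level {heading.Level})";
                    break;
                case ParagraphBlock:
                    yield return $"{indent}paragraph";
                    break;
                case ContainerBlock nested:
                    // Lists, list items, and quotes hold child blocks.
                    yield return $"{indent}{nested.GetType().Name}";
                    foreach (var child in Describe(nested, depth + 1))
                    {
                        yield return child;
                    }
                    break;
                default:
                    yield return $"{indent}unhandled {block.GetType().Name}";
                    break;
            }
        }
    }
}
```

A switch keeps the sketch short. One class per node type, as the converter list above describes, scales better once each handler carries its own styling and layout logic.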

### 4. Styling Engine
Located in `Styling/`, this engine allows for robust, CSS-like customization of the PDF output without hardcoding styles in the converters.
* **`StyleManager.cs`**: The core logic that resolves styles. It matches "selectors" (based on element type, class, or state) to styling definitions.
* **Style Definitions**: Classes like `ParagraphStyle`, `BorderStyle`, and `FontStyle` define the available visual properties.
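
To illustrate what selector-based resolution means in practice, here is a self-contained sketch with no dependency on the library. Every name in it (`StyleRule`, `StyleSketch.Resolve`, the string-valued properties) is hypothetical, and it assumes a simple rule where class-specific selectors override type-only ones. The real `StyleManager` may resolve styles differently.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var rules = new List<StyleRule>
{
    new("paragraph", null, new() { ["font-size"] = "11", ["color"] = "black" }),
    new("paragraph", "note", new() { ["color"] = "gray" }),
    new("heading", null, new() { ["font-size"] = "18" }),
};

// Resolve the style of a paragraph carrying the class "note".
var resolved = StyleSketch.Resolve(rules, "paragraph", "note");
foreach (var (key, value) in resolved.OrderBy(p => p.Key))
{
    Console.WriteLine($"{key} = {value}");
}
// Prints:
// color = gray
// font-size = 11

// Hypothetical rule: an element type, an optional class, and the properties to apply.
record StyleRule(string ElementType, string? CssClass, Dictionary<string, string> Properties);

static class StyleSketch
{
    public static Dictionary<string, string> Resolve(
        IEnumerable<StyleRule> rules, string elementType, string? cssClass)
    {
        var result = new Dictionary<string, string>();

        // Keep rules that match the element; apply type-only rules first so
        // that class-specific rules override them (OrderBy is a stable sort).
        var matching = rules
            .Where(r => r.ElementType == elementType
                        && (r.CssClass is null || r.CssClass == cssClass))
            .OrderBy(r => r.CssClass is null ? 0 : 1);

        foreach (var rule in matching)
        {
            foreach (var (key, value) in rule.Properties)
            {
                result[key] = value;
            }
        }
        return result;
    }
}
```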

## Extensibility (Plugins)
The library supports plugins via the `Plugins/` directory to extend functionality beyond standard Markdown:
* **`IHighlightingPlugin`**: Interface for adding syntax highlighting to code blocks.
* **`IImagePlugin`**: Interface for custom image handling, allowing for dynamic image generation or processing.

## Key Files & Structure
* `MarkdownToPdf.cs`: Main entry point, orchestrator, and PDF rendering trigger.
* `Converters/`: Logic mapping Markdig AST nodes to MigraDoc elements.
* `Styling/`: Style definitions and the resolution engine.
* `MigrDoc/`: Abstraction wrappers for MigraDoc objects.
* `Plugins/`: Interfaces for the plugin system.
* `Utils/`: Helper methods, primarily for Markdig AST manipulation.

## Dependencies
- **Markdig (0.44.0)**: Markdown parsing and AST generation.
- **PDFsharp-MigraDoc (6.2.3)**: Document layout and PDF rendering.
- **SixLabors.ImageSharp (3.1.12)**: Image processing support.
- **System.Drawing.Common (10.0.1)**: GDI+ graphics functionality.