Skip to content

Conversation

@mfts
Copy link
Owner

@mfts mfts commented Dec 30, 2025

Summary by CodeRabbit

  • New Features

    • Extended watermarking capability to support images alongside PDFs
    • Improved binary file download handling with proper content-type detection for PDFs and images
  • Improvements

    • Enhanced watermark font sizing to scale proportionally with document dimensions for better readability
  • Dependencies

    • Added image processing library support

✏️ Tip: You can customize this high-level summary in your review settings.

@vercel
Copy link

vercel bot commented Dec 30, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
papermark Error Error Dec 30, 2025 2:45pm

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 30, 2025

Walkthrough

The PR adds watermarking functionality for PDF and image downloads. It introduces watermark helper utilities, SVG generation logic, and a new /api/mupdf/watermark endpoint. Download handlers are updated to detect document types and route watermarking requests through the new endpoint. UI components are refactored to handle binary downloads uniformly.

Changes

Cohort / File(s) Summary
Watermark utilities and core logic
lib/watermark/helpers.ts, lib/watermark/svg-generator.ts, lib/watermark/index.ts
Introduces new watermark module with URL validation, content-type utilities, file naming helpers, batch sizing logic, Zod schemas for request/viewer data validation, and SVG watermark generation with font sizing, positioning, text escaping, and both tiled and single-position watermark rendering.
Watermark API endpoint
pages/api/mupdf/watermark.ts, pages/api/mupdf/annotate-document.ts
Creates new /api/mupdf/watermark endpoint supporting image URL and multi-page PDF watermarking with URL validation, API key authentication, sharp-based image processing, page batching, and comprehensive error handling; marks /api/mupdf/annotate-document as deprecated.
Download API endpoints
pages/api/links/download/index.ts, pages/api/links/download/dataroom-document.ts
Reworks download handlers to detect document types, determine watermark applicability for PDFs and images, route through watermark endpoint with viewer metadata (email, date, IP, timestamp), set appropriate Content-Type and Content-Disposition headers for watermarked outputs, and handle API errors with fallback messaging.
UI components for binary download handling
components/view/dataroom/document-card.tsx, components/view/nav.tsx, components/view/viewer/download-only-viewer.tsx
Generalizes PDF-specific download paths to unified binary download handling for PDFs and images; extracts filenames from Content-Disposition headers with content-type-based defaults; creates object URLs and initiates downloads via anchor elements; falls back to JSON downloadUrl for non-binary responses.
SVG watermark styling
components/view/watermark-svg.tsx
Removes upper-bound font size clamp and implements proportional font scaling based on document dimensions with minimum 8px readability threshold.
Dependencies
package.json
Adds sharp (^0.34.5) and @types/sharp (^0.31.1) for image processing and type definitions.

Possibly related PRs

  • fix: watermark download #1923: Modifies the same download-and-watermark code paths (document-card, nav, download-only-viewer, and links download APIs) to address how watermarked PDFs and images are fetched and handled.
  • fix: watermark on partial folder downloads #1781: Modifies download endpoints to support watermarking by selecting watermarked vs. original files and including watermarkConfig and viewer metadata in request payloads.
  • fix: watermark enabled without config #1897: Modifies watermark gating and download watermarking logic in dataroom-document.ts and related handlers with enableWatermark and watermarkConfig checks.

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'feat: improve watermark' is vague and generic, using non-descriptive terms that don't convey the specific nature of the changes despite the PR involving significant refactoring of watermark handling and support for image watermarking. Consider a more specific title that highlights the main change, such as 'feat: add image watermarking support and unify binary download handling' or 'feat: extend watermarking to images and refactor download logic'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@socket-security
Copy link

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Added@​types/​sharp@​0.31.11001007680100

View full report

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (10)
pages/api/mupdf/annotate-document.ts (1)

18-24: Deprecation notice is clear and well-documented.

The comment effectively communicates the deprecation path, reason (pixel-burned vs removable text layer watermarks), and backward compatibility guarantees. Consider whether adding a runtime deprecation header (e.g., Deprecation or Sunset) to responses would help API clients prepare for migration earlier, though the doc approach is reasonable for a gradual phase-out.

package.json (1)

83-83: Move @types/sharp to devDependencies.

Type definition packages should be in devDependencies rather than dependencies since they're only needed at compile time.

🔎 Proposed fix

Move this line from dependencies to devDependencies:

  "devDependencies": {
    "@react-email/preview-server": "^5.1.0",
    "@tailwindcss/forms": "^0.5.11",
    "@trigger.dev/build": "^3.3.17",
    "@trivago/prettier-plugin-sort-imports": "^6.0.0",
    "@types/bcryptjs": "^2.4.6",
    "@types/cookie": "^0.6.0",
    "@types/fluent-ffmpeg": "^2.1.27",
    "@types/js-cookie": "^3.0.6",
    "@types/jsonwebtoken": "^9.0.7",
    "@types/mime-types": "^3.0.1",
    "@types/ms": "^2.1.0",
    "@types/node": "^22.13.5",
    "@types/react": "^18",
    "@types/react-dom": "^18",
    "@types/sanitize-html": "^2.13.0",
+   "@types/sharp": "^0.31.1",
    "@types/ua-parser-js": "^0.7.39",
components/view/watermark-svg.tsx (1)

23-28: Consider adding an upper bound for very large documents.

Removing the upper bound clamp means the font size can grow unboundedly for large document dimensions. While proportional scaling is desirable, extremely large documents (e.g., high-resolution images or oversized PDFs) could result in excessively large watermark text that dominates the content.

Consider adding a sensible maximum to prevent visual issues:

🔎 Proposed fix
  const calculateFontSize = () => {
    const baseFontSize = Math.min(width, height) * (config.fontSize / 1000);
-   return Math.max(8, baseFontSize); // Only clamp minimum for readability
+   // Clamp between minimum for readability and maximum for visual balance
+   return Math.max(8, Math.min(baseFontSize, config.fontSize * 2));
  };
pages/api/links/download/index.ts (2)

176-192: Add timeout for internal watermark API calls.

The fetch to /api/mupdf/watermark has no timeout. For large PDFs, this could hang indefinitely. While the endpoint has a 300s maxDuration, the calling code should also implement a timeout to handle network issues or slow processing gracefully.

🔎 Proposed fix
+         const controller = new AbortController();
+         const timeoutId = setTimeout(() => controller.abort(), 290000); // 290s, slightly less than maxDuration
+
          const response = await fetch(
            `${process.env.NEXTAUTH_URL}/api/mupdf/watermark`,
            {
              method: "POST",
              headers: {
                "Content-Type": "application/json",
                Authorization: `Bearer ${process.env.INTERNAL_API_KEY}`,
              },
              body: JSON.stringify({
                documentVersionId: view.document!.versions[0].id,
                numPages: view.document!.versions[0].numPages,
                watermarkConfig: view.link.watermarkConfig,
                originalFileName: view.document!.name,
                viewerData,
              }),
+             signal: controller.signal,
            },
          );
+
+         clearTimeout(timeoutId);

162-172: Locale-dependent date/time formatting may produce inconsistent watermarks.

toLocaleDateString() and toLocaleTimeString() use the server's locale, which may vary across deployments or serverless instances. Consider using explicit formatting for consistent watermarks.

🔎 Proposed fix
        const viewerData = {
          email: view.viewerEmail,
-         date: new Date(
-           view.viewedAt ? view.viewedAt : new Date(),
-         ).toLocaleDateString(),
+         date: new Date(
+           view.viewedAt ? view.viewedAt : new Date(),
+         ).toLocaleDateString('en-US'),
          ipAddress: getIpAddress(req.headers),
          link: view.link.name,
-         time: new Date(
-           view.viewedAt ? view.viewedAt : new Date(),
-         ).toLocaleTimeString(),
+         time: new Date(
+           view.viewedAt ? view.viewedAt : new Date(),
+         ).toLocaleTimeString('en-US'),
        };
pages/api/links/download/dataroom-document.ts (2)

237-331: Significant code duplication with pages/api/links/download/index.ts.

The watermarking logic (PDF and image paths) is nearly identical between this file and index.ts. Consider extracting this into a shared utility function to reduce duplication and ensure consistent behavior.

🔎 Example shared utility signature
// lib/watermark/apply-watermark.ts
export async function applyWatermark({
  docType,
  documentVersionId,
  numPages,
  watermarkConfig,
  originalFileName,
  viewerData,
  downloadUrl,
  contentType,
}: WatermarkParams): Promise<{
  buffer: ArrayBuffer;
  contentType: string;
  filename: string;
} | { error: string }> {
  // Shared watermarking logic here
}

This would consolidate the fetch calls, error handling, and response formatting.


223-235: Same locale-dependent date/time formatting concern applies here.

The toLocaleDateString() and toLocaleTimeString() calls may produce inconsistent output across server instances.

pages/api/mupdf/watermark.ts (2)

82-93: Consider filtering by pageNumber range instead of slicing after fetch.

The current implementation fetches all DocumentPage records and then slices to the requested numPages. For documents with many pages, this could fetch unnecessary data from the database.

🔎 Proposed optimization
  // Fetch all DocumentPage records for this version
  const pages = await prisma.documentPage.findMany({
    where: { versionId: documentVersionId },
    orderBy: { pageNumber: "asc" },
+   take: numPages,
    select: { file: true, storageType: true, pageNumber: true },
  });
-
-  if (pages.length === 0) {
-    throw new Error("No pages found for document version");
-  }
-
-  // Limit pages to requested numPages
-  const pagesToProcess = pages.slice(0, numPages);
+
+  if (pages.length === 0) {
+    throw new Error("No pages found for document version");
+  }
+
+  const pagesToProcess = pages;

118-132: Remove unused originalFormat variable.

The originalFormat variable is computed but never used. It was likely intended for tracking or debugging purposes.

🔎 Proposed fix
        const buffer = Buffer.from(await response.arrayBuffer());

-       // Detect original format from file path
-       const originalFormat = getFormatFromFilePath(page.file);
-
        // Always output as PNG for consistency in PDF embedding
        const watermarkedBuffer = await watermarkImage(
          buffer,
          watermarkConfig,
          viewerData,
          "png",
        );

        return {
          buffer: watermarkedBuffer,
-         originalFormat,
          pageNumber: page.pageNumber,
        };
lib/watermark/helpers.ts (1)

123-131: Consider consolidating ViewerData definitions.

Both lib/watermark/helpers.ts (lines 123-131) and lib/watermark/svg-generator.ts (lines 4-10) define a ViewerData type with identical structure. To maintain a single source of truth, consider:

  1. Keeping only the Zod schema version in helpers.ts and exporting the inferred type
  2. Importing the type in svg-generator.ts instead of redefining the interface

This prevents potential drift between the two definitions.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efe0cee and bd6604c.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (12)
  • components/view/dataroom/document-card.tsx
  • components/view/nav.tsx
  • components/view/viewer/download-only-viewer.tsx
  • components/view/watermark-svg.tsx
  • lib/watermark/helpers.ts
  • lib/watermark/index.ts
  • lib/watermark/svg-generator.ts
  • package.json
  • pages/api/links/download/dataroom-document.ts
  • pages/api/links/download/index.ts
  • pages/api/mupdf/annotate-document.ts
  • pages/api/mupdf/watermark.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
Repo: mfts/papermark PR: 0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : You MUST `export` every task, including subtasks, in Trigger.dev task files.

Applied to files:

  • lib/watermark/index.ts
🧬 Code graph analysis (5)
pages/api/links/download/dataroom-document.ts (3)
lib/watermark/helpers.ts (4)
  • isImageType (71-76)
  • getOutputFormat (52-57)
  • getWatermarkedContentType (81-83)
  • getFileNameWithExtension (95-106)
lib/utils/ip.ts (1)
  • getIpAddress (1-28)
lib/utils.ts (1)
  • getFileNameWithPdfExtension (34-40)
pages/api/mupdf/watermark.ts (6)
pages/api/mupdf/annotate-document.ts (2)
  • config (14-16)
  • req (294-527)
lib/types.ts (1)
  • WatermarkConfig (363-363)
lib/watermark/helpers.ts (6)
  • ViewerData (131-131)
  • calculateBatchSize (111-118)
  • getFormatFromFilePath (62-66)
  • validateUrl (9-47)
  • getWatermarkedContentType (81-83)
  • getFileNameWithExtension (95-106)
lib/watermark/svg-generator.ts (3)
  • ViewerData (4-10)
  • Dimensions (12-15)
  • generateSvgWatermark (163-176)
lib/files/get-file.ts (1)
  • getFile (11-28)
lib/utils.ts (1)
  • log (88-148)
pages/api/links/download/index.ts (5)
lib/watermark/helpers.ts (4)
  • isImageType (71-76)
  • getOutputFormat (52-57)
  • getWatermarkedContentType (81-83)
  • getFileNameWithExtension (95-106)
lib/utils/ip.ts (1)
  • getIpAddress (1-28)
pages/api/mupdf/annotate-document.ts (1)
  • req (294-527)
pages/api/mupdf/get-pages.ts (1)
  • req (5-39)
lib/utils.ts (1)
  • getFileNameWithPdfExtension (34-40)
lib/watermark/helpers.ts (2)
lib/watermark/svg-generator.ts (1)
  • ViewerData (4-10)
lib/types.ts (1)
  • WatermarkConfigSchema (335-361)
lib/watermark/svg-generator.ts (3)
lib/watermark/helpers.ts (1)
  • ViewerData (131-131)
lib/types.ts (1)
  • WatermarkConfig (363-363)
lib/utils.ts (1)
  • safeTemplateReplace (781-799)
🔇 Additional comments (19)
lib/watermark/index.ts (1)

1-2: LGTM!

Clean barrel file pattern for organizing watermark utilities.

components/view/dataroom/document-card.tsx (1)

127-154: LGTM!

The generalized binary download handling correctly:

  • Detects PDFs and images via content-type inspection
  • Uses blob/object URL pattern with proper cleanup
  • Extracts filename from Content-Disposition with fallback to document name
package.json (1)

146-146: Sharp 0.34.5 is the latest available version and is compatible with Node.js >=22 (supports >=21.0.0). No active security advisories found.

pages/api/mupdf/watermark.ts (5)

1-27: LGTM!

The imports are well-organized and the 300-second timeout is appropriate for processing large PDFs with watermarking.


33-67: LGTM!

The image watermarking logic is sound. The function properly validates dimensions, generates an SVG watermark, and composites it onto the image with appropriate format-specific settings.


201-270: LGTM!

The image watermarking flow is well-implemented with:

  • Proper SSRF protection via URL validation
  • Fetch timeout (60s) with AbortController
  • Comprehensive error handling and logging
  • Correct response headers including proper filename encoding

271-306: LGTM!

The PDF watermarking flow properly validates required fields, invokes the watermarking function, and sets appropriate response headers for PDF download.


307-353: Excellent error handling with meaningful status codes.

The error handler provides:

  • Comprehensive status code mapping based on error content
  • Detailed logging with processing time
  • Automatic mentions for long-running failures (>2 minutes)
  • Structured error responses with processing time metadata

This will greatly help with debugging and monitoring.

lib/watermark/helpers.ts (5)

9-47: Excellent SSRF protection.

The URL validation implements multiple layers of security:

  • Protocol restriction (HTTPS only)
  • Hostname allowlist validation
  • Case-insensitive hostname matching
  • Clear error messages for debugging

This effectively prevents Server-Side Request Forgery attacks.


95-106: LGTM!

The filename handling correctly strips existing extensions and applies the appropriate format extension. The default filenames are sensible fallbacks.


111-118: LGTM!

The batch sizing strategy appropriately reduces batch size for larger documents, helping prevent memory exhaustion during watermarking operations.


136-190: LGTM! Well-designed request validation schemas.

The use of Zod union types with mutually exclusive fields (using z.undefined().optional()) elegantly enforces that requests are either for image or PDF watermarking, not both. The error formatting provides clear feedback for validation failures.


71-76: WebP images accepted but converted to PNG during watermarking.

The isImageType function includes "webp" as a valid image type, but getOutputFormat deliberately does not support WebP (see line 56 comment). If a WebP image is downloaded with watermarking enabled, it will silently be converted to PNG rather than failing or returning WebP. Either remove "webp" from isImageType to reject WebP images upfront, or add WebP support to the output format functions.

lib/watermark/svg-generator.ts (6)

27-35: LGTM!

The font size calculation provides responsive scaling that maintains consistent relative size across different resolutions. The minimum 8px clamp ensures readability without limiting high-resolution watermarks.


40-90: LGTM!

The position helper functions correctly handle all nine watermark positions (3x3 grid) with appropriate text anchoring and baseline alignment. The use of fontSize for edge padding provides consistent spacing.


95-102: LGTM!

The XML escaping function correctly handles all five standard XML entities in the proper order (ampersand first to prevent double-escaping).


107-132: Verify pattern text positioning.

In the tiled watermark pattern (line 127), the text is positioned at y="${patternHeight / 4}" rather than y="${patternHeight / 2}". This places the text in the upper quarter of the pattern instead of the center.

Is this intentional for visual balance, or should it be centered at patternHeight / 2?


137-158: LGTM!

The positioned watermark SVG generation correctly applies all configuration settings including position, rotation (around the text anchor point), and styling.


163-176: LGTM!

The main watermark generation function provides a clean interface that:

  • Safely interpolates viewer data using safeTemplateReplace
  • Calculates responsive font sizing
  • Delegates to appropriate SVG generator based on configuration

Comment on lines +150 to +152
const isBinaryDownload =
contentType === "application/pdf" ||
contentType?.startsWith("image/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent content-type check for PDFs.

Same issue as in download-only-viewer.tsx - using strict equality may fail if content-type includes charset parameters.

🔎 Proposed fix
      const isBinaryDownload =
-       contentType === "application/pdf" ||
+       contentType?.includes("application/pdf") ||
        contentType?.startsWith("image/");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const isBinaryDownload =
contentType === "application/pdf" ||
contentType?.startsWith("image/");
const isBinaryDownload =
contentType?.includes("application/pdf") ||
contentType?.startsWith("image/");
🤖 Prompt for AI Agents
In components/view/nav.tsx around lines 150 to 152, the content-type check uses
strict equality for PDFs which can fail when the header includes charset or
other parameters; change the logic to normalize contentType (e.g., contentType =
contentType?.toLowerCase().trim()) and check with startsWith("application/pdf")
|| contentType?.startsWith("image/") so PDFs with parameters (e.g.,
"application/pdf; charset=utf-8") are correctly detected as binary downloads.

Comment on lines +217 to +219
const isBinaryDownload =
contentType === "application/pdf" ||
contentType?.startsWith("image/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent content-type check for PDFs.

Using strict equality (===) for PDF detection differs from document-card.tsx which uses includes(). If the server includes charset in the content-type header (e.g., application/pdf; charset=utf-8), this check will fail.

🔎 Proposed fix
      const isBinaryDownload =
-       contentType === "application/pdf" ||
+       contentType?.includes("application/pdf") ||
        contentType?.startsWith("image/");
🤖 Prompt for AI Agents
In components/view/viewer/download-only-viewer.tsx around lines 217 to 219, the
PDF detection uses strict equality which fails when the Content-Type includes
parameters (e.g., "application/pdf; charset=utf-8"); change the check to a
null-safe substring match such as using contentType?.includes("application/pdf")
(or .startsWith("application/pdf")) combined with
contentType?.startsWith("image/") so PDFs with charsets are correctly detected
as binary downloads.

Comment on lines +173 to +181
// Extract the API Key from the Authorization header
const authHeader = req.headers.authorization;
const token = authHeader?.split(" ")[1];

// Check if the API Key matches
if (token !== process.env.INTERNAL_API_KEY) {
res.status(401).json({ message: "Unauthorized" });
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for INTERNAL_API_KEY references and context
rg -n -C3 'INTERNAL_API_KEY' --type=ts --type=js -g '!node_modules'

Repository: mfts/papermark

Length of output: 16509


Use constant-time comparison for INTERNAL_API_KEY authentication to prevent timing attacks.

The endpoints authenticate using simple string comparison (token !== process.env.INTERNAL_API_KEY), which is vulnerable to timing attacks. Most endpoints should use a constant-time comparison function (as demonstrated in ee/features/billing/cancellation/api/automatic-unpause-route.ts with length equality checks). Ensure the key is never logged (currently correct—only absence is logged, not the key value itself).

🤖 Prompt for AI Agents
In pages/api/mupdf/watermark.ts around lines 173 to 181, the current direct
string comparison of the extracted token to process.env.INTERNAL_API_KEY is
vulnerable to timing attacks; replace it with a constant-time comparison: first
ensure both values are present and of equal length, then compare their buffers
using a constant-time function (e.g., crypto.timingSafeEqual) to determine
equality, and if they fail, respond with 401 as before; do not log the key or
include it in any error messages and handle missing ENV or header by returning
Unauthorized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants