-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: improve watermark #1976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: improve watermark #1976
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
WalkthroughThe PR adds watermarking functionality for PDF and image downloads. It introduces watermark helper utilities, SVG generation logic, and a new Changes
Possibly related PRs
Pre-merge checks❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (10)
pages/api/mupdf/annotate-document.ts (1)
18-24: Deprecation notice is clear and well-documented.The comment effectively communicates the deprecation path, reason (pixel-burned vs removable text layer watermarks), and backward compatibility guarantees. Consider whether adding a runtime deprecation header (e.g.,
DeprecationorSunset) to responses would help API clients prepare for migration earlier, though the doc approach is reasonable for a gradual phase-out.package.json (1)
83-83: Move@types/sharpto devDependencies.Type definition packages should be in
devDependenciesrather thandependenciessince they're only needed at compile time.🔎 Proposed fix
Move this line from
dependenciestodevDependencies:"devDependencies": { "@react-email/preview-server": "^5.1.0", "@tailwindcss/forms": "^0.5.11", "@trigger.dev/build": "^3.3.17", "@trivago/prettier-plugin-sort-imports": "^6.0.0", "@types/bcryptjs": "^2.4.6", "@types/cookie": "^0.6.0", "@types/fluent-ffmpeg": "^2.1.27", "@types/js-cookie": "^3.0.6", "@types/jsonwebtoken": "^9.0.7", "@types/mime-types": "^3.0.1", "@types/ms": "^2.1.0", "@types/node": "^22.13.5", "@types/react": "^18", "@types/react-dom": "^18", "@types/sanitize-html": "^2.13.0", + "@types/sharp": "^0.31.1", "@types/ua-parser-js": "^0.7.39",components/view/watermark-svg.tsx (1)
23-28: Consider adding an upper bound for very large documents.Removing the upper bound clamp means the font size can grow unboundedly for large document dimensions. While proportional scaling is desirable, extremely large documents (e.g., high-resolution images or oversized PDFs) could result in excessively large watermark text that dominates the content.
Consider adding a sensible maximum to prevent visual issues:
🔎 Proposed fix
const calculateFontSize = () => { const baseFontSize = Math.min(width, height) * (config.fontSize / 1000); - return Math.max(8, baseFontSize); // Only clamp minimum for readability + // Clamp between minimum for readability and maximum for visual balance + return Math.max(8, Math.min(baseFontSize, config.fontSize * 2)); };pages/api/links/download/index.ts (2)
176-192: Add timeout for internal watermark API calls.The fetch to
/api/mupdf/watermarkhas no timeout. For large PDFs, this could hang indefinitely. While the endpoint has a 300s maxDuration, the calling code should also implement a timeout to handle network issues or slow processing gracefully.🔎 Proposed fix
+ const controller = new AbortController(); + const timeoutId = setTimeout(() => controller.abort(), 290000); // 290s, slightly less than maxDuration + const response = await fetch( `${process.env.NEXTAUTH_URL}/api/mupdf/watermark`, { method: "POST", headers: { "Content-Type": "application/json", Authorization: `Bearer ${process.env.INTERNAL_API_KEY}`, }, body: JSON.stringify({ documentVersionId: view.document!.versions[0].id, numPages: view.document!.versions[0].numPages, watermarkConfig: view.link.watermarkConfig, originalFileName: view.document!.name, viewerData, }), + signal: controller.signal, }, ); + + clearTimeout(timeoutId);
162-172: Locale-dependent date/time formatting may produce inconsistent watermarks.
toLocaleDateString()andtoLocaleTimeString()use the server's locale, which may vary across deployments or serverless instances. Consider using explicit formatting for consistent watermarks.🔎 Proposed fix
const viewerData = { email: view.viewerEmail, - date: new Date( - view.viewedAt ? view.viewedAt : new Date(), - ).toLocaleDateString(), + date: new Date( + view.viewedAt ? view.viewedAt : new Date(), + ).toLocaleDateString('en-US'), ipAddress: getIpAddress(req.headers), link: view.link.name, - time: new Date( - view.viewedAt ? view.viewedAt : new Date(), - ).toLocaleTimeString(), + time: new Date( + view.viewedAt ? view.viewedAt : new Date(), + ).toLocaleTimeString('en-US'), };pages/api/links/download/dataroom-document.ts (2)
237-331: Significant code duplication withpages/api/links/download/index.ts.The watermarking logic (PDF and image paths) is nearly identical between this file and
index.ts. Consider extracting this into a shared utility function to reduce duplication and ensure consistent behavior.🔎 Example shared utility signature
// lib/watermark/apply-watermark.ts export async function applyWatermark({ docType, documentVersionId, numPages, watermarkConfig, originalFileName, viewerData, downloadUrl, contentType, }: WatermarkParams): Promise<{ buffer: ArrayBuffer; contentType: string; filename: string; } | { error: string }> { // Shared watermarking logic here }This would consolidate the fetch calls, error handling, and response formatting.
223-235: Same locale-dependent date/time formatting concern applies here.The
toLocaleDateString()andtoLocaleTimeString()calls may produce inconsistent output across server instances.pages/api/mupdf/watermark.ts (2)
82-93: Consider filtering by pageNumber range instead of slicing after fetch.The current implementation fetches all
DocumentPagerecords and then slices to the requestednumPages. For documents with many pages, this could fetch unnecessary data from the database.🔎 Proposed optimization
// Fetch all DocumentPage records for this version const pages = await prisma.documentPage.findMany({ where: { versionId: documentVersionId }, orderBy: { pageNumber: "asc" }, + take: numPages, select: { file: true, storageType: true, pageNumber: true }, }); - - if (pages.length === 0) { - throw new Error("No pages found for document version"); - } - - // Limit pages to requested numPages - const pagesToProcess = pages.slice(0, numPages); + + if (pages.length === 0) { + throw new Error("No pages found for document version"); + } + + const pagesToProcess = pages;
118-132: Remove unusedoriginalFormatvariable.The
originalFormatvariable is computed but never used. It was likely intended for tracking or debugging purposes.🔎 Proposed fix
const buffer = Buffer.from(await response.arrayBuffer()); - // Detect original format from file path - const originalFormat = getFormatFromFilePath(page.file); - // Always output as PNG for consistency in PDF embedding const watermarkedBuffer = await watermarkImage( buffer, watermarkConfig, viewerData, "png", ); return { buffer: watermarkedBuffer, - originalFormat, pageNumber: page.pageNumber, };lib/watermark/helpers.ts (1)
123-131: Consider consolidating ViewerData definitions.Both
lib/watermark/helpers.ts(lines 123-131) andlib/watermark/svg-generator.ts(lines 4-10) define aViewerDatatype with identical structure. To maintain a single source of truth, consider:
- Keeping only the Zod schema version in helpers.ts and exporting the inferred type
- Importing the type in svg-generator.ts instead of redefining the interface
This prevents potential drift between the two definitions.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (12)
components/view/dataroom/document-card.tsxcomponents/view/nav.tsxcomponents/view/viewer/download-only-viewer.tsxcomponents/view/watermark-svg.tsxlib/watermark/helpers.tslib/watermark/index.tslib/watermark/svg-generator.tspackage.jsonpages/api/links/download/dataroom-document.tspages/api/links/download/index.tspages/api/mupdf/annotate-document.tspages/api/mupdf/watermark.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-19T07:46:44.421Z
Learnt from: CR
Repo: mfts/papermark PR: 0
File: .cursor/rules/rule-trigger-typescript.mdc:0-0
Timestamp: 2025-07-19T07:46:44.421Z
Learning: Applies to **/trigger/**/*.ts : You MUST `export` every task, including subtasks, in Trigger.dev task files.
Applied to files:
lib/watermark/index.ts
🧬 Code graph analysis (5)
pages/api/links/download/dataroom-document.ts (3)
lib/watermark/helpers.ts (4)
isImageType(71-76)getOutputFormat(52-57)getWatermarkedContentType(81-83)getFileNameWithExtension(95-106)lib/utils/ip.ts (1)
getIpAddress(1-28)lib/utils.ts (1)
getFileNameWithPdfExtension(34-40)
pages/api/mupdf/watermark.ts (6)
pages/api/mupdf/annotate-document.ts (2)
config(14-16)req(294-527)lib/types.ts (1)
WatermarkConfig(363-363)lib/watermark/helpers.ts (6)
ViewerData(131-131)calculateBatchSize(111-118)getFormatFromFilePath(62-66)validateUrl(9-47)getWatermarkedContentType(81-83)getFileNameWithExtension(95-106)lib/watermark/svg-generator.ts (3)
ViewerData(4-10)Dimensions(12-15)generateSvgWatermark(163-176)lib/files/get-file.ts (1)
getFile(11-28)lib/utils.ts (1)
log(88-148)
pages/api/links/download/index.ts (5)
lib/watermark/helpers.ts (4)
isImageType(71-76)getOutputFormat(52-57)getWatermarkedContentType(81-83)getFileNameWithExtension(95-106)lib/utils/ip.ts (1)
getIpAddress(1-28)pages/api/mupdf/annotate-document.ts (1)
req(294-527)pages/api/mupdf/get-pages.ts (1)
req(5-39)lib/utils.ts (1)
getFileNameWithPdfExtension(34-40)
lib/watermark/helpers.ts (2)
lib/watermark/svg-generator.ts (1)
ViewerData(4-10)lib/types.ts (1)
WatermarkConfigSchema(335-361)
lib/watermark/svg-generator.ts (3)
lib/watermark/helpers.ts (1)
ViewerData(131-131)lib/types.ts (1)
WatermarkConfig(363-363)lib/utils.ts (1)
safeTemplateReplace(781-799)
🔇 Additional comments (19)
lib/watermark/index.ts (1)
1-2: LGTM!Clean barrel file pattern for organizing watermark utilities.
components/view/dataroom/document-card.tsx (1)
127-154: LGTM!The generalized binary download handling correctly:
- Detects PDFs and images via content-type inspection
- Uses blob/object URL pattern with proper cleanup
- Extracts filename from Content-Disposition with fallback to document name
package.json (1)
146-146: Sharp 0.34.5 is the latest available version and is compatible with Node.js >=22 (supports >=21.0.0). No active security advisories found.pages/api/mupdf/watermark.ts (5)
1-27: LGTM!The imports are well-organized and the 300-second timeout is appropriate for processing large PDFs with watermarking.
33-67: LGTM!The image watermarking logic is sound. The function properly validates dimensions, generates an SVG watermark, and composites it onto the image with appropriate format-specific settings.
201-270: LGTM!The image watermarking flow is well-implemented with:
- Proper SSRF protection via URL validation
- Fetch timeout (60s) with AbortController
- Comprehensive error handling and logging
- Correct response headers including proper filename encoding
271-306: LGTM!The PDF watermarking flow properly validates required fields, invokes the watermarking function, and sets appropriate response headers for PDF download.
307-353: Excellent error handling with meaningful status codes.The error handler provides:
- Comprehensive status code mapping based on error content
- Detailed logging with processing time
- Automatic mentions for long-running failures (>2 minutes)
- Structured error responses with processing time metadata
This will greatly help with debugging and monitoring.
lib/watermark/helpers.ts (5)
9-47: Excellent SSRF protection.The URL validation implements multiple layers of security:
- Protocol restriction (HTTPS only)
- Hostname allowlist validation
- Case-insensitive hostname matching
- Clear error messages for debugging
This effectively prevents Server-Side Request Forgery attacks.
95-106: LGTM!The filename handling correctly strips existing extensions and applies the appropriate format extension. The default filenames are sensible fallbacks.
111-118: LGTM!The batch sizing strategy appropriately reduces batch size for larger documents, helping prevent memory exhaustion during watermarking operations.
136-190: LGTM! Well-designed request validation schemas.The use of Zod union types with mutually exclusive fields (using
z.undefined().optional()) elegantly enforces that requests are either for image or PDF watermarking, not both. The error formatting provides clear feedback for validation failures.
71-76: WebP images accepted but converted to PNG during watermarking.The
isImageTypefunction includes "webp" as a valid image type, butgetOutputFormatdeliberately does not support WebP (see line 56 comment). If a WebP image is downloaded with watermarking enabled, it will silently be converted to PNG rather than failing or returning WebP. Either remove "webp" fromisImageTypeto reject WebP images upfront, or add WebP support to the output format functions.lib/watermark/svg-generator.ts (6)
27-35: LGTM!The font size calculation provides responsive scaling that maintains consistent relative size across different resolutions. The minimum 8px clamp ensures readability without limiting high-resolution watermarks.
40-90: LGTM!The position helper functions correctly handle all nine watermark positions (3x3 grid) with appropriate text anchoring and baseline alignment. The use of
fontSizefor edge padding provides consistent spacing.
95-102: LGTM!The XML escaping function correctly handles all five standard XML entities in the proper order (ampersand first to prevent double-escaping).
107-132: Verify pattern text positioning.In the tiled watermark pattern (line 127), the text is positioned at
y="${patternHeight / 4}"rather thany="${patternHeight / 2}". This places the text in the upper quarter of the pattern instead of the center.Is this intentional for visual balance, or should it be centered at
patternHeight / 2?
137-158: LGTM!The positioned watermark SVG generation correctly applies all configuration settings including position, rotation (around the text anchor point), and styling.
163-176: LGTM!The main watermark generation function provides a clean interface that:
- Safely interpolates viewer data using
safeTemplateReplace- Calculates responsive font sizing
- Delegates to appropriate SVG generator based on configuration
| const isBinaryDownload = | ||
| contentType === "application/pdf" || | ||
| contentType?.startsWith("image/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent content-type check for PDFs.
Same issue as in download-only-viewer.tsx - using strict equality may fail if content-type includes charset parameters.
🔎 Proposed fix
const isBinaryDownload =
- contentType === "application/pdf" ||
+ contentType?.includes("application/pdf") ||
contentType?.startsWith("image/");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const isBinaryDownload = | |
| contentType === "application/pdf" || | |
| contentType?.startsWith("image/"); | |
| const isBinaryDownload = | |
| contentType?.includes("application/pdf") || | |
| contentType?.startsWith("image/"); |
🤖 Prompt for AI Agents
In components/view/nav.tsx around lines 150 to 152, the content-type check uses
strict equality for PDFs which can fail when the header includes charset or
other parameters; change the logic to normalize contentType (e.g., contentType =
contentType?.toLowerCase().trim()) and check with startsWith("application/pdf")
|| contentType?.startsWith("image/") so PDFs with parameters (e.g.,
"application/pdf; charset=utf-8") are correctly detected as binary downloads.
| const isBinaryDownload = | ||
| contentType === "application/pdf" || | ||
| contentType?.startsWith("image/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent content-type check for PDFs.
Using strict equality (===) for PDF detection differs from document-card.tsx which uses includes(). If the server includes charset in the content-type header (e.g., application/pdf; charset=utf-8), this check will fail.
🔎 Proposed fix
const isBinaryDownload =
- contentType === "application/pdf" ||
+ contentType?.includes("application/pdf") ||
contentType?.startsWith("image/");🤖 Prompt for AI Agents
In components/view/viewer/download-only-viewer.tsx around lines 217 to 219, the
PDF detection uses strict equality which fails when the Content-Type includes
parameters (e.g., "application/pdf; charset=utf-8"); change the check to a
null-safe substring match such as using contentType?.includes("application/pdf")
(or .startsWith("application/pdf")) combined with
contentType?.startsWith("image/") so PDFs with charsets are correctly detected
as binary downloads.
| // Extract the API Key from the Authorization header | ||
| const authHeader = req.headers.authorization; | ||
| const token = authHeader?.split(" ")[1]; | ||
|
|
||
| // Check if the API Key matches | ||
| if (token !== process.env.INTERNAL_API_KEY) { | ||
| res.status(401).json({ message: "Unauthorized" }); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for INTERNAL_API_KEY references and context
rg -n -C3 'INTERNAL_API_KEY' --type=ts --type=js -g '!node_modules'Repository: mfts/papermark
Length of output: 16509
Use constant-time comparison for INTERNAL_API_KEY authentication to prevent timing attacks.
The endpoints authenticate using simple string comparison (token !== process.env.INTERNAL_API_KEY), which is vulnerable to timing attacks. Most endpoints should use a constant-time comparison function (as demonstrated in ee/features/billing/cancellation/api/automatic-unpause-route.ts with length equality checks). Ensure the key is never logged (currently correct—only absence is logged, not the key value itself).
🤖 Prompt for AI Agents
In pages/api/mupdf/watermark.ts around lines 173 to 181, the current direct
string comparison of the extracted token to process.env.INTERNAL_API_KEY is
vulnerable to timing attacks; replace it with a constant-time comparison: first
ensure both values are present and of equal length, then compare their buffers
using a constant-time function (e.g., crypto.timingSafeEqual) to determine
equality, and if they fail, respond with 401 as before; do not log the key or
include it in any error messages and handle missing ENV or header by returning
Unauthorized.
Summary by CodeRabbit
New Features
Improvements
Dependencies
✏️ Tip: You can customize this high-level summary in your review settings.