m-houston/carbone-lambda-poc

Carbone PDF Render Lambda (POC)

Renders a DOCX template to PDF using Carbone and the Shelf LibreOffice Lambda Layer.

New: Multi‑Template & Marker Discovery

The service now supports multiple .docx templates placed in the build templates directory. A new endpoint GET /templates returns a JSON list of available templates and their discovered markers (Carbone placeholders like {d.fullName} → marker fullName).
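As an illustration of marker discovery, here is a minimal sketch that pulls top-level marker names out of a string. The helper name extractMarkers and the regular expression are assumptions made for this example, not the repository's actual implementation; it keeps only the first path segment after d. and returns sorted, unique names.

```typescript
// Hypothetical helper: collect top-level Carbone marker names from text.
// Assumes markers look like {d.name}, optionally followed by a nested path
// or a formatter such as {d.name:upperCase}; only the first segment is kept.
function extractMarkers(text: string): string[] {
  const pattern = /\{d\.([A-Za-z_][A-Za-z0-9_]*)/g;
  const found = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const marker = match[1];
    if (marker !== undefined) {
      found.add(marker);
    }
  }
  // Sorted, de-duplicated output keeps the listing stable between runs.
  return Array.from(found).sort();
}
```

In a real .docx the text lives in XML inside a zip archive, and Word can split one marker across several runs, so a production version would probably need to unzip the file and normalise the XML first.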

The interactive form (GET /) loads the template list, lets you switch between them, and dynamically filters the editable field list to just the markers present in the selected template. Any marker that has no example value is auto‑initialised with a placeholder value of (( markerName )) so you can see it clearly in the UI.
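The placeholder behaviour can be sketched as below. The function name withPlaceholders and the shape of the example values map are hypothetical; only the (( markerName )) format comes from the description above.

```typescript
// Hypothetical helper: build the editable field list for one template.
// Only markers present in the template are kept, and any marker without
// an example value gets a visible "(( markerName ))" placeholder.
function withPlaceholders(
  markers: string[],
  examples: Record<string, string>
): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const marker of markers) {
    const example = examples[marker];
    fields[marker] = example !== undefined ? example : `(( ${marker} ))`;
  }
  return fields;
}
```

For example, withPlaceholders(["fullName", "postcode"], { fullName: "Jane Doe" }) keeps the example for fullName and fills postcode with (( postcode )).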

When you submit a render request, the chosen template name is sent as template (as a query parameter, form field, or JSON body property). If it is omitted, the first template in alphabetical order is used.
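A minimal sketch of that selection rule follows. The precedence between the three sources (query first, then form, then JSON) and the error thrown for an unknown name are assumptions for illustration; the source only states the alphabetical fallback.

```typescript
// Hypothetical shape: the template name as seen in each request source.
interface TemplateSources {
  query?: string;
  form?: string;
  json?: string;
}

function chooseTemplate(available: string[], sources: TemplateSources): string {
  const sorted = available.slice().sort();
  const first = sorted[0];
  if (first === undefined) {
    throw new Error("no templates available");
  }
  // Assumed precedence: query parameter, then form field, then JSON body.
  const requested = sources.query ?? sources.form ?? sources.json;
  if (requested === undefined || requested === "") {
    return first; // fallback: first template in alphabetical order
  }
  if (sorted.indexOf(requested) === -1) {
    throw new Error(`unknown template: ${requested}`);
  }
  return requested;
}
```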

Example /templates response:

{
  "templates": [
    {
      "name": "letter-template-nhs-notify_",
      "file": "letter-template-nhs-notify_.docx",
      "size": 54321,
      "markers": ["fullName", "firstName", "address_line_1"]
    }
  ]
}
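A client consuming this endpoint might describe and check the payload roughly as follows. The interface and function names are illustrative only; the field names are taken from the example response above.

```typescript
interface TemplateInfo {
  name: string;
  file: string;
  size: number;
  markers: string[];
}

interface TemplatesResponse {
  templates: TemplateInfo[];
}

// Runtime check that an unknown value has the fields shown in the example.
function isTemplateInfo(value: unknown): value is TemplateInfo {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const markers = v["markers"];
  return (
    typeof v["name"] === "string" &&
    typeof v["file"] === "string" &&
    typeof v["size"] === "number" &&
    Array.isArray(markers) &&
    markers.every((m) => typeof m === "string")
  );
}

function parseTemplatesResponse(json: string): TemplatesResponse {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("unexpected /templates payload");
  }
  const templates = (parsed as Record<string, unknown>)["templates"];
  if (!Array.isArray(templates) || !templates.every(isTemplateInfo)) {
    throw new Error("unexpected /templates payload");
  }
  return { templates: templates as TemplateInfo[] };
}
```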

Add additional templates by placing more .docx files into src/modules/templates/ (or the equivalent built output directory used during packaging). Rebuild / redeploy and they will appear automatically.

Features

  • Node.js 20.x Lambda (x86_64)
  • LibreOffice provided by external layer: arn:aws:lambda:eu-west-2:764866452798:layer:libreoffice-brotli:1
  • Carbone rendering of bundled DOCX templates (e.g. templates/letter-template-nhs-notify_.docx)
  • Outputs PDF (base64) via Lambda URL (proxy style response)
  • Structured JSON logging
  • Warm-up initialization outside the handler to minimise cold-start render time
  • LibreOffice archive extraction only once per cold start (cached in container /tmp)
  • Local invocation helper with a minimal placeholder PDF (when SKIP_CONVERT=1)

LibreOffice Layer Handling

The LibreOffice Lambda layer ships a compressed archive at /opt/lo.tar.br (or /opt/lo.tar.gz). On a cold start the function:

  1. Detects whether LibreOffice is already extracted under /tmp/libreoffice/instdir/program.
  2. If not, reads and decompresses the archive (Brotli or Gzip) into /tmp/libreoffice; the extracted files must fit within Lambda's default 512 MB of ephemeral storage.
  3. Adds the discovered instdir/program path to PATH so soffice.bin is invokable by Carbone's convert step.
  4. Logs extraction duration; subsequent warm invocations skip this step (fast path).
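
The four steps above can be sketched with injected dependencies so the flow is testable without the layer. Everything named here (ExtractDeps, ensureLibreOffice, the log field names) is hypothetical, and preferring the Brotli archive over Gzip is an assumption; the actual decompress-and-untar step is left to the injected extract function.

```typescript
// Hypothetical dependency bundle; real code would likely wrap fs and zlib.
interface ExtractDeps {
  exists(path: string): boolean; // e.g. backed by fs.existsSync
  extract(archive: string, dest: string): void; // decompress + untar (not shown)
  now(): number; // e.g. Date.now
  log(entry: Record<string, unknown>): void;
}

const PROGRAM_DIR = "/tmp/libreoffice/instdir/program";

// Returns true when an extraction happened (cold path), false on the fast path.
function ensureLibreOffice(env: { PATH?: string }, deps: ExtractDeps): boolean {
  let extracted = false;
  if (!deps.exists(PROGRAM_DIR)) {
    // Assumption: use the Brotli archive when present, otherwise Gzip.
    const archive = deps.exists("/opt/lo.tar.br") ? "/opt/lo.tar.br" : "/opt/lo.tar.gz";
    const started = deps.now();
    deps.extract(archive, "/tmp/libreoffice");
    deps.log({ msg: "libreoffice extracted", archive, durationMs: deps.now() - started });
    extracted = true;
  }
  // Put the program directory on PATH once so soffice.bin can be found.
  const parts = (env.PATH ?? "").split(":").filter((p) => p !== "");
  if (parts.indexOf(PROGRAM_DIR) === -1) {
    env.PATH = [PROGRAM_DIR].concat(parts).join(":");
  }
  return extracted;
}
```

Calling this once at module scope, outside the handler, would match the warm-up approach listed under Features.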

Local development placeholder mode (SKIP_CONVERT=1) skips the extraction entirely and returns a tiny static PDF to allow rapid iteration without the layer or native binary.

Project Structure

src/                # Lambda handler (index.ts) + modules + utils
scripts/            # build, package, local-invoke scripts
infra/              # Terraform configuration
templates/          # DOCX template included in deployment
package/            # Build output (not committed)
lambda.zip          # Deployment artifact generated by scripts/package.mjs

Prerequisites

  • Node.js 20.x
  • npm
  • Terraform >= 1.5
  • AWS credentials with permission to create IAM roles, Lambda functions, and Lambda URLs

Install Dependencies

npm install

Build & Package (locally)

npm run package   # runs clean + build + zip creation
ls -lh lambda.zip

Local Test (without LibreOffice)

This uses a tiny inline PDF generator (not real conversion) to validate the flow.

npm run build
node scripts/local-invoke.mjs '{"data":{"exampleName":"Alice"}}'
open local-output.pdf # macOS only

Deploy with Terraform

cd infra
terraform init
terraform apply -auto-approve

Outputs:

  • lambda_function_url – Invoke with curl.

Invoke Deployed Lambda

LAMBDA_URL="<paste output>"
curl -s -X POST "$LAMBDA_URL" \
  -H 'content-type: application/json' \
  -d '{"data":{"firstName":"Alice","score":42}}' \
  -o output.pdf
open output.pdf # macOS only

Handler Contract

Request (the Lambda Function URL delivers this JSON as the body of a standard proxy-style event):

{ "data": { "firstName": "Alice" } }

Response (success): HTTP 200, Content-Type: application/pdf, base64 body. Errors: JSON {"message":"..."} with 400 or 500.
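
The response side of this contract might be built as in the sketch below. The helper names are hypothetical; the statusCode, headers, body and isBase64Encoded fields are the usual proxy-style response shape, and the PDF is assumed to be base64-encoded already.

```typescript
interface ProxyResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
  isBase64Encoded: boolean;
}

// Success: the PDF bytes travel as a base64 string in the body.
function pdfResponse(pdfBase64: string): ProxyResponse {
  return {
    statusCode: 200,
    headers: { "content-type": "application/pdf" },
    body: pdfBase64,
    isBase64Encoded: true,
  };
}

// Failure: a small JSON document with a message, using 400 or 500.
function errorResponse(statusCode: 400 | 500, message: string): ProxyResponse {
  return {
    statusCode,
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ message }),
    isBase64Encoded: false,
  };
}
```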

Input Form (GET)

A GET request to the Lambda URL returns an HTML page (not JSON) with:

  • Current status flags (LibreOffice extracted, template present)
  • A textarea form pre-populated with sample JSON
  • A POST target that submits as application/x-www-form-urlencoded using dataJson field

Open directly in a browser:

open "$LAMBDA_URL"  # or visit in browser

Fetch raw HTML:

curl -s "$LAMBDA_URL" | head -n 20

Submitting the form opens the rendered PDF in a new tab (inline).
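
On the receiving side, the dataJson form field could be decoded roughly as follows. This is a sketch with a hypothetical function name; it assumes the body has already been base64-decoded if the event marks it as encoded.

```typescript
// Decode an application/x-www-form-urlencoded body and parse its dataJson field.
// Returns undefined when the field is absent or blank; JSON.parse throws a
// SyntaxError on malformed input, which a handler would map to HTTP 400.
function dataFromFormBody(formBody: string): unknown {
  const params = new URLSearchParams(formBody);
  const raw = params.get("dataJson");
  if (raw === null || raw.trim() === "") {
    return undefined;
  }
  return JSON.parse(raw);
}
```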

Default Data Fallback

POST requests with ANY of the following are treated as a request to render the template with default mock data:

  • Empty body (zero-length)
  • Whitespace-only body
  • Body that parses to JSON without a data property (e.g. {})

In these cases a default structure like:

{
  "example": "default-render",
  "generatedAt": "2025-10-08T08:00:00.000Z"
}

is passed to Carbone. Logs include defaultUsed: true for observability.
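
The fallback rules above can be written as one small function. The names resolveRenderData and RenderInput are hypothetical, and treating a non-object JSON value (such as a bare number) as "no data property" is an assumption.

```typescript
interface RenderInput {
  data: unknown;
  defaultUsed: boolean; // mirrors the defaultUsed flag mentioned for the logs
}

function defaultData(now: Date): Record<string, string> {
  return { example: "default-render", generatedAt: now.toISOString() };
}

function resolveRenderData(body: string | undefined, now: Date = new Date()): RenderInput {
  // Empty or whitespace-only body: render with default data.
  if (body === undefined || body.trim() === "") {
    return { data: defaultData(now), defaultUsed: true };
  }
  // JSON.parse throws SyntaxError on malformed input; callers map that to 400.
  const parsed: unknown = JSON.parse(body);
  if (typeof parsed === "object" && parsed !== null && "data" in parsed) {
    return { data: (parsed as { data: unknown }).data, defaultUsed: false };
  }
  // Valid JSON without a data property (e.g. {}): default data again.
  return { data: defaultData(now), defaultUsed: true };
}
```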

To force custom data, send a JSON body containing a data object:

curl -s -X POST "$LAMBDA_URL" \
  -H 'content-type: application/json' \
  -d '{"data":{"patientName":"Jane Doe","score":98}}' \
  -o output.pdf

Local Testing Shortcuts

HTML input form page (writes local-health.html):

npm run build
node scripts/local-invoke.mjs --get
open local-health.html  # macOS

Form POST simulation (x-www-form-urlencoded):

node scripts/local-invoke.mjs --form '{"data":{"fromForm":true,"value":42}}'
open local-output.pdf

Empty POST (default data):

node scripts/local-invoke.mjs

Explicit empty JSON (still default data):

node scripts/local-invoke.mjs '{}'

Invalid JSON (expect 400):

node scripts/local-invoke.mjs '{invalid'

Environment / Performance

  • Memory: 2048 MB (per Carbone guidance for parallelism and speed)
  • Timeout: 30s (increase for large templates or complex formatting)
  • Ephemeral storage: default (increase if larger intermediate files appear)

Notes / Trade-offs

  • Node modules installed production-only during build (no dev dependencies) for smaller artifact.
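  • Carbone version constrained by the semver range ^3.5.6 (the latest available when the project was scaffolded). A caret range accepts later 3.x releases, so this is not an exact pin.
  • No authentication on the Lambda URL (it is public). Set the Function URL auth type to AWS_IAM or add custom authorization before production use.
  • Templates are limited to the .docx files bundled at build time; adding or changing a template requires a rebuild and redeploy.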

Extending

  • Add API Gateway HTTP API if needing custom domains / auth.
  • Add CloudWatch log metrics (parse JSON logs for latency & failures).
  • Add unit tests (e.g., using Vitest or Jest) for request parsing and error paths.
    • (Added) Jest setup with marker extraction tests (see below)
  • Implement template caching / compiled template strategy if Carbone supports it to reduce repeated parsing overhead.

Clean Up

cd infra
terraform destroy -auto-approve

Security Considerations

  • Validate input data before exposing the service more widely; the Lambda URL is currently unauthenticated.
  • Sanitize or restrict dynamic content to avoid injection in documents.

License

POC - internal use. Review Carbone and LibreOffice licensing for distribution compliance.

Tests (Marker Extraction)

Jest-based tests cover template marker extraction logic.

Run:

npm test

What they check:

  1. getTemplatesDir returns an absolute path.
  2. listTemplates returns structured objects with sorted, unique markers.
  3. (Conditional) At least one template produces a non-empty marker list.
  4. Cache stability via ensureTemplateInfo (same size -> same markers result).
  5. Missing template path throws.

If no .docx exists in the runtime templates directory, the “non‑empty marker” and cache tests are skipped with a logged warning, so CI can still pass without binary templates being committed.

Using AWS SSO (aws_profile)

If you use AWS SSO (IAM Identity Center) with a profile (e.g. nhs-notify-poc):

  1. Log in via AWS SSO first:
    aws sso login --profile nhs-notify-poc
  2. Either export the profile environment variable (works without changing Terraform vars):
    export AWS_PROFILE=nhs-notify-poc
    Then run:
    cd infra
    terraform apply -auto-approve
  3. Or explicitly set the Terraform variable we added:
    cd infra
    terraform apply -var="aws_profile=nhs-notify-poc" -auto-approve

Troubleshooting "No valid credential sources found":

  • Ensure you ran aws sso login recently (tokens expire, typically after 8 to 12 hours).
  • Confirm AWS CLI v2 is installed: aws --version.
  • Check profile config in ~/.aws/config has sso_session, sso_account_id, sso_role_name, and region.
  • You can set AWS_SDK_LOAD_CONFIG=1 to force full shared config loading:
    export AWS_SDK_LOAD_CONFIG=1
  • Run a quick permission test:
    aws sts get-caller-identity --profile nhs-notify-poc

If the above works, Terraform should also succeed with the same profile.
