Expose paraId for stable block addressing across headless editor sessions #2158

@yuch85

Summary

sdBlockId is regenerated on every document load, making it impossible for CLI tools or stateless workflows to reference blocks across separate editor sessions. The internal node-address-resolver already prefers the DOCX-native paraId (which is stable across loads), but this stable identifier is not exposed through the document API or SDK surface.

Requesting that paraId be exposed so that headless consumers can reliably address blocks across load/unload cycles.

Problem

Any tool that loads a DOCX into a headless SuperDoc editor, extracts block references, then loads the same DOCX again to apply edits (comments, replacements, etc.) hits this:

  1. Load 1 (extract): Each paragraph gets an sdBlockId (UUID) via the BlockNode plugin (block-node.js:255-259), generated fresh with uuidv4().
  2. Load 2 (apply): The same DOCX is loaded again. All sdBlockId UUIDs are regenerated — none match Load 1.
  3. Edits referencing UUIDs from Load 1 fail silently or with "Block not found."

This affects any stateless workflow: CLI pipelines, serverless functions, multi-step agent orchestration — anything that doesn't keep a single editor instance alive for the entire operation.

The fix exists internally but isn't exposed

node-address-resolver.ts (lines 108-120) already implements the correct priority:

function resolveBlockNodeId(node: ProseMirrorNode): string | undefined {
  if (node.type.name === 'paragraph') {
    const attrs = node.attrs as ParagraphAttrs | undefined;
    // paraId (imported from DOCX) is the primary identity — it's stable across
    // document opens. sdBlockId is auto-generated per open, so using it as the
    // canonical ID would break stateless CLI workflows.
    return toId(attrs?.paraId) ?? toId(attrs?.sdBlockId);
  }
  const attrs = (node.attrs ?? {}) as BlockIdAttrs;
  return toId(attrs.blockId) ?? toId(attrs.id) ?? toId(attrs.paraId) ?? toId(attrs.uuid) ?? toId(attrs.sdBlockId);
}
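
To make the priority order concrete, here is a self-contained sketch of the same resolution logic run against plain objects. `toId` is not shown in this issue, so the version below (accept non-empty strings only) is an assumption, and `FakeNode` is a hypothetical stand-in for a ProseMirror node.

```typescript
// Stand-in for a ProseMirror node; only the fields the resolver reads.
type FakeNode = { type: { name: string }; attrs?: Record<string, unknown> };

// Assumed behaviour of toId: keep non-empty strings, drop everything else.
function toId(value: unknown): string | undefined {
  return typeof value === 'string' && value.length > 0 ? value : undefined;
}

// Same priority order as the resolver quoted above.
function resolveBlockNodeIdSketch(node: FakeNode): string | undefined {
  const attrs = node.attrs ?? {};
  if (node.type.name === 'paragraph') {
    return toId(attrs.paraId) ?? toId(attrs.sdBlockId);
  }
  return (
    toId(attrs.blockId) ??
    toId(attrs.id) ??
    toId(attrs.paraId) ??
    toId(attrs.uuid) ??
    toId(attrs.sdBlockId)
  );
}

// A paragraph imported from DOCX with a w14:paraId.
const imported: FakeNode = {
  type: { name: 'paragraph' },
  attrs: { paraId: '3A2B1C0D', sdBlockId: 'uuid-1' },
};
// A paragraph with no paraId (for example, one created at runtime).
const created: FakeNode = {
  type: { name: 'paragraph' },
  attrs: { paraId: null, sdBlockId: 'uuid-2' },
};

console.log(resolveBlockNodeIdSketch(imported)); // 3A2B1C0D
console.log(resolveBlockNodeIdSketch(created)); // uuid-2
```

The second case is the one to watch: a paragraph without a paraId falls back to the volatile sdBlockId, so the resolved ID is only stable when the DOCX actually carries w14:paraId.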

paraId comes from the DOCX w14:paraId attribute (imported via w14-para-id.js) and survives round-trips because it's part of the file, not the runtime. But there is no public API for consumers to:

  1. Read paraId when enumerating blocks.
  2. Address a block by paraId when applying edits.

Discovery context

This was discovered while building superdoc-redlines, a CLI tool for AI-driven document editing that wraps the headless editor. The current workaround assigns positional sequential IDs (b001, b002, ...) based on document order and resolves those within a single session. This works but is fragile — any structural change between extract and apply breaks the ordering.
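
The fragility of positional IDs shows up in a small sketch. The helper below is hypothetical (it is not the superdoc-redlines implementation) and works on an array of paragraph texts rather than a real editor document.

```typescript
// Hypothetical helper: assign b001, b002, ... by document order.
function assignPositionalIds(paragraphs: string[]): Map<string, string> {
  const ids = new Map<string, string>();
  paragraphs.forEach((text, index) => {
    ids.set(`b${String(index + 1).padStart(3, '0')}`, text);
  });
  return ids;
}

const atExtract = assignPositionalIds(['Title', 'Clause 1', 'Clause 2']);
// Between extract and apply, a paragraph is inserted near the top.
const atApply = assignPositionalIds(['Title', 'Preamble', 'Clause 1', 'Clause 2']);

console.log(atExtract.get('b002')); // Clause 1
console.log(atApply.get('b002')); // Preamble
```

An edit recorded against b002 at extract time would land on the wrong paragraph at apply time, which is the failure mode a file-level identifier such as paraId avoids.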

Proposed changes

Option A: Expose paraId on block node attrs (minimal)

Ensure that when a headless editor loads a DOCX:

  1. paraId is readable on node.attrs.paraId for all paragraph-type nodes (it may already be — ParagraphAttrs defines it at node-attributes.ts:194).
  2. The document API's get-node / find operations include paraId in responses.
  3. Write operations (comments, replace, etc.) accept paraId as a block address.
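
On the consumer side, Option A could be used roughly as follows. This is a minimal sketch, assuming node.attrs.paraId is populated for imported paragraphs; `NodeLike`, `DocLike`, and `buildParaIdIndex` are hypothetical names, and the fake document stands in for `editor.state.doc`.

```typescript
// Minimal structural types; a ProseMirror document is expected to fit this shape.
interface NodeLike {
  type: { name: string };
  attrs: Record<string, unknown>;
}
interface DocLike {
  descendants(callback: (node: NodeLike, pos: number) => boolean | void): void;
}

// Build a paraId -> position lookup for every paragraph that has a paraId.
function buildParaIdIndex(doc: DocLike): Map<string, number> {
  const index = new Map<string, number>();
  doc.descendants((node, pos) => {
    const paraId = node.attrs.paraId;
    if (node.type.name === 'paragraph' && typeof paraId === 'string' && paraId !== '') {
      // Keep the first occurrence if a malformed file repeats a paraId.
      if (!index.has(paraId)) index.set(paraId, pos);
    }
  });
  return index;
}

// Fake document for illustration; the positions are made up.
const fakeDoc: DocLike = {
  descendants(callback) {
    const nodes: NodeLike[] = [
      { type: { name: 'paragraph' }, attrs: { paraId: '3A2B1C0D', sdBlockId: 'uuid-1' } },
      { type: { name: 'paragraph' }, attrs: { paraId: null, sdBlockId: 'uuid-2' } },
      { type: { name: 'table' }, attrs: {} },
    ];
    nodes.forEach((node, i) => callback(node, i * 10));
  },
};

const index = buildParaIdIndex(fakeDoc);
console.log(index.get('3A2B1C0D')); // 0
console.log(index.size); // 1
```

The positions are only valid within the session that produced them. What a stateless tool would persist between sessions is the paraId key, rebuilding the index after each load.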

Option B: Add a public getStableBlockId() utility (recommended)

Expose the existing resolveBlockNodeId logic (or a wrapper) as a public export:

/**
 * Returns the most stable block identifier available for a node.
 * For paragraphs the priority is paraId (from DOCX) > sdBlockId (runtime fallback),
 * so the result survives round-trips only when the paragraph carries a paraId.
 */
export function getStableBlockId(node: ProseMirrorNode): string | undefined {
  return resolveBlockNodeId(node);
}

Option C: Add paraId to document API block responses

When find or get-node returns block info, include the stable identifier:

{
  nodeId: "abc123",      // sdBlockId (volatile across loads)
  paraId: "3A2B1C0D",   // w14:paraId (stable across loads)
  nodeType: "paragraph",
}
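
A typed version of this response, plus a helper that picks which identifier a stateless client should persist, might look like the sketch below. `BlockInfoSketch` and `persistentAddress` are hypothetical names; the `string | null` type for paraId follows the existing ParagraphAttrs definition mentioned in this issue.

```typescript
// Hypothetical response shape; field names follow the example above.
interface BlockInfoSketch {
  nodeId: string; // sdBlockId, regenerated on each load
  paraId: string | null; // w14:paraId, null when the DOCX has none
  nodeType: string;
}

// Choose the identifier to store between sessions: paraId when present,
// otherwise fall back to the session-scoped nodeId.
function persistentAddress(info: BlockInfoSketch): { by: 'paraId' | 'nodeId'; value: string } {
  return info.paraId
    ? { by: 'paraId', value: info.paraId }
    : { by: 'nodeId', value: info.nodeId };
}

const withParaId = persistentAddress({ nodeId: 'abc123', paraId: '3A2B1C0D', nodeType: 'paragraph' });
console.log(withParaId.by, withParaId.value); // paraId 3A2B1C0D
```

Returning the kind of identifier alongside its value lets a caller detect the fallback case and warn that the address will not survive a reload.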

Reproduction

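import { readFile } from 'fs/promises';

// createHeadlessEditor is a placeholder for however your project constructs
// a headless SuperDoc editor from a DOCX buffer.

// Collect one attribute from every node that has it, in document order.
function collectAttr(editor, name) {
  const values = [];
  editor.state.doc.descendants((node) => {
    if (node.attrs[name]) values.push(node.attrs[name]);
  });
  return values;
}

const buffer = await readFile('document.docx');

// Load 1
const editor1 = await createHeadlessEditor(buffer);
const sdIds1 = collectAttr(editor1, 'sdBlockId');
const paraIds1 = collectAttr(editor1, 'paraId');
editor1.destroy();

// Load 2: same file, new UUIDs
const editor2 = await createHeadlessEditor(buffer);
const sdIds2 = collectAttr(editor2, 'sdBlockId');
const paraIds2 = collectAttr(editor2, 'paraId');
editor2.destroy();

console.log(JSON.stringify(sdIds1) === JSON.stringify(sdIds2)); // false: UUIDs differ
// paraId is stable across both loads, so this should print true
console.log(JSON.stringify(paraIds1) === JSON.stringify(paraIds2));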

Files involved

| File | Status |
| --- | --- |
| super-editor/src/extensions/block-node/block-node.js | Generates fresh UUIDs on every load |
| super-editor/src/extensions/types/node-attributes.ts | Already defines paraId: string \| null |
| super-editor/src/core/super-converter/v3/handlers/w/p/attributes/w14-para-id.js | Already imports w14:paraId from DOCX |
| super-editor/src/document-api-adapters/helpers/node-address-resolver.ts | Already resolves paraId-first; needs public export |
| super-editor/src/document-api-adapters/comments-adapter.ts | Uses BlockIndex which uses the resolver |
| document-api/src/types/info.types.ts | Needs paraId in block info responses |
| document-api/src/get-node/get-node.ts | Needs to return paraId |

Suggested test plan

  1. paraId stability — Load same DOCX twice, verify paraId values identical across loads.
  2. sdBlockId instability — Same test, confirm sdBlockId values differ (documents current behavior).
  3. resolveBlockNodeId prefers paraId — Node with both attrs returns paraId.
  4. Comment by paraId — Extract paraId in session 1, apply comment in session 2, verify correct block.
  5. Existing sdBlockId workflows unaffected — Regression check.
