Summary
sdBlockId is regenerated on every document load, making it impossible for CLI tools or stateless workflows to reference blocks across separate editor sessions. The internal node-address-resolver already prefers the DOCX-native paraId (which is stable across loads), but this stable identifier is not exposed through the document API or SDK surface.
Requesting that paraId be exposed so that headless consumers can reliably address blocks across load/unload cycles.
Problem
Any tool that loads a DOCX into a headless SuperDoc editor, extracts block references, then loads the same DOCX again to apply edits (comments, replacements, etc.) hits this:
- Load 1 (extract): Each paragraph gets an sdBlockId (UUID) via the BlockNode plugin (block-node.js:255-259), generated fresh with uuidv4().
- Load 2 (apply): The same DOCX is loaded again. All sdBlockId UUIDs are regenerated; none match Load 1.
- Edits referencing UUIDs from Load 1 fail silently or with "Block not found."
This affects any stateless workflow: CLI pipelines, serverless functions, multi-step agent orchestration — anything that doesn't keep a single editor instance alive for the entire operation.
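The failure mode can be illustrated without the editor at all. The following is a minimal simulation, not SuperDoc code: `SimBlock`, `freshId`, `simulateLoad`, and `findBySdBlockId` are hypothetical stand-ins for a load that assigns a fresh identifier to every paragraph, as described above.

```typescript
// Hypothetical stand-in for a paragraph after load; not a SuperDoc type.
interface SimBlock {
  sdBlockId: string; // regenerated on every load
  text: string;
}

let mintedIds = 0;

// Stand-in for uuidv4(): returns a value that is never repeated within this process.
function freshId(): string {
  mintedIds += 1;
  return "id-" + mintedIds + "-" + Math.random().toString(16).slice(2);
}

// Simulates one load: every paragraph receives a fresh identifier.
function simulateLoad(paragraphs: string[]): SimBlock[] {
  return paragraphs.map((text) => ({ sdBlockId: freshId(), text }));
}

function findBySdBlockId(blocks: SimBlock[], id: string): SimBlock | undefined {
  for (const block of blocks) {
    if (block.sdBlockId === id) return block;
  }
  return undefined;
}

const sourceParagraphs = ["Introduction", "Terms", "Signatures"];

// Load 1 (extract): remember the ID of the second paragraph.
const firstLoad = simulateLoad(sourceParagraphs);
const savedId = firstLoad[1].sdBlockId;

// Load 2 (apply): same content, new IDs, so the saved reference misses.
const secondLoad = simulateLoad(sourceParagraphs);
const staleLookup = findBySdBlockId(secondLoad, savedId);

console.log(staleLookup === undefined); // true
```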
The fix exists internally but isn't exposed
node-address-resolver.ts (lines 108-120) already implements the correct priority:
function resolveBlockNodeId(node: ProseMirrorNode): string | undefined {
  if (node.type.name === 'paragraph') {
    const attrs = node.attrs as ParagraphAttrs | undefined;
    // paraId (imported from DOCX) is the primary identity — it's stable across
    // document opens. sdBlockId is auto-generated per open, so using it as the
    // canonical ID would break stateless CLI workflows.
    return toId(attrs?.paraId) ?? toId(attrs?.sdBlockId);
  }
  const attrs = (node.attrs ?? {}) as BlockIdAttrs;
  return toId(attrs.blockId) ?? toId(attrs.id) ?? toId(attrs.paraId) ?? toId(attrs.uuid) ?? toId(attrs.sdBlockId);
}
paraId comes from the DOCX w14:paraId attribute (imported via w14-para-id.js) and survives round-trips because it's part of the file, not the runtime. But there is no public API for consumers to:
- Read paraId when enumerating blocks.
- Address a block by paraId when applying edits.
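To make the first missing capability concrete, here is a rough sketch of what enumerating blocks with a paraId-first identifier might look like on the consumer side. It assumes paragraph nodes carry `paraId` and `sdBlockId` in `attrs`; `NodeLike`, `BlockRef`, `stableIdOf`, and `enumerateBlocks` are hypothetical names, and the plain objects stand in for ProseMirror nodes.

```typescript
// Minimal structural stand-in for the parts of a ProseMirror node used here.
interface NodeLike {
  type: { name: string };
  attrs: { paraId?: string | null; sdBlockId?: string | null };
  textContent: string;
}

interface BlockRef {
  id: string;
  stable: boolean; // true only when the id came from paraId
  text: string;
}

// Applies the paraId-first priority quoted above, for paragraphs only.
function stableIdOf(node: NodeLike): string | undefined {
  if (node.attrs.paraId) return node.attrs.paraId;
  if (node.attrs.sdBlockId) return node.attrs.sdBlockId;
  return undefined;
}

function enumerateBlocks(nodes: NodeLike[]): BlockRef[] {
  const result: BlockRef[] = [];
  for (const node of nodes) {
    if (node.type.name !== "paragraph") continue;
    const id = stableIdOf(node);
    if (id === undefined) continue;
    result.push({ id, stable: Boolean(node.attrs.paraId), text: node.textContent });
  }
  return result;
}

const blockRefs = enumerateBlocks([
  { type: { name: "paragraph" }, attrs: { paraId: "3A2B1C0D", sdBlockId: "u-1" }, textContent: "From DOCX" },
  { type: { name: "paragraph" }, attrs: { paraId: null, sdBlockId: "u-2" }, textContent: "Created at runtime" },
]);

console.log(blockRefs);
```

The `stable` flag is a design choice of this sketch: it lets a caller refuse to persist references that would not survive a reload.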
Discovery context
This was discovered while building superdoc-redlines, a CLI tool for AI-driven document editing that wraps the headless editor. The current workaround assigns positional sequential IDs (b001, b002, ...) based on document order and resolves those within a single session. This works but is fragile — any structural change between extract and apply breaks the ordering.
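The fragility of positional IDs is easy to show in isolation. The sketch below is illustrative only: `assignPositionalIds` is a hypothetical helper that mimics the b001, b002, ... scheme, not the actual superdoc-redlines code.

```typescript
// Assigns b001, b002, ... by document order, mimicking the workaround described above.
function assignPositionalIds(paragraphs: string[]): Record<string, string> {
  const ids: Record<string, string> = {};
  paragraphs.forEach((text, i) => {
    const suffix = ("000" + (i + 1)).slice(-3); // zero-pad to three digits
    ids["b" + suffix] = text;
  });
  return ids;
}

const atExtract = assignPositionalIds(["Intro", "Terms", "Signatures"]);

// Between extract and apply, a paragraph is inserted near the top.
const atApply = assignPositionalIds(["Intro", "Definitions", "Terms", "Signatures"]);

console.log(atExtract["b002"]); // "Terms"
console.log(atApply["b002"]);   // "Definitions": the same ID now points at a different block
```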
Proposed changes
Option A: Expose paraId on block node attrs (minimal)
Ensure that when a headless editor loads a DOCX:
- paraId is readable on node.attrs.paraId for all paragraph-type nodes (it may already be; ParagraphAttrs defines it at node-attributes.ts:194).
- The document API's get-node / find operations include paraId in responses.
- Write operations (comments, replace, etc.) accept paraId as a block address.
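The intended extract-then-apply flow under Option A could look roughly like the following. This is a simulation under stated assumptions, not the SuperDoc API: `LoadedBlock`, `loadWithParaIds`, and `addCommentByParaId` are hypothetical, and the loader simply copies paraId from the "file" while minting a new sdBlockId per load.

```typescript
interface LoadedBlock {
  paraId: string;    // from the file, identical on every load
  sdBlockId: string; // assigned at load time
  text: string;
  comments: string[];
}

let loadCount = 0;

// Hypothetical loader: paraId comes from the file, sdBlockId is minted per load.
function loadWithParaIds(file: { paraId: string; text: string }[]): LoadedBlock[] {
  loadCount += 1;
  return file.map((p, i) => ({
    paraId: p.paraId,
    sdBlockId: "load" + loadCount + "-block" + i,
    text: p.text,
    comments: [],
  }));
}

// Hypothetical write operation that accepts paraId as the block address.
function addCommentByParaId(blocks: LoadedBlock[], paraId: string, comment: string): boolean {
  for (const block of blocks) {
    if (block.paraId === paraId) {
      block.comments.push(comment);
      return true;
    }
  }
  return false; // unknown paraId: report the miss instead of failing silently
}

const docxParagraphs = [
  { paraId: "3A2B1C0D", text: "Intro" },
  { paraId: "4F5E6D7C", text: "Terms" },
];

// Session 1: extract a reference and discard the editor.
const session1 = loadWithParaIds(docxParagraphs);
const savedParaId = session1[1].paraId;

// Session 2: apply an edit using only the saved reference.
const session2 = loadWithParaIds(docxParagraphs);
const commentApplied = addCommentByParaId(session2, savedParaId, "Please clarify this clause.");

console.log(commentApplied, session2[1].comments);
```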
Option B: Add a public getStableBlockId() utility (recommended)
Expose the existing resolveBlockNodeId logic (or a wrapper) as a public export:
/**
 * Returns a stable block identifier that survives document round-trips.
 * Priority: paraId (from DOCX) > sdBlockId (runtime fallback).
 */
export function getStableBlockId(node: ProseMirrorNode): string | undefined {
  return resolveBlockNodeId(node);
}
Option C: Add paraId to document API block responses
When find or get-node returns block info, include the stable identifier:
{
  nodeId: "abc123", // sdBlockId (volatile across loads)
  paraId: "3A2B1C0D", // w14:paraId (stable across loads)
  nodeType: "paragraph",
}
Reproduction
import { readFile } from 'fs/promises';
// createHeadlessEditor is a helper (not shown in this issue) that loads a DOCX
// buffer into a headless SuperDoc editor.
const buffer = await readFile('document.docx');
// Load 1
const editor1 = await createHeadlessEditor(buffer);
const ids1 = [];
editor1.state.doc.descendants((node) => {
  if (node.attrs.sdBlockId) ids1.push(node.attrs.sdBlockId);
});
editor1.destroy();
// Load 2: same file, new UUIDs
const editor2 = await createHeadlessEditor(buffer);
const ids2 = [];
editor2.state.doc.descendants((node) => {
  if (node.attrs.sdBlockId) ids2.push(node.attrs.sdBlockId);
});
editor2.destroy();
console.log(JSON.stringify(ids1) === JSON.stringify(ids2)); // false: UUIDs differ
// paraId is stable across both loads
Files involved
| File | Status |
|---|---|
| super-editor/src/extensions/block-node/block-node.js | Generates fresh UUIDs on every load |
| super-editor/src/extensions/types/node-attributes.ts | Already defines paraId: string \| null |
| super-editor/src/core/super-converter/v3/handlers/w/p/attributes/w14-para-id.js | Already imports w14:paraId from DOCX |
| super-editor/src/document-api-adapters/helpers/node-address-resolver.ts | Already resolves paraId-first; needs public export |
| super-editor/src/document-api-adapters/comments-adapter.ts | Uses BlockIndex, which uses the resolver |
| document-api/src/types/info.types.ts | Needs paraId in block info responses |
| document-api/src/get-node/get-node.ts | Needs to return paraId |
Suggested test plan
- paraId stability: Load the same DOCX twice, verify paraId values are identical across loads.
- sdBlockId instability: Same test, confirm sdBlockId values differ (documents current behavior).
- resolveBlockNodeId prefers paraId: A node with both attrs returns paraId.
- Comment by paraId: Extract paraId in session 1, apply a comment in session 2, verify it lands on the correct block.
- Existing sdBlockId workflows unaffected: Regression check.