[lexical-code-prism][lexical-code-shiki][lexical-playground] Feature: Allow null Tokenizer.defaultLanguage to preserve markdown ``` round-trip #8553

Merged
etrepum merged 1 commit into facebook:main from mayrang:fix/7235-markdown-code-no-language on May 25, 2026

Contributor

@mayrang mayrang commented May 24, 2026

Description

A markdown ``` fence (no info string) imported into a WYSIWYG editor with the Prism or Shiki highlighter active had its `CodeNode` language permanently rewritten to the tokenizer's default ('javascript'), so the next markdown export emitted ```javascript instead of preserving the bare ``` fence across the round-trip.

Per the maintainer suggestion on this PR, this widens Tokenizer.defaultLanguage from string to string | null so consumers can opt out of the implicit fallback. When set to null:

  • The highlight transform leaves __language as undefined (no setLanguage mutation)
  • getIsSyntaxHighlightSupported reports false, so createDOM emits no data-language / data-highlight-language attributes
  • $tokenize returns a plain split of the text (TextNode + LineBreakNode + TabNode) instead of highlight tokens, so the code block renders as plain monospace with real line breaks — matching what GitHub / Slack / VS Code do for ``` with no language
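
The plain split described in the last bullet can be pictured roughly as follows. This is a standalone sketch, not the PR's actual implementation: `PlainPiece` and `splitPlain` are hypothetical names, and the function returns simple descriptors rather than real Lexical nodes.

```typescript
// Hypothetical descriptor standing in for TextNode / LineBreakNode / TabNode.
type PlainPiece =
  | {kind: 'text'; text: string}
  | {kind: 'linebreak'}
  | {kind: 'tab'};

// Split raw code text on newlines and tabs, keeping the separators as pieces.
function splitPlain(text: string): PlainPiece[] {
  const pieces: PlainPiece[] = [];
  // The capture group makes split() keep the separators in the result.
  for (const part of text.split(/(\r?\n|\t)/)) {
    if (part === '') {
      continue; // adjacent separators produce empty strings; skip them
    }
    if (part === '\t') {
      pieces.push({kind: 'tab'});
    } else if (part === '\n' || part === '\r\n') {
      pieces.push({kind: 'linebreak'});
    } else {
      pieces.push({kind: 'text', text: part});
    }
  }
  return pieces;
}
```

No highlight tokens are produced here; each run of ordinary text stays a single piece, which is what lets the block render as plain monospace with real line breaks.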

PrismTokenizer / ShikiTokenizer keep their existing defaultLanguage: DEFAULT_CODE_LANGUAGE so default consumers see no behavior change. Playground opts in by spreading {...PrismTokenizer, defaultLanguage: null} (and the Shiki equivalent) into the CodeHighlightExtension dependency.
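
As a rough illustration of the opt-in, the sketch below uses a trimmed-down stand-in for the tokenizer rather than the real packages. `TokenizerLike`, `baseTokenizer`, and `languageToApply` are hypothetical names invented for this example; only the `{...tokenizer, defaultLanguage: null}` spread pattern comes from the PR description.

```typescript
// Hypothetical, trimmed-down shape of the Tokenizer interface; the real one
// in the highlighter packages has more members.
interface TokenizerLike {
  defaultLanguage: string | null;
  tokenize(code: string, language?: string): string[];
}

// Stand-in for PrismTokenizer / ShikiTokenizer with the unchanged default.
const baseTokenizer: TokenizerLike = {
  defaultLanguage: 'javascript',
  tokenize: (code) => code.split(/\s+/).filter(Boolean),
};

// Opt out of the implicit fallback by overriding only defaultLanguage.
const optOutTokenizer: TokenizerLike = {...baseTokenizer, defaultLanguage: null};

// Sketch of the decision made for a code block during the highlight
// transform: return the language to use, or undefined to leave the block's
// language unset.
function languageToApply(
  current: string | undefined,
  tokenizer: TokenizerLike,
): string | undefined {
  if (current !== undefined) {
    return current; // an explicit language always wins
  }
  return tokenizer.defaultLanguage ?? undefined;
}
```

With `baseTokenizer` an unset block resolves to 'javascript' as before; with `optOutTokenizer` it stays undefined, so a later markdown export has no language to emit.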

The playground toolbar and CodeActionMenu gained UX support for the unset state: the language dropdown shows (No language) as its label and offers it as a selectable item that calls setLanguage(null). Per existing semantics that call still normalizes the language to undefined, but it lets users return a code block to the unset state without going through markdown. CodeActionMenu falls back to the syntax-highlight default for canBePrettier and for the friendly name, so Prettier still works on unset blocks.

Cleanup

getDefaultCodeLanguage gained @deprecated — it is unused internally and DEFAULT_CODE_LANGUAGE can be read directly. DEFAULT_CODE_LANGUAGE is still used internally so left alone.

Backwards compatibility

Tokenizer.defaultLanguage widens from string to string | null. Existing consumers passing a string are unaffected; consumers reading tokenizer.defaultLanguage need a null guard if they handle the opt-out case. The default PrismTokenizer / ShikiTokenizer instances are unchanged, so any consumer that registers CodePrismExtension / CodeShikiExtension without overriding the tokenizer keeps the old behavior.
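
The null guard mentioned above might look something like this in consumer code. `HasDefaultLanguage` and `languageLabel` are hypothetical names used only for illustration; the '(No language)' string mirrors the playground label but is otherwise an arbitrary choice.

```typescript
// Hypothetical minimal view of a tokenizer after this change.
interface HasDefaultLanguage {
  defaultLanguage: string | null;
}

// Hypothetical consumer helper: pick the label to show for a code block.
function languageLabel(
  blockLanguage: string | undefined,
  tokenizer: HasDefaultLanguage,
): string {
  // Before the change, tokenizer.defaultLanguage was always a string, so
  // this expression type-checked as string. It is now string | null and
  // the null case has to be handled explicitly.
  const resolved = blockLanguage ?? tokenizer.defaultLanguage;
  return resolved === null ? '(No language)' : resolved;
}
```

Consumers that never override the tokenizer would only ever hit the string branch, which is consistent with the claim that default setups see no behavior change.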

Closes #7235

Test plan

  • pnpm tsc --noEmit -p tsconfig.json clean.
  • pnpm flow clean.
  • pnpm vitest run --project unit — 2599 passed, 1 skipped.
  • npx prettier --check clean / pnpm eslint clean.
  • New unit tests (CodePrismNullDefaultLanguage.test.ts, CodeShikiNullDefaultLanguage.test.ts) register the highlight with defaultLanguage: null and confirm getLanguage() stays undefined after the transform runs.
  • Manual playground (Prism mode):
    • Markdown ``` (no lang) → wysiwyg → plain monospace (no highlight tokens), toolbar shows `(No language)`; M-toggle back → markdown still ```.
    • Markdown ```python → wysiwyg → tokens + toolbar `Python`; M-toggle back → ```python.
    • /code slash menu → empty code block → toolbar (No language), no highlight (default opted out).
    • Toolbar (No language) item on a highlighted Python block → highlight clears, block becomes plain; markdown export ```.
    • CodeActionMenu Prettier on a (No language) block → format applied, \n rendered as real line breaks (not literal characters).
  • Same scenarios in Shiki mode.
  • Updated existing e2e snapshots (CodeBlock, CodeActionMenu, Indentation, Markdown, Tab, 1384-insert-nodes) — data-language="javascript" / data-highlight-language="javascript" no longer present when the block was created via markdown ``` shortcut.

Collaborator

@etrepum etrepum left a comment

I wonder if this could be done in a backwards compatible way, perhaps by allowing defaultLanguage to be explicitly null or undefined in the extension configurations?

We should probably go ahead and deprecate getDefaultCodeLanguage while we're in here since it's unused internally and doesn't really have any bearing on how things work. DEFAULT_CODE_LANGUAGE could probably also be deprecated although that is currently used internally.

@etrepum etrepum added the extended-tests Run extended e2e tests on a PR label May 24, 2026
@mayrang mayrang force-pushed the fix/7235-markdown-code-no-language branch from 7f63844 to fde46d6 Compare May 24, 2026 21:43
@mayrang mayrang changed the title from "[Breaking Change][lexical-code-core][lexical-code-shiki][lexical-markdown] Bug Fix: Preserve markdown ``` (no language) across round-trip" to "[lexical-code-prism][lexical-code-shiki][lexical-playground] Feature: Allow null Tokenizer.defaultLanguage to preserve markdown ``` round-trip" May 24, 2026
Contributor Author

mayrang commented May 24, 2026

Switched to the Tokenizer.defaultLanguage: string | null route — default tokenizers keep their existing string, playground opts in by spreading {...PrismTokenizer, defaultLanguage: null} (and the Shiki equivalent) into CodeHighlightExtension. The earlier 3-state CodeNode.__language change is fully reverted, so no setLanguage / updateFromJSON BC impact.

getDefaultCodeLanguage is now @deprecated. Left DEFAULT_CODE_LANGUAGE as-is since it's still used internally.

Collaborator

@etrepum etrepum left a comment

Do we still have e2e tests that explicitly set a language in the markdown blocks to make sure these highlighters are still working correctly with the playground config in those cases?

@@ -56,6 +56,7 @@ export type SerializedCodeNode = Spread<
>;

export const DEFAULT_CODE_LANGUAGE = 'javascript';
Collaborator

I think this can be marked as @internal - it's not really an API and doesn't make sense when it's overridden in the configs

Contributor Author

Switched to @internal.

>;

export const DEFAULT_CODE_LANGUAGE = 'javascript';
/** @deprecated Read {@link DEFAULT_CODE_LANGUAGE} directly. */
Collaborator

I don't think we need to point people anywhere else, maybe just say it's configurable in the extensions

Contributor Author

Trimmed the JSDoc — just says it's configurable through the extensions now.

* compatible with the indent / shift-lines handlers that only accept
* CodeHighlightNode + TabNode + LineBreakNode inside a CodeNode.
*/
function $plainifyCodeContent(text: string): LexicalNode[] {
Collaborator

We should unify these into @lexical/code-core since we have the code repeated verbatim in each highlighter package

Contributor Author

Moved $plainifyCodeContent to @lexical/code-core and imported from both highlighter packages.

… Allow null Tokenizer.defaultLanguage to preserve markdown ``` round-trip
Contributor Author

mayrang commented May 25, 2026

Yes. CodeBlock.spec.mjs (lines 1350 and 1410) types ```diff and ```diff-javascript as the markdown opener and asserts the rendered data-language and highlight tokens against the playground config. The new defaultLanguage: null branch only fires when getLanguage() === undefined, so these two cases stay on the existing path.

@etrepum etrepum added this pull request to the merge queue May 25, 2026
Merged via the queue into facebook:main with commit 168f803 May 25, 2026
51 checks passed
Labels

  • CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
  • extended-tests: Run extended e2e tests on a PR
