Improve support for copy-paste from Microsoft Word by AllanOXDi · Pull Request #5595 · learningequality/studio

AllanOXDi · 2025-12-04T19:28:23Z

Summary

This PR improves how the Studio editor handles copy-paste from Microsoft Word so that pasted content more closely matches the formatting that Studio itself supports.

References

#4325

Reviewer guidance

Copy and paste from these sample docs (MS Word 2019, LibreOffice Writer v25.8.1.1, Google Docs) to exercise the editor and verify that:

Nested lists show correct indentation
Bullet sizes are consistent and appropriately sized
First-level bullets are discs
Second-level bullets are circles
Third-level bullets are squares
Strike-through does NOT extend into nested list items
Nested lists retain proper indentation even when strike-through is used
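
The bullet expectations in the checklist above can be written down as a small lookup. This is only an illustrative sketch: `BULLET_STYLES` and `bulletStyleForDepth` are hypothetical names that do not come from this PR, and the behavior for levels deeper than three is an assumption, since the checklist only names three levels.

```javascript
// Bullet styles named in the checklist, ordered by nesting depth (1-based).
const BULLET_STYLES = ['disc', 'circle', 'square'];

// Hypothetical helper, not part of the PR.
// Assumption: levels deeper than three reuse the last style, because the
// checklist does not say what they should look like.
function bulletStyleForDepth(depth) {
  if (!Number.isInteger(depth) || depth < 1) {
    throw new RangeError('depth must be a positive integer');
  }
  return BULLET_STYLES[Math.min(depth, BULLET_STYLES.length) - 1];
}
```

The disc, circle, square sequence is also what browsers apply by default to nested unordered lists, so a pasted list that keeps its nesting should normally pick these styles up without extra work.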

nucleogenesis

I'll spin it up tomorrow, but overall the code looks great. Left a suggestion and a couple of questions in the meantime.

nucleogenesis · 2025-12-04T23:58:30Z

...entcuration/frontend/shared/views/TipTapEditor/TipTapEditor/composables/useToolbarActions.js

        }
-      } catch (err) {
-        editor.value.chain().focus().insertContent(clipboardAccessFailed$()).run();
+        return handlePasteNoFormat();


Is this call here necessary if the one 2 lines down is outside of the if block altogether (or vice versa)?

Good catch! The inner return handlePasteNoFormat() was redundant, as the outer fallback already covers both cases. Thanks
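
The control flow being discussed can be sketched roughly as follows. The full handler is not shown in this thread, so this is a reconstruction under assumptions: `handlePaste`, `readHtml`, `insertHtml`, and `pasteNoFormat` are hypothetical stand-ins injected as parameters, not the actual Studio functions. The point is only that one fallback placed after the try/catch covers both the "no HTML available" case and the "clipboard read failed" case.

```javascript
// Hypothetical sketch of a paste handler with a single outer fallback.
// All three dependencies are injected stand-ins, not Studio's real API.
async function handlePaste({ readHtml, insertHtml, pasteNoFormat }) {
  try {
    const html = await readHtml();
    if (html) {
      insertHtml(html);
      return 'formatted';
    }
    // No inner fallback call here: falling out of the `if` is enough.
  } catch (err) {
    // Reading (or inserting) failed; continue to the shared fallback below.
  }
  // Single fallback, reached when there was no HTML or when an error occurred.
  await pasteNoFormat();
  return 'plain';
}
```

With this shape, an extra fallback call inside the `if` or the `catch` would only duplicate what the final lines already do.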

nucleogenesis · 2025-12-05T00:06:35Z

...ntcuration/contentcuration/frontend/shared/views/TipTapEditor/TipTapEditor/utils/markdown.js

  return `$$${latex || ''}$$`;
 };

+export function sanitizePastedHTML(html) {


If this is specific to MS word then maybe putting that in the name is a good idea a la sanitizeMSWordHTML or something?

Yes, the sanitizer initially targeted MS Word, but we’ve expanded it to handle copy-paste issues from Google Docs and LibreOffice as well (e.g., strike-through bleed, nested list normalization). And since it now applies to all external HTML paste sources, renaming it to sanitizeMSWordHTML would be misleading. Keeping the more general name sanitizePastedHTML better reflects its broader use and sounds good to me.

nucleogenesis · 2025-12-05T00:14:16Z

...ntcuration/contentcuration/frontend/shared/views/TipTapEditor/TipTapEditor/utils/markdown.js

+    const items = list.querySelectorAll(':scope > li');
+    items.forEach(item => {
+      const nestedLists = Array.from(item.children).filter(
+        child => child.tagName === 'UL' || child.tagName === 'OL',


Why are these uppercase? Is that coming from the pasted text?

The uppercase comes from the DOM API itself. Browsers normalize element.tagName to uppercase for all HTML elements, regardless of how they appear in the pasted HTML. https://developer.mozilla.org/en-US/docs/Web/API/Element/tagName.
https://html.spec.whatwg.org/multipage/dom.html#htmlelement

HTML elements have an uppercase local name.

So I think checking against 'UL' and 'OL' is the correct and standard way to identify list elements in sanitized HTML.
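
As a minimal illustration of the check above, the filter can be pulled into a helper and run against simple stand-ins. `getNestedLists` is a hypothetical name, and the plain objects below are an assumption made so the sketch runs without a browser; in a real HTML document the same code would receive DOM elements, whose `tagName` is reported in uppercase regardless of how the markup was written.

```javascript
// Hypothetical helper: returns the direct child lists (<ul>/<ol>) of a list item.
// It only relies on `children` and `tagName`, which DOM Elements also expose.
function getNestedLists(item) {
  return Array.from(item.children).filter(
    child => child.tagName === 'UL' || child.tagName === 'OL',
  );
}

// Plain objects standing in for DOM elements (assumption, see above).
const item = {
  tagName: 'LI',
  children: [
    { tagName: 'SPAN', children: [] },
    { tagName: 'UL', children: [] },
    { tagName: 'OL', children: [] },
  ],
};

console.log(getNestedLists(item).map(el => el.tagName)); // logs [ 'UL', 'OL' ]
```

One caveat: the uppercase guarantee applies to HTML elements in HTML documents. In XML documents, or for elements in other namespaces such as SVG, `tagName` keeps its original case, so code that might see those could compare the lowercase `localName` instead.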

radinamatic · 2025-12-05T13:21:47Z

Looking forward to checking this out on unstable! 🎉

nucleogenesis

@radinamatic could you give this image a look and share any thoughts you have? Do you think this is worth getting onto unstable? I'm not using Microsoft Word but I am seeing inconsistencies between filetypes and where I open and copy the contents so I'm not sure we could ever really enumerate all of the various formats we might run into.

I opened the various testing files @AllanOXDi shared in the reviewer guidance and made examples of them below in the question & answer boxes of an exercise for demonstration purposes. I put which filetype it was and where I opened it (and then copied it from) as the first line in each box -- then below that is the pasted text from that filetype/program combination.

I did the "happy paths", where files were opened in the programs they're most compatible with, but also mixed and matched things, like a LibreOffice file opened in Google Docs, a DOCX opened in LibreOffice, and so forth -- although I'm not sure what the acceptance criteria are.

nucleogenesis

@AllanOXDi I'm sorry - I saw this and was wondering why it hadn't merged yet but I realize now it's because I failed to come back and approve it after we discussed it on Slack 😓

improve HTML paste handling with sanitization

409a8a6

AllanOXDi requested review from marcellamaki and nucleogenesis December 4, 2025 19:28

fix failing test by removing HTML sanitization from markdown preprocessing

aa3e54d

nucleogenesis reviewed Dec 5, 2025

remove redundant handlePasteNoFormat calls in paste handler

8504d85

nucleogenesis reviewed Dec 5, 2025

nucleogenesis approved these changes Dec 19, 2025

AllanOXDi merged commit 5530d16 into learningequality:unstable Dec 29, 2025
13 checks passed

AllanOXDi mentioned this pull request Dec 29, 2025

Better support for copy-paste from Microsoft Word #4325

Closed

AllanOXDi merged 3 commits into learningequality:unstable from AllanOXDi:fixformarting

3 participants
