feat(docx): add track changes (w:ins/w:del) support#3579

Open
a-huk wants to merge 5 commits into
docling-project:mainfrom
a-huk:feat/docx-track-changes

Conversation

@a-huk

@a-huk a-huk commented Jun 10, 2026


Add support for Word's Track Changes feature (also called Suggestions in
newer Word/Office 365 versions). Previously, inserted text (w:ins) and
deleted text (w:del) were silently dropped during DOCX conversion,
causing content loss.

New MsWordBackendOptions.track_changes field controls the behaviour:

  • "accept" (default): include insertions, drop deletions — the final accepted document
  • "reject": drop insertions, include deletions — the original document
  • "raw": include both; insertions get underline formatting, deletions get strikethrough

Also exposed as --docx-track-changes CLI flag.
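
The three modes can be illustrated with a minimal, standalone sketch. It is not the code in this PR: it uses only the standard library on a hand-written WordprocessingML paragraph, and the `paragraph_text` helper and `SAMPLE` snippet are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in DOCX XML).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph: "quick" was deleted and "slow" was inserted.
SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">The </w:t></w:r>
  <w:del w:id="1" w:author="A"><w:r><w:delText>quick</w:delText></w:r></w:del>
  <w:ins w:id="2" w:author="A"><w:r><w:t>slow</w:t></w:r></w:ins>
  <w:r><w:t xml:space="preserve"> fox</w:t></w:r>
</w:p>
"""


def paragraph_text(paragraph, mode="accept"):
    """Collect paragraph text under one of the three track-changes modes."""
    if mode not in {"accept", "reject", "raw"}:
        raise ValueError(f"unknown mode: {mode!r}")
    parts = []
    for child in paragraph:
        if child.tag == f"{{{W}}}r":
            # Ordinary run: always kept.
            parts.extend(t.text or "" for t in child.iter(f"{{{W}}}t"))
        elif child.tag == f"{{{W}}}ins" and mode in ("accept", "raw"):
            # Tracked insertion: its runs carry normal w:t text.
            parts.extend(t.text or "" for t in child.iter(f"{{{W}}}t"))
        elif child.tag == f"{{{W}}}del" and mode in ("reject", "raw"):
            # Tracked deletion: its runs carry w:delText instead of w:t.
            parts.extend(t.text or "" for t in child.iter(f"{{{W}}}delText"))
    return "".join(parts)


paragraph = ET.fromstring(SAMPLE)
for mode in ("accept", "reject", "raw"):
    print(mode, "->", paragraph_text(paragraph, mode))
```

In this sketch "raw" simply concatenates both variants in document order; how the two are distinguished in the output (formatting or metadata) is a separate design decision.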

Issue resolved by this Pull Request:
Resolves #3152
Resolves #745

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

@github-actions

github-actions Bot commented Jun 10, 2026

Contributor

DCO Check Failed

Hi @a-huk, your pull request has failed the Developer Certificate of Origin (DCO) check.

This repository supports remediation commits, so you can fix this without rewriting history — but you must follow the required message format.


🛠 Quick Fix: Add a remediation commit

Run this command:

git commit --allow-empty -s -m "DCO Remediation Commit for Adam Huk <huk.adam.g@gmail.com>

I, Adam Huk <huk.adam.g@gmail.com>, hereby add my Signed-off-by to this commit: 83b2b51eccf18478607f9914558138f14639dcd3
I, Adam Huk <huk.adam.g@gmail.com>, hereby add my Signed-off-by to this commit: acaa92f7caf15ad12dbb82f30babf60293caf483
I, a-huk <huk.adam.g@gmail.com>, hereby add my Signed-off-by to this commit: 4c62fdaa576317765ebabedf589cc0ce435f19c3
I, a-huk <huk.adam.g@gmail.com>, hereby add my Signed-off-by to this commit: 03bc900250e93cccb870ec33eb401d7a91b09301
I, a-huk <huk.adam.g@gmail.com>, hereby add my Signed-off-by to this commit: f32bea3a13f36ed0de3ef6e297fc33b97f4df6ba"
git push

🔧 Advanced: Sign off each commit directly

For the latest commit:

git commit --amend --signoff
git push --force-with-lease

For multiple commits:

git rebase --signoff origin/main
git push --force-with-lease

More info: DCO check report

@mergify

mergify Bot commented Jun 10, 2026

Contributor

Merge Protections

🔴 1 of 2 protections blocking · waiting on 👀 reviews

| Protection | Waiting on |
| --- | --- |
| 🔴 Require two reviewer for test updates | 👀 reviews |
| 🟢 Enforce conventional commit | |

🔴 Require two reviewer for test updates

Waiting for

  • #approved-reviews-by >= 2
This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2


🟢 Enforce conventional commit

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@PeterStaar-IBM PeterStaar-IBM requested a review from ceberam June 10, 2026 16:28
@codecov

codecov Bot commented Jun 10, 2026


Codecov Report

❌ Patch coverage is 10.86957% with 41 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| docling/backend/msword_backend.py | 4.65% | 41 Missing ⚠️ |

📢 Thoughts on this report? Let us know!

@ceberam ceberam left a comment

Member

Thanks @a-huk for your interest in Docling and your contribution!

Before I dive in, should we nail down a few high-level details? Since this is a completely new feature in Docling, I think we should first agree on what we want to support before the code is reviewed. Front-loading the design discussion saves time and effort for both you and the reviewers, and ensures that any code changes are intentional, scoped, and aligned with the feature goals. Please see the thread on the original issue and feel free to add your comments.

Thanks again for your work! 🎉

@PeterStaar-IBM

Member

@ceberam @a-huk Can we get this PR moving forward?

  1. Did we agree on the points in the issue and how to tackle them?
  2. Can we resolve the merge conflict and get this feature into Docling? It looks very promising!

@a-huk

a-huk commented Jun 29, 2026

Author

The change_type field needs to be added to TextItem in docling-core. I've temporarily patched the local venv's site-packages to add it (for testing), but this requires a separate docling-core PR.
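
As a rough illustration of what such a field could look like, here is a minimal sketch. It does not use docling-core: `SketchTextItem` is a hypothetical stand-in dataclass, and `render_inline` is an invented helper showing how a consumer might map the semantic marker to a presentation of its choosing. Only the values `'inserted'` and `'deleted'` come from this PR.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# The two marker values named in this PR; None means "not a tracked change".
ChangeType = Literal["inserted", "deleted"]


@dataclass
class SketchTextItem:
    """Hypothetical stand-in for a text item; NOT docling-core's TextItem."""

    text: str
    change_type: Optional[ChangeType] = None


def render_inline(items):
    """Turn semantic change markers into one possible visual presentation."""
    out = []
    for item in items:
        if item.change_type == "inserted":
            out.append(f"<ins>{item.text}</ins>")
        elif item.change_type == "deleted":
            out.append(f"~~{item.text}~~")
        else:
            out.append(item.text)
    return "".join(out)


items = [
    SketchTextItem("The "),
    SketchTextItem("quick", change_type="deleted"),
    SketchTextItem("slow", change_type="inserted"),
    SketchTextItem(" fox"),
]
print(render_inline(items))
```

Keeping the marker on the item, rather than baking underline or strikethrough into the formatting, leaves each exporter free to decide how tracked changes are displayed.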

a-huk added 5 commits June 29, 2026 15:53
Word's Track Changes feature (also called Suggestions in newer Word
versions) wraps inserted text in <w:ins> and deleted text in <w:del>
elements. Previously both were silently dropped, causing content loss.

New MsWordBackendOptions.track_changes field controls behaviour:
- "accept" (default): include insertions, drop deletions — final document
- "reject": drop insertions, include deletions — original document
- "raw": include both; insertions get underline formatting, deletions
  get strikethrough so they are visually distinguishable

Exposed via --docx-track-changes CLI flag (default: accept).

Fixes: docling-project#3152, docling-project#745
… TextItem

In raw mode, tracked insertions/deletions now set change_type='inserted' or
change_type='deleted' on TextItem rather than injecting underline/strikethrough
formatting, keeping semantic meaning separate from visual presentation.

Requires a matching docling-core change to add change_type to TextItem.

The upstream added a recursive child-expander that treated w:ins as a
transparent container, causing its runs to bypass the track-changes
handler and always appear in the output regardless of mode.

Remove "ins" from the transparent-container set so the existing
w:ins / w:del logic sees the element and can filter or annotate it.
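
The effect described in this commit message can be reproduced with a small standalone sketch. It is not the backend's code: `expand`, `visible_text`, and the `"smartTag"` entry in the container set are hypothetical names chosen for illustration, and the real set of transparent containers is not shown in this conversation.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def local(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def expand(element, transparent):
    """Yield children, recursing through any tag listed in `transparent`."""
    for child in element:
        if local(child.tag) in transparent:
            yield from expand(child, transparent)
        else:
            yield child


def visible_text(paragraph, transparent, mode):
    """Collect text; w:ins is filtered only if the expander lets it through."""
    parts = []
    for node in expand(paragraph, transparent):
        name = local(node.tag)
        if name == "r":
            parts.extend(t.text or "" for t in node.iter(f"{{{W}}}t"))
        elif name == "ins" and mode != "reject":
            parts.extend(t.text or "" for t in node.iter(f"{{{W}}}t"))
        # w:del handling is omitted to keep the sketch short.
    return "".join(parts)


P = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>kept</w:t></w:r>"
    "<w:ins><w:r><w:t> added</w:t></w:r></w:ins>"
    "</w:p>"
)

# With "ins" treated as transparent, its runs look like ordinary runs,
# so "reject" mode cannot drop them.
print(repr(visible_text(P, {"smartTag", "ins"}, "reject")))
# With "ins" removed from the set, the handler sees w:ins and filters it.
print(repr(visible_text(P, {"smartTag"}, "reject")))
```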
@a-huk a-huk force-pushed the feat/docx-track-changes branch from 21fd102 to f32bea3 Compare June 29, 2026 14:02
@a-huk

a-huk commented Jun 29, 2026

Author

@PeterStaar-IBM let me know what you think of it now


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

  • Request for ``Track Changes'' in MS docx/doc
  • Text blocks containing tracked changes (revisions) in DOCX documents are not being read properly.

3 participants