Missing content in Word document when a list follows a heading #2250

@tuan3w

Description

Bug

Image

When a list appears directly after a heading, only the heading is kept and the list content is dropped.
I traced it to this check in the code, which fails in this case:

https://github.com/docling-project/docling/blob/main/docling/backend/msword_backend.py#L1037-L1039

Steps to reproduce

Sorry, I cannot share the document, but I have added a screenshot of the relevant part.
The parent, as far as I can see, is a section header element, so it fails the check:

SectionHeaderItem(self_ref='#/texts/25', parent=RefItem(cref='#/texts/22'), children=[], content_layer=<ContentLayer.BODY: 'body'>, label=<DocItemLabel.SECTION_HEADER: 'section_header'>, prov=[], orig='3 ĐIỀU KIỆN BẢO HÀNH', text='3 ĐIỀU KIỆN BẢO HÀNH', formatting=None, hyperlink=None, level=3)

I found that automatically creating a list group when the check is not satisfied works, but I am not sure about side effects.
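The workaround can be sketched roughly as follows. This is a minimal, self-contained illustration of the idea, not the actual docling code: `Node` and `add_list_item` are hypothetical stand-ins for the document tree and the backend logic, and the list item texts are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a document tree item (not a docling class)."""

    label: str  # e.g. "section_header", "list", "list_item"
    text: str = ""
    # Excluded from repr/compare to avoid infinite recursion through the parent link.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def add_list_item(parent: Node, text: str) -> Node:
    """Attach a list item, creating a list group if the parent is not one."""
    if parent.label != "list":
        last = parent.children[-1] if parent.children else None
        if last is not None and last.label == "list":
            # Reuse the trailing list group so consecutive items stay together.
            parent = last
        else:
            # Assumed workaround: wrap the items in a new list group
            # instead of dropping them.
            parent = parent.add_child(Node(label="list"))
    return parent.add_child(Node(label="list_item", text=text))


# The heading from the report, followed by two made-up list items.
header = Node(label="section_header", text="3 ĐIỀU KIỆN BẢO HÀNH")
add_list_item(header, "first item")
add_list_item(header, "second item")
```

In this sketch both items end up under a single list group attached to the heading. Whether wrapping like this has side effects elsewhere in the real backend (for example with nested or numbered lists) is exactly the open question.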

Docling version

I use the latest Docling version, 2.51.

Python version

3.12

Metadata

Labels

bug (Something isn't working), docx (issue related to docx backend)
