feat: sort pptx shapes to be parsed in top-to-bottom, left-to-right order #1104

richardye101 · 2025-03-07T19:06:23Z

Currently, markitdown parses PPTX shapes in Z-order, i.e. the order in which shapes are stacked on the slide, from back to front. That order often does not match the order in which a reader would read the slide.

Other repos that convert PPTX to Markdown read the shapes in normal reading order (top-to-bottom, then left-to-right), for example https://github.com/ssine/pptx2md/blob/39bef65b312035baeade932aad8d221e37daae5f/pptx2md/parser.py#L249.

There are also Stack Overflow posts that explain how to implement this: https://stackoverflow.com/questions/51999656/how-to-extract-text-from-powerpoint-text-boxes-in-their-order-within-the-presen

I've simply copied over what @ssine has created in his repo, as it's the cleanest implementation.

Referenced from https://github.com/ssine/pptx2md/blob/39bef65b312035baeade932aad8d221e37daae5f/pptx2md/parser.py#L249
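
For illustration, the reordering described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact diff: `Shape` is a hypothetical stand-in for python-pptx shape objects (which expose `top` and `left` offsets), and `reading_order` is an illustrative name that does not appear in markitdown.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Shape:
    """Hypothetical stand-in for a python-pptx shape; only the fields used here."""
    name: str
    top: int   # vertical offset from the top edge of the slide
    left: int  # horizontal offset from the left edge of the slide


def reading_order(shapes):
    """Return shapes sorted top-to-bottom, then left-to-right."""
    # attrgetter("top", "left") builds a (top, left) tuple per shape,
    # so ties on `top` are broken by `left`.
    return sorted(shapes, key=attrgetter("top", "left"))


# Z-order as it might be stored in the file: back to front.
z_order = [
    Shape("footer", top=900, left=0),
    Shape("right column", top=200, left=500),
    Shape("title", top=0, left=0),
    Shape("left column", top=200, left=0),
]

names = [shape.name for shape in reading_order(z_order)]
print(names)
```

Note that a sort like this assumes every shape has integer `top` and `left` values; if either is `None`, Python's tuple comparison would raise a `TypeError`.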

richardye101 · 2025-03-07T19:54:12Z

@microsoft-github-policy-service agree

afourney · 2025-03-07T23:21:37Z

It appears that the import for `attrgetter` is missing.

afourney · 2025-03-07T23:44:51Z

Thanks! Nice and simple fix.

richardye101 added 2 commits March 7, 2025 14:02

Sort PPTX shapes to be read in top-to-bottom, left-to-right order

288a44e

Referenced from https://github.com/ssine/pptx2md/blob/39bef65b312035baeade932aad8d221e37daae5f/pptx2md/parser.py#L249

Update README.md

3ac0acb

afourney added 2 commits March 7, 2025 15:23

Fixed formatting.

a42032e

Added missing import

50231dd

afourney merged commit 0229ff6 into microsoft:main Mar 7, 2025
3 checks passed

richardye101 deleted the patch-2 branch March 8, 2025 00:49

richardye101 mentioned this pull request Mar 28, 2025

Handle PPTX shapes where position is None #1161

Merged
