fix(epub): parse manifest attributes in any order by ojspace · Pull Request #5 · ojspace/md-anything

ojspace · 2026-03-21T15:59:14Z

Problem

EPUB conversion returned epub-failed for valid EPUBs (e.g., Project Gutenberg books) even when unzip was available.

Root Cause

The manifest item regex required id= to appear before href= in <item> tags:

/<item\s+[^>]*id="([^"]+)"[^>]*href="([^"]+)"[^>]*/gi

Project Gutenberg EPUBs (and many others) use href first:

<item href="chapter1.html" id="item1" media-type="application/xhtml+xml"/>

With href first, the regex matched zero manifest items, so zero spine items were resolved and conversion returned epub-failed for valid Gutenberg EPUBs.
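The failure can be reproduced in isolation. The sketch below reuses the regex quoted above and the sample `<item>` tag from this description; the helper name `countOrderedMatches` and the id-first sample string are illustrative assumptions, not code from `src/providers/epub.ts`.

```typescript
// Regex as quoted in the PR description: id= must appear before href=.
const orderedItemRegex = /<item\s+[^>]*id="([^"]+)"[^>]*href="([^"]+)"[^>]*/gi;

// Hypothetical helper: counts how many <item> tags the old regex matches.
function countOrderedMatches(opf: string): number {
  // String.prototype.match with a global regex returns all matches or null.
  return (opf.match(orderedItemRegex) || []).length;
}

// id first: the layout the old regex expected (illustrative sample).
const idFirst =
  '<item id="item1" href="chapter1.html" media-type="application/xhtml+xml"/>';

// href first: the layout quoted above from Project Gutenberg EPUBs.
const hrefFirst =
  '<item href="chapter1.html" id="item1" media-type="application/xhtml+xml"/>';

console.log(countOrderedMatches(idFirst));   // 1
console.log(countOrderedMatches(hrefFirst)); // 0
```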

Fix

Match the full <item> element and extract id and href separately, independent of attribute order.
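A minimal sketch of that approach, assuming double-quoted attribute values. The `ManifestItem` type and `parseManifestItems` function are hypothetical names used for illustration; the actual change in `src/providers/epub.ts` may be structured differently.

```typescript
// Hypothetical shape for a parsed manifest entry.
interface ManifestItem {
  id: string;
  href: string;
}

// Sketch: match each whole <item ...> tag first, then pull id and href
// out of the tag text independently, so attribute order does not matter.
function parseManifestItems(opf: string): ManifestItem[] {
  const items: ManifestItem[] = [];
  // \b keeps <itemref> (used in the spine) from matching as <item>.
  const tags = opf.match(/<item\b[^>]*>/gi) || [];
  for (const tag of tags) {
    // \s before the name avoids matching attributes that merely end in "id".
    const id = /\sid="([^"]+)"/i.exec(tag);
    const href = /\shref="([^"]+)"/i.exec(tag);
    // Only record the item when both attributes are present.
    if (id && href) {
      items.push({ id: id[1], href: href[1] });
    }
  }
  return items;
}

const sampleOpf =
  '<manifest>' +
  '<item href="chapter1.html" id="item1" media-type="application/xhtml+xml"/>' +
  '<item id="item2" href="chapter2.html" media-type="application/xhtml+xml"/>' +
  '<item id="no-href"/>' +
  '</manifest>' +
  '<spine><itemref idref="item1"/></spine>';

console.log(parseManifestItems(sampleOpf).length); // 2
```

Single-quoted attribute values are not handled here; a real XML parser would cover those cases at the cost of an extra dependency.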

Test

Alice's Adventures in Wonderland (Project Gutenberg EPUB) → extraction: epub-native, usefulness_score: 1.00, 179 chunks ✅

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Fixed EPUB manifest parsing to correctly handle variations in attribute ordering, enabling previously unrecognized e-books to be properly read.

The manifest item regex required id= before href=, but EPUB OPF files (e.g. Project Gutenberg) often place href first. This caused zero manifest items to be found, leading to epub-failed for valid EPUBs.

Fix: match the full <item> element, then extract id and href separately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-03-21T15:59:26Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66957d42-1987-484e-a535-bcab214ccc99

📥 Commits

Reviewing files that changed from the base of the PR and between 8f34a24 and 5fc698c.

📒 Files selected for processing (1)

src/providers/epub.ts

📝 Walkthrough

The EPUB OPF manifest parsing logic was refactored to robustly extract <item> elements by matching the entire tag and separately extracting id and href attributes via case-insensitive regexes, accommodating any attribute ordering. The population of manifestItems is now conditional on both attributes being present.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| EPUB Manifest Parsing `src/providers/epub.ts` | Refactored `<item>` element extraction from a single order-dependent regex to separate case-insensitive regexes for `id` and `href`, making parsing independent of attribute order. `manifestItems` is populated only when both attributes are present. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 With whiskers twitched and nose held high,
The EPUB's items now fly,
No order matters, left or right,
Our parsing's now more sturdy and bright! ✨


ojspace merged commit 943f644 into main Mar 21, 2026
3 of 4 checks passed

ojspace deleted the fix/epub-manifest-attr-order branch March 21, 2026 15:59

ojspace merged 1 commit into main from fix/epub-manifest-attr-order
