fix(epub): parse manifest attributes in any order #5
The manifest item regex required id= before href=, but EPUB OPF files (e.g. Project Gutenberg) often place href first. This caused zero manifest items to be found, leading to epub-failed for valid EPUBs.

Fix: match the full <item> element, then extract id and href separately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem

EPUB conversion returned `epub-failed` for valid EPUBs (e.g., Project Gutenberg books) even when `unzip` was available.

Root Cause

The manifest item regex required `id=` to appear before `href=` in `<item>` tags. Project Gutenberg EPUBs (and many others) place `href` first. This caused zero manifest items to be found, zero spine items, and `epub-failed` for every valid Gutenberg EPUB.

Fix

Match the full `<item>` element and extract `id` and `href` separately, independent of attribute order.

Test

`extraction: epub-native`, `usefulness_score: 1.00`, 179 chunks ✅

🤖 Generated with Claude Code
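For illustration, the approach described under Fix can be sketched in Python. This is not the code from this PR: the pattern names, the `parse_manifest` helper, and the sample OPF snippet are all hypothetical, and `ORDERED_ITEM_RE` only approximates the kind of order-dependent regex described under Root Cause.

```python
import re
from typing import Dict

# Hypothetical order-dependent pattern: only matches when id= precedes href=.
ORDERED_ITEM_RE = re.compile(r'<item\s+[^>]*?id="([^"]+)"[^>]*?href="([^"]+)"')

# Order-independent approach: match the whole <item ...> tag first,
# then pull each attribute out of the tag text on its own.
ITEM_RE = re.compile(r'<item\s[^>]*>')  # \s keeps <itemref> from matching
ID_RE = re.compile(r'\sid\s*=\s*["\']([^"\']+)["\']')
HREF_RE = re.compile(r'\shref\s*=\s*["\']([^"\']+)["\']')


def parse_manifest(opf: str) -> Dict[str, str]:
    """Return a mapping of manifest item id -> href, in any attribute order."""
    items: Dict[str, str] = {}
    for tag in ITEM_RE.findall(opf):
        id_match = ID_RE.search(tag)
        href_match = HREF_RE.search(tag)
        # Skip items that lack either attribute instead of failing outright.
        if id_match and href_match:
            items[id_match.group(1)] = href_match.group(1)
    return items


# Made-up sample: the first item is href-first, the second is id-first.
SAMPLE_OPF = (
    '<manifest>'
    '<item href="ch1.xhtml" id="ch1" media-type="application/xhtml+xml"/>'
    '<item id="ch2" href="ch2.xhtml" media-type="application/xhtml+xml"/>'
    '</manifest>'
    '<spine><itemref idref="ch1"/><itemref idref="ch2"/></spine>'
)

manifest = parse_manifest(SAMPLE_OPF)
```

Extracting each attribute separately keeps the patterns simple and avoids enumerating every attribute ordering in one regex. A real XML parser would be more robust still, at the cost of handling OPF namespaces.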