-
Notifications
You must be signed in to change notification settings - Fork 8
Description
Feature description
Currently, dumper fails on images with large attachments, likely because the contents are parsed into the DOM via minidom which creates very large documents that can become larger than the available memory, despite the file sizes being quite small.
Feature motivation
I have a series of notes from college that are gigabytes in size, which is almost entirely in attachments. However, I also have a few files of ~80MB which are mostly attachments also failing. I could export smaller notes, however, having a Python script (to handle files larger than a certain size) or similar to process each of the attachments would be very useful, and export the modified ENEX files (as well as adding support for other formats) and the attachments would be very helpful, and would solve many of the issues of large files.
If there's any interest, I'd be more than happy to provide it, under any license desired (including public domain, so you can do whatever you wish). Currently I only have access to Evernote ENEX files, but I could also use the test cases above to add support for other note file types.
I've got a simple version of the script here, and it reduces a 7MB file to ~150KB, while keeping everything but the resources present, and can then be processed by dumper.
Current issues with the script:
- Assumes base64 encoding of attachments.
- Only handles filename and contents.
XML output currently doesn't preserve CDATA sections, which should be trivial to implement using text processing rather than dumping the tree.- Only supports Evernote ENEX files.
Doesn't store the attachment data when exporting, meaning the attachment names are lost in the header.
I'm working on fixing the 1), 3), and 5) currently, and would be more than willing to implement 4) if desired. 5) has been fixed by writing out the buffer to file, and creating a unique attachment (b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09'), since empty files are ignored by dumper, so any library users would need to know about this.