-
Notifications
You must be signed in to change notification settings - Fork 600
Supporting MuPDF file recognizer #4481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
44500cd
to
1fb4d0d
Compare
We previously used slightly different logic for opening documents from files versus from memory. This fix strives to always use MuPDF's file type recognition and thus become independent from the value of file extensions as much as possible. This works for almost all document types. Exceptions are "txt", "fb2" and "mobi", where either a valid file extension must be present, or the respective filetype must be provided.
1fb4d0d
to
2c90d6c
Compare
if not handler: | ||
raise FileDataError("Failed to open stream as {magic}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this contradict the new docs which says:
:arg str filetype: A string specifying the type of document. Will be ignored in most [#f8]_ cases for :ref:a supported document type<Supported_File_Types>
.
?
Shouldn't we should get a handler with fz_recognize_document_stream_content()
here, so that the contents overrides the supplied magic
where possible?
Or simply use fz_open_document_with_stream()
, i.e. don't create an intermediate handler? I.e. remove the entire if magic: ...
block in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recognizer seems to always detect the type except in 3 cases: "txt", "fb2", "mobi". This means we have identical behavior between file-open versus stream-open.
Only those three cases still require the correct extension, respectively filetype. Everything else disregards file extension, respectively needs no filetype-help.
The two cases "fb2" and "mobi" can be resolved with a little improved recognizer, which will hopefully happen soon.
The case "txt" remains an exception because of principle reasons.
This is the background of my new logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs side of things is fine. No typos - all good! :)
We previously used slightly different logic for opening documents from files versus from memory. This fix strives to always use MuPDF's file type recognition and thus become independent from the value of file extensions as much as possible. This works for almost all document types. Exceptions are "txt", "fb2" and "mobi", where either a valid file extension must be present, or the respective filetype must be provided.