File/MIME Type Parsing & File Extension association #10

Open
douglasg14b opened this issue Sep 9, 2019 · 0 comments

@douglasg14b
Member

douglasg14b commented Sep 9, 2019

Note: This is more difficult than expected...

Rough Plan/Idea

  • Figure out how file(1) works. Reading through the simplemagic source will provide a lot of context
  • Parse the magic file. Again, the simplemagic source will be helpful here; MimeMagicSharp also performs its own parsing
    • Consider the format used by MimeMagicSharp, which is a JSON representation of the magic format
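
As a first step toward parsing the magic file, here is a rough C# sketch that splits a single magic(5) line into its continuation level, offset, type, test and message. MagicEntry and MagicLineParser are hypothetical names, and the sketch deliberately skips indirect "(...)" and relative "&" offsets, octal numbers, "!:" extension lines and escape decoding, all of which a real parser would need.

```csharp
using System;
using System.Globalization;
using System.Text;

#nullable enable

// Hypothetical model of one magic(5) test line (illustrative names, not an existing API).
public sealed record MagicEntry(int Level, long Offset, string Type, string Test, string Message);

public static class MagicLineParser
{
    // Parses lines such as "0\tstring\t%PDF-\tPDF document" or ">4\tbyte\tx\tversion %d".
    // Sketch only: indirect "(...)" and relative "&" offsets, octal numbers, "!:" extension
    // lines and escape decoding are not handled; unsupported lines return null.
    public static MagicEntry? TryParse(string line)
    {
        if (string.IsNullOrWhiteSpace(line)) return null;
        string trimmed = line.TrimStart();
        if (trimmed.StartsWith("#", StringComparison.Ordinal) ||
            trimmed.StartsWith("!:", StringComparison.Ordinal))
            return null;

        // The continuation level is the number of leading '>' characters.
        int level = 0;
        while (level < line.Length && line[level] == '>') level++;

        int pos = level;
        string? offsetText = NextField(line, ref pos);
        string? type = NextField(line, ref pos);
        string? test = NextField(line, ref pos);
        if (offsetText is null || type is null || test is null) return null;
        if (!TryParseOffset(offsetText, out long offset)) return null;

        // Whatever follows the test field is the (printf-style) message.
        string message = line.Substring(pos).Trim();
        return new MagicEntry(level, offset, type, test, message);
    }

    // Reads one whitespace-delimited field. A backslash keeps the following character in the
    // field, so an escaped space ("\ ") in a string test does not end it; escapes stay raw.
    private static string? NextField(string line, ref int pos)
    {
        while (pos < line.Length && char.IsWhiteSpace(line[pos])) pos++;
        if (pos >= line.Length) return null;

        var field = new StringBuilder();
        while (pos < line.Length && !char.IsWhiteSpace(line[pos]))
        {
            if (line[pos] == '\\' && pos + 1 < line.Length)
            {
                field.Append(line, pos, 2); // backslash plus the escaped character
                pos += 2;
            }
            else
            {
                field.Append(line[pos++]);
            }
        }
        return field.ToString();
    }

    // Decimal or 0x-prefixed hexadecimal offsets only.
    private static bool TryParseOffset(string text, out long offset)
    {
        if (text.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
            return long.TryParse(text.Substring(2), NumberStyles.AllowHexSpecifier,
                                 CultureInfo.InvariantCulture, out offset);
        return long.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out offset);
    }
}
```

Keeping escapes raw at this stage means tokenizing and value decoding stay separate, so a later pass can interpret "\x", "\0" and friends per type without the tokenizer needing to know about them.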

Why?

Why build this when other libraries, such as those listed below, already exist?

  • NuGet Packages
    • Not all libraries have a NuGet package. This is a must.
  • Cleaner code
    • Existing libraries are quite messy, ignoring clean code principles and language conventions
  • Better API
    • The APIs are hit-or-miss. I want to take their best features and extend them
  • DI/IoC Compatibility
    • Static classes are fine and all, but IoC and dependency injection compatibility should be baked in
  • Performance
    • Some implementations read the entire file to check its type, which doesn't work with huge files
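
The DI and performance points could look something like the following sketch: a small interface that can be registered in a container, and an implementation that reads only as many header bytes as its longest signature reaches instead of the whole file. IFileTypeDetector, SignatureDetector and FileSignature are hypothetical names, and the signatures are passed in by hand here, whereas eventually they would come from the parsed magic data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

#nullable enable

// Hypothetical types for illustration only; not an existing library API.
public sealed record FileSignature(string MimeType, int Offset, byte[] Magic);

public interface IFileTypeDetector
{
    // Returns the MIME type of the first matching signature, or null if none match.
    string? Detect(Stream stream);
}

public sealed class SignatureDetector : IFileTypeDetector
{
    private readonly IReadOnlyList<FileSignature> _signatures;
    private readonly int _headerLength;

    public SignatureDetector(IEnumerable<FileSignature> signatures)
    {
        _signatures = signatures.ToList();
        // Only the bytes some signature can reach are ever read, however large the file is.
        _headerLength = _signatures.Count == 0 ? 0 : _signatures.Max(s => s.Offset + s.Magic.Length);
    }

    public string? Detect(Stream stream)
    {
        var header = new byte[_headerLength];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full or end of stream.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        foreach (FileSignature sig in _signatures)
        {
            if (sig.Offset + sig.Magic.Length > read) continue; // stream too short for this one
            if (header.AsSpan(sig.Offset, sig.Magic.Length).SequenceEqual(sig.Magic))
                return sig.MimeType;
        }
        return null;
    }
}
```

Because callers depend only on IFileTypeDetector, the same instance can be used directly or registered with a container, for example via Microsoft.Extensions.DependencyInjection's services.AddSingleton<IFileTypeDetector>(new SignatureDetector(signatures)), and swapped for a fake in tests.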

Motivation

I enjoy improving upon others' work and writing good software that people enjoy using.

Resources

Other Libraries & tools

  • file(1): by a wide margin, the most robust tool for file identification, based on the magic(5) format
    • Contains the signatures for thousands of file types
  • simplemagic: a Java library that mimics file(1)
    • It uses the magic file(s) copied from CentOS
  • magic(5)
    • The format specification, which is vital to understanding how the formats work.

Existing C# Libraries:

  • filetypedetective [Deprecated] [Unlicensed]
    • Succeeded by Mime-Detective
    • Good source for reference material
  • Mime-Detective [MIT]
    • Uses signatures and offsets from Gary Kessler
    • Makes a best-effort attempt to differentiate XLSX, PPTX and other document formats contained in ZIP files
    • MimeType contains the meat of how these checks are performed.
    • MimeDetective: notable for its LearnMimeType method, the idea behind which could be improved upon and put to good use.
  • MimeMagicSharp
    • Uses signatures from the magic format.
    • Stores them in a neat, multi-level format
    • I don't much like how it goes about checking the signatures.
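
The ZIP-container differentiation that Mime-Detective attempts could be sketched along these lines. It relies on the conventional OOXML part names (word/document.xml, xl/workbook.xml, ppt/presentation.xml); strictly, the package relationships in _rels/.rels are authoritative, so this is a heuristic, and OoxmlSniffer is a hypothetical name. Given a seekable stream, ZipArchive reads the central directory, so only entry names are inspected and nothing is decompressed.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;

#nullable enable

// Hypothetical helper; the part names below are the conventional OOXML ones, which most
// producers use, but the package relationships (_rels/.rels) are authoritative.
public static class OoxmlSniffer
{
    public static string? Refine(Stream zipStream)
    {
        try
        {
            // ZipArchive reads the central directory at the end of the archive, so only entry
            // names are inspected here; no entry is decompressed.
            using var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true);
            bool Has(string name) => archive.Entries.Any(e => e.FullName == name);

            // [Content_Types].xml is required in every OPC package (DOCX/XLSX/PPTX).
            if (!Has("[Content_Types].xml")) return "application/zip";
            if (Has("word/document.xml"))
                return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
            if (Has("xl/workbook.xml"))
                return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
            if (Has("ppt/presentation.xml"))
                return "application/vnd.openxmlformats-officedocument.presentationml.presentation";
            return "application/zip"; // some other package format, left generic
        }
        catch (InvalidDataException)
        {
            return null; // not a readable ZIP archive
        }
    }
}
```

This would run only after a cheap header check has already matched the ZIP local file header signature ("PK\x03\x04"), so ordinary files never pay for opening an archive.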

C# Code Snippets

Signature Resources:
