OS specific content type handling #7418
Comments
Interesting issue! Yes, I did read ipfs/ipfs-companion#886 (comment).

The original issue talks about this blog post: https://ipfs.io/ipns/blog.ipfs.io/2020-05-20-gossipsub-v1.1 That's, at the very least, a bug that should be fixed in the software that's used to make blog posts. A dot should not be part of a URL unless it introduces an extension. URLs like that will break checks that rely solely on the extension, which is apparently what is happening here.

Next, there is a difference between web servers and files. If a web server serves a page like that, the content type that the web server provides should be used. This is where things get wonky, because when it goes through IPFS it's probably all handled as files. And if different nodes have different MIME databases, you might indeed get different results.

But there's a fix for that :) In the IPFS world it makes a lot of sense to determine the MIME type by content, not by extension. The node that is going to respond with the data knows the data, so determining the MIME type at that point is practically free. So I'd advise you to look at how this is done in the Qt world and use that logic instead. Your starting point would be https://code.qt.io/cgit/qt/qtbase.git/tree/src/corelib/mimetypes/qmimedatabase.cpp I don't quite know how the actual database is built, but I do know that it has been working quite reliably for years (since its introduction in Qt 5.0, I think).

Just as a little reminder of what can go wrong if you detect the MIME type solely by extension: on Linux (Windows too, I think, not entirely sure) a dot is allowed in any entry name. So you could actually have a folder called "bigfolder.jpg", which would not be a JPEG file but a folder! It's stupid, but possible.

Hope this helps :)
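To make the "detect by content, not by extension" idea concrete, here is a minimal Go sketch. It does not port Qt's `QMimeDatabase` logic; it swaps in the sniffing routine from Go's standard library, `http.DetectContentType`, which looks at no more than the first 512 bytes. The helper names `sniffContentType` and `baseType` are made up for this example and are not part of go-ipfs.

```go
package main

import (
	"net/http"
	"strings"
)

// sniffContentType guesses a MIME type from the leading bytes of the data
// and ignores the file name entirely. http.DetectContentType considers at
// most the first 512 bytes and returns "application/octet-stream" when it
// cannot recognise the content.
func sniffContentType(data []byte) string {
	return http.DetectContentType(data)
}

// baseType strips parameters such as "; charset=utf-8" so that callers can
// compare against a bare type like "text/html".
func baseType(contentType string) string {
	if i := strings.Index(contentType, ";"); i >= 0 {
		contentType = contentType[:i]
	}
	return strings.TrimSpace(contentType)
}
```

With this approach the name "2020-05-20-gossipsub-v1.1" never enters the decision: an HTML body is reported as `text/html` whatever the path looks like, and the result does not depend on which MIME database the host happens to have installed.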
Thinking about this a bit more: why isn't the MIME content type encoded in the hash?

00 = Unknown MIME type (i.e. try to determine it on the receiving node)

This makes the hash 2 characters longer, but it also gives you a way of knowing the intended MIME type for a file. It also only has to be determined once, at the point of adding the file to IPFS. You'd only have to do MIME type checking if the type is unknown, which gives you a nice backwards-compatibility path too.

Another thing to consider is that this makes IPFS expose file details (it already does with the file size, which is part of the encoding too): with this you also know the type of file. For some purposes that might not be ideal. For other purposes (like quickly sorting on MIME type, or even "searching for image files") this offers a really simple and fast way to do just that.
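A rough Go sketch of that proposal, purely as an illustration: a two-hex-character type code is prepended to the hash string, and `00` means "unknown, sniff on the receiving node". This is not how IPFS identifiers are actually encoded. Only the `00` code comes from the comment; the other codes, the `typeCodes` table, the function names, and the placeholder hash `QmExampleHash` are all assumptions invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// unknownCode is the only code taken from the proposal: 00 = unknown.
const unknownCode uint8 = 0x00

// typeCodes is a hypothetical registry; every entry here is invented.
var typeCodes = map[string]uint8{
	"text/html": 0x01,
	"image/png": 0x02,
}

// encodeWithType prepends a two-character hex type code to the hash.
// Types missing from the registry are encoded as unknown (00).
func encodeWithType(hash, mimeType string) string {
	code, ok := typeCodes[mimeType]
	if !ok {
		code = unknownCode
	}
	return fmt.Sprintf("%02x%s", code, hash)
}

// decodeType splits the prefix off again. An empty mimeType means the
// receiver has to determine the type itself, which is the backwards
// compatible path for code 00 and for codes it does not recognise.
func decodeType(s string) (mimeType, hash string, err error) {
	if len(s) < 2 {
		return "", "", errors.New("input too short for a type prefix")
	}
	n, err := strconv.ParseUint(s[:2], 16, 8)
	if err != nil {
		return "", "", fmt.Errorf("invalid type prefix %q: %w", s[:2], err)
	}
	for name, code := range typeCodes {
		if code == uint8(n) && code != unknownCode {
			return name, s[2:], nil
		}
	}
	return "", s[2:], nil
}
```

The sketch also shows the privacy trade-off raised above: anyone holding the prefixed string can read the type without fetching the data.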
The issue here is simply: content detection shouldn't be OS dependent. That's it. Beyond that, we'd ideally have more accurate content detection. However, it's not simple. We don't want to treat "index.html" with the content
Version information:
Description:
go-ipfs now reads /etc/mime.types when determining the content type of a file from the file extension. Unfortunately, this leads to hard-to-diagnose, platform-specific behavior; ideally, all go-ipfs implementations should behave the same way.
See ipfs/ipfs-companion#886 (comment).
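One way to avoid the platform dependence described above, sketched in Go under stated assumptions: consult a table compiled into the binary instead of the host's MIME database, and fall back to content sniffing for extensions the table does not know. Go's `mime.TypeByExtension` is documented to augment its built-in table with the local system's MIME files on Unix, so this sketch does not call it. The table contents and the name `contentTypeFor` are illustrative, and this is not necessarily how go-ipfs resolved the issue.

```go
package main

import (
	"net/http"
	"path"
	"strings"
)

// fixedTypes is an illustrative, deliberately small table shipped with the
// binary, so every node answers the same way regardless of /etc/mime.types.
var fixedTypes = map[string]string{
	".html": "text/html; charset=utf-8",
	".css":  "text/css; charset=utf-8",
	".js":   "text/javascript; charset=utf-8",
	".json": "application/json",
	".png":  "image/png",
	".jpg":  "image/jpeg",
	".svg":  "image/svg+xml",
}

// contentTypeFor looks the extension up in the fixed table first and only
// sniffs the content when the extension is missing or unknown.
func contentTypeFor(name string, data []byte) string {
	ext := strings.ToLower(path.Ext(name))
	if t, ok := fixedTypes[ext]; ok {
		return t
	}
	return http.DetectContentType(data)
}
```

For the blog post path from the linked discussion, `path.Ext` yields ".1", which is not in the table, so the HTML body is sniffed rather than matched against whatever a local MIME database might map ".1" to.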