-
Notifications
You must be signed in to change notification settings - Fork 92
Ensure that FilePaths don't contain interior NULs #279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
15d5846
to
80cf9c3
Compare
80cf9c3
to
338dc25
Compare
Overall looks good. @vdukhovni could you please take a look? |
338dc25
to
fa57c3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in upthread comments, I think that just using useCStringLen
is simpler and faster.
fa57c3b
to
39eb676
Compare
39eb676
to
145d17b
Compare
System/Posix/ByteString/FilePath.hsc
Outdated
, ioe_location = "checkForInteriorNuls" | ||
, ioe_description = "POSIX filepaths must not contain internal NUL octets." | ||
, ioe_errno = Nothing | ||
, ioe_filename = Just (either (error . show) id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what conditions are we hiding a bottom (error . show) inside ioe_filename?
I don't think that examining the error should ever blow up...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not. This function is total due to https://hackage.haskell.org/package/base-4.18.0.0/docs/GHC-IO-Encoding-Failure.html#v:TransliterateCodingFailure
This is even tested: https://github.com/haskell/filepath/blob/98f8bba9eac8c7183143d290d319be7df76c258b/tests/abstract-filepath/EncodingSpec.hs#L102
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not. This function is total due to https://hackage.haskell.org/package/base-4.18.0.0/docs/GHC-IO-Encoding-Failure.html#v:TransliterateCodingFailure
This is even tested: https://github.com/haskell/filepath/blob/98f8bba9eac8c7183143d290d319be7df76c258b/tests/abstract-filepath/EncodingSpec.hs#L102
Then I think it would be more clear to return Nothing
for the ioe_filename if unexpectedly that function fails anyway, and otherwise return Just
the decoded filename.
either (const Nothing) Just :: Either a b -> Maybe b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. Using Nothing seems to potentially mask a theoretical bug in the encoders or base library. Under the hood we use unsafePerformIO and certain low level IO primitives.
I want the code to crash, but I don't think it's worthwhile to move it into IO either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Examining fields of a thrown exception should never throw another error, no matter how unlikely that might be.
Another alternative might be to simply render the filename octets as characters with no attempt to interpret as UTF8. Then the whole issue goes away. It is not uncommon for filenames to be in some 8bit charset that isn't UTF8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That _toStr
example is also wrong IMNSHO. When we have an octet string with no hope of knowing whether it is some encoded text string, or just 8-bit data (sans NUL octets), or when it is a text string, what encoding produced it, the best we can do is render exactly one character per octet, without any "decoding".
Note also that in that context Nothing
is not an option, as it is with ioe_filename
. It is not OK to ever booby-trap the IOError object fields with latent bottoms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That _toStr example is also wrong IMNSHO. When we have an octet string with no hope of knowing whether it is some encoded text string, or just 8-bit data (sans NUL octets), or when it is a text string, what encoding produced it, the best we can do is render exactly one character per octet, without any "decoding".
There's nothing wrong about this, as these are user visible error messages, not proper filepaths.
I'm well aware of all the encoding intricacies of filepaths, since I've implemented the AFPP.
This is a tradeoff choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another angle that I'm willing to discuss is whether ioe_filename
should allow the caller to recover the actual filename or whether it's just descriptive.
In that context, converting every Word8 to a Char is very specific semantics the caller would have to know about. I'd go so far to say it's wrong on this level, since most other stuff in base uses getFileSystemEncoding
. So we'd use decodeFS (which has correctness issues too, btw. and is in IO).
So, I don't think there is any right choice if we want the caller to be able to recover the filename.
If we just want nice printing, the current solution is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not OK to ever booby-trap the IOError object fields with latent bottoms.
I agree and our tests could trivially catch this. As I pointed out, these functions are total and error
is unreachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, Bytestring modules do the infamous unpack:
unix/System/Posix/ByteString/FilePath.hsc
Line 133 in dc91c94
throwErrnoIfMinus1_ (loc <> " '" <> BC.unpack path1 <> "' to '" <> BC.unpack path2 <> "'") |
I'm rather negative to this approach and would argue the fact that this exists is due to lack of decodeFS
for bytestring (which isn't an issue in OsPath modules).
However, one could make the consistency argument.
@vdukhovni @hs-viktor @hasufell is there any conclusion? We have to make a release soon, to accompany GHC 9.8. |
We haven't come to a conclusion wrt Based on that, we can more easily decide what to do. |
I don't feel very strongly about encoding choices, but I am quite adamant about error values not containing bottoms. When working with paths that are ByteStrings, if they make sense as UTF8, assume they're UTF8, otherwise raw? |
Yes, as I explained above, the current code using Whatever we do, we have to document it in base, IMO: https://hackage.haskell.org/package/base-4.18.0.0/docs/GHC-IO-Exception.html#v:ioe_filename It'll be utterly confusing to end users otherwise, looking for the properties of this record field. |
System.Posix.ByteString.FilePath just unpacks, while System.Posix.PosixPath.FilePath tried to be smart and decode with UTF (without failing). This divergence was discussed here: * haskell#279 * haskell/core-libraries-committee#189
AFAIU this is no longer a blocker, right? |
Correct. I need to update the PR. |
145d17b
to
2455442
Compare
Should be settled. |
Follows:
@bgamari @Bodigrim