Skip to content

Ensure that FilePaths don't contain interior NULs #279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 23, 2023

Conversation

hasufell
Copy link
Member

@Bodigrim Bodigrim requested a review from hs-viktor April 7, 2023 18:22
@Bodigrim
Copy link
Contributor

Bodigrim commented Apr 7, 2023

Overall looks good. @vdukhovni could you please take a look?

@hasufell hasufell force-pushed the disallow-filepaths-with-nul branch from 338dc25 to fa57c3b Compare April 11, 2023 14:46
Copy link

@vdukhovni vdukhovni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in upthread comments, I think that just using useCStringLen is simpler and faster.

@hasufell hasufell force-pushed the disallow-filepaths-with-nul branch from fa57c3b to 39eb676 Compare April 12, 2023 12:45
@hasufell hasufell requested a review from vdukhovni April 12, 2023 12:45
@hasufell hasufell force-pushed the disallow-filepaths-with-nul branch from 39eb676 to 145d17b Compare April 20, 2023 10:13
@hasufell hasufell requested review from vdukhovni and Bodigrim April 20, 2023 10:13
, ioe_location = "checkForInteriorNuls"
, ioe_description = "POSIX filepaths must not contain internal NUL octets."
, ioe_errno = Nothing
, ioe_filename = Just (either (error . show) id

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what conditions are we hiding a bottom (error . show) inside ioe_filename?
I don't think that examining the error should ever blow up...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not. This function is total due to https://hackage.haskell.org/package/base-4.18.0.0/docs/GHC-IO-Encoding-Failure.html#v:TransliterateCodingFailure

This is even tested: https://github.com/haskell/filepath/blob/98f8bba9eac8c7183143d290d319be7df76c258b/tests/abstract-filepath/EncodingSpec.hs#L102

Then I think it would be more clear to return Nothing for the ioe_filename if unexpectedly that function fails anyway, and otherwise return Just the decoded filename.

either (const Nothing) Just :: Either a b -> Maybe b

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. Using Nothing seems to potentially mask a theoretical bug in the encoders or base library. Under the hood we use unsafePerformIO and certain low level IO primitives.

I want the code to crash, but I don't think it's worthwhile to move it into IO either.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examining fields of a thrown exception should never throw another error, no matter how unlikely that might be.
Another alternative might be to simply render the filename octets as characters with no attempt to interpret as UTF8. Then the whole issue goes away. It is not uncommon for filenames to be in some 8bit charset that isn't UTF8.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That _toStr example is also wrong IMNSHO. When we have an octet string with no hope of knowing whether it is some encoded text string, or just 8-bit data (sans NUL octets), or when it is a text string, what encoding produced it, the best we can do is render exactly one character per octet, without any "decoding".

Note also that in that context Nothing is not an option, as it is with ioe_filename. It is not OK to ever booby-trap the IOError object fields with latent bottoms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That _toStr example is also wrong IMNSHO. When we have an octet string with no hope of knowing whether it is some encoded text string, or just 8-bit data (sans NUL octets), or when it is a text string, what encoding produced it, the best we can do is render exactly one character per octet, without any "decoding".

There's nothing wrong about this, as these are user visible error messages, not proper filepaths.

I'm well aware of all the encoding intricacies of filepaths, since I've implemented the AFPP.

This is a tradeoff choice.

Copy link
Member Author

@hasufell hasufell Apr 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another angle that I'm willing to discuss is whether ioe_filename should allow the caller to recover the actual filename or whether it's just descriptive.

In that context, converting every Word8 to a Char is very specific semantics the caller would have to know about. I'd go so far to say it's wrong on this level, since most other stuff in base uses getFileSystemEncoding. So we'd use decodeFS (which has correctness issues too, btw. and is in IO).

So, I don't think there is any right choice if we want the caller to be able to recover the filename.

If we just want nice printing, the current solution is ok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not OK to ever booby-trap the IOError object fields with latent bottoms.

I agree and our tests could trivially catch this. As I pointed out, these functions are total and error is unreachable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, Bytestring modules do the infamous unpack:

throwErrnoIfMinus1_ (loc <> " '" <> BC.unpack path1 <> "' to '" <> BC.unpack path2 <> "'")

I'm rather negative to this approach and would argue the fact that this exists is due to lack of decodeFS for bytestring (which isn't an issue in OsPath modules).

However, one could make the consistency argument.

@Bodigrim
Copy link
Contributor

@vdukhovni @hs-viktor @hasufell is there any conclusion? We have to make a release soon, to accompany GHC 9.8.

@hasufell
Copy link
Member Author

hasufell commented Jul 12, 2023

We haven't come to a conclusion wrt ioe_filename and I was actually planning to consult with GHC devs about this. Or even CLC. It's not clear what the API of this record is to the end user.

Based on that, we can more easily decide what to do.

@vdukhovni
Copy link

I don't feel very strongly about encoding choices, but I am quite adamant about error values not containing bottoms.
Pick the encoding strategy that will be least surprising, and never ever have any of the fields of IOError throw when forced.

When working with paths that are ByteStrings, if they make sense as UTF8, assume they're UTF8, otherwise raw?
Assuming the default system filename encoding (as suggested) is another. No encoding choice will be always right, we simply don't know. Part of the reason the IO operation failed, could well be a poor choice of encoding... :-(

@hasufell
Copy link
Member Author

hasufell commented Jul 12, 2023

but I am quite adamant about error values not containing bottoms

Yes, as I explained above, the current code using TransliterateCodingFailure never throws.

Whatever we do, we have to document it in base, IMO: https://hackage.haskell.org/package/base-4.18.0.0/docs/GHC-IO-Exception.html#v:ioe_filename

It'll be utterly confusing to end users otherwise, looking for the properties of this record field.

hasufell added a commit to hasufell/unix that referenced this pull request Jul 17, 2023
System.Posix.ByteString.FilePath just unpacks, while
System.Posix.PosixPath.FilePath tried to be smart and
decode with UTF (without failing).

This divergence was discussed here:

* haskell#279
* haskell/core-libraries-committee#189
@Bodigrim
Copy link
Contributor

We haven't come to a conclusion wrt ioe_filename and I was actually planning to consult with GHC devs about this. Or even CLC. It's not clear what the API of this record is to the end user.

Based on that, we can more easily decide what to do.

AFAIU this is no longer a blocker, right?

@hasufell
Copy link
Member Author

Correct. I need to update the PR.

@hasufell hasufell force-pushed the disallow-filepaths-with-nul branch from 145d17b to 2455442 Compare July 23, 2023 14:42
@hasufell hasufell requested a review from Bodigrim July 23, 2023 14:42
@hasufell hasufell requested a review from vdukhovni July 23, 2023 14:42
@hasufell
Copy link
Member Author

Should be settled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants