Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Base16 hash for script path. #9459

Merged
merged 1 commit into from
Nov 26, 2023

Conversation

jasagredo
Copy link
Collaborator

@jasagredo jasagredo commented Nov 18, 2023

Issue #9334 shows that % characters on Windows result in invalid paths, also / characters on Linux create invalid paths.

This changes from using base64 to using base16 with the same length we use for unit-ids.

This supersedes #9452 or rather makes it not needed.

QA notes

Building scripts will now use base16 hashes. So doing cabal run <script> --verbose should output something like the following:

$ cabal run --verbose ./main.hs
Warning: this is a debug build of cabal-install with assertions enabled.
Project settings changed, reconfiguring...
creating C:\msys64\tmp\cabal-repl.-9560\dist-newstyle
creating C:\msys64\tmp\cabal-repl.-9560\dist-newstyle\cache
creating
C:\cabal\script-builds\4b00d58dd930cd841d1a706ed4b329caf686239fc5259400bd20
creating
C:\cabal\script-builds\4b00d58dd930cd841d1a706ed4b329caf686239fc5259400bd20
creating
C:\cabal\script-builds\4b00d58dd930cd841d1a706ed4b329caf686239fc5259400bd20\cache
creating
C:\cabal\script-builds\4b00d58dd930cd841d1a706ed4b329caf686239fc5259400bd20\cache
creating
C:\cabal\script-builds\4b00d58dd930cd841d1a706ed4b329caf686239fc5259400bd20\build\x86_64-windows\ghc-9.6.3\fake-package-0\x\script-main.hs\bin

The hash is a function of the canonical path, so changing the contents of the file shall not change the hash, but changing the location should.

Template Α: This PR modifies cabal behaviour

Include the following checklist in your PR:

Copy link
Collaborator

@geekosaur geekosaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you given some thought to testing? Otherwise, it's okay.

@jasagredo
Copy link
Collaborator Author

I was indeed wondering if I should add some tests, but I guess by reverting this to doing the same as we do with unit-ids this "can't go wrong"? perhaps I'm too new here to know exactly whether this should be tested or it is simple enough to not do so.

@geekosaur
Copy link
Collaborator

I've been thinking about it and am not sure that it's reasonably testable.

I am however also unsure that changing the hash to operate on the file contents instead of its path is a good idea; it seems to me that this could build up quite a lot of cached script builds for which we have no garbage collection mechanism.

@jasagredo
Copy link
Collaborator Author

That is true. I mainly followed the indications on this comment #9334 (comment)

@geekosaur
Copy link
Collaborator

Mm, I should have commented about the path over there, then. I am still worried about excessive cached garbage.

@jasagredo
Copy link
Collaborator Author

It's fine, we can continue the discussion here 👍 I think the argument that disk storage will just go up with no garbage collection makes sense, and that is alleviated somehow by using file paths (you usually don't move scripts often).

@geekosaur
Copy link
Collaborator

Right, I'm expecting it to be more common to edit a script than to move it. And I'm not sure @mpickering's CI and machine arguments make any sense for cabal scripts, as opposed to projects.

@geekosaur
Copy link
Collaborator

It looks like the validate run contains script hashes and the output needs to be updated. This is usually done with validate.sh --accept locally AIUI, then committing the updated test output files.

@Mikolaj
Copy link
Member

Mikolaj commented Nov 20, 2023

If the PR breaks tests, this means the feature is, in a way, tested already. Carefully reviewing the breakage and accepting new test results then counts as testing. Having said that, more tests is more better. E.g., if the spec of base16 did not specify that no % nor / (nor anything that Windows rejects in a path) can appear in the output (does it specify so? link?), we could add a Quickcheck test that verifies % does not appear. Are there any other tests in this spirit that we could add?

@Mikolaj
Copy link
Member

Mikolaj commented Nov 20, 2023

BTW, isn't #9452 adding a test for that? In that case, adding a test specific to this PR only makes sense if one can devise something extra useful knowing the PR implementation details, as opposed to just the issue it solves. Unlikely.

@jasagredo
Copy link
Collaborator Author

jasagredo commented Nov 20, 2023

The RFC already defines the set of US-ASCII characters used for base16 (hex) https://datatracker.ietf.org/doc/html/rfc4648#section-8 so no % or / can appear. And base16-bytestring declares it is compliant with said RFC.

About #9452 I think it was testing the previous implementation, that replaced / for %s but without fixing it on Windows.

@Mikolaj
Copy link
Member

Mikolaj commented Nov 20, 2023

The RFC already defines the set of US-ASCII characters used for base16 (hex) https://datatracker.ietf.org/doc/html/rfc4648#section-8 so no % or / can appear. And base16-bytestring declares it is compliant with said RFC.

Oh, great. As @ffaf1 and @ulysses4ever mention, specs should not be trusted, but in this case I'm satisfied; my paranoia doesn't get triggered. :)

About #9452 I think it was testing the previous implementation, that replaced / for %s but without fixing it on Windows.

I think @mpickering was trying to reproduce the bug (using the buggy implementation, obviously) which your PR fixes, so that's relevant. Am I missing the point? Given a better understanding of the bug and the nature of the fix, can we improve #9452?

@michaelpj
Copy link
Collaborator

Seems like this should probably have a comment explaining the choice of hash function?

Copy link
Collaborator

@andreabedini andreabedini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@Mikolaj
Copy link
Member

Mikolaj commented Nov 23, 2023

This is quite urgent, so let's maybe add the merge label? @Kleidukos: any last thoughts?

@jasagredo
Copy link
Collaborator Author

jasagredo commented Nov 23, 2023

We could solve two problems in one go, if we modify the + that Andrea found in Backpack. It is a one-liner.

modified   Cabal-syntax/src/Distribution/Backpack.hs
@@ -150,7 +150,7 @@ mkDefUnitId :: ComponentId -> Map ModuleName Module -> DefUnitId
 mkDefUnitId cid insts =
   unsafeMkDefUnitId
     ( mkUnitId
-        (unComponentId cid ++ maybe "" ("+" ++) (hashModuleSubst insts))
+        (unComponentId cid ++ maybe "" ("_" ++) (hashModuleSubst insts))
     )
 
 -- impose invariant!

@Kleidukos
Copy link
Member

If @jasagredo is confident about solving the aforementioned problem, let's ship it

@jasagredo
Copy link
Collaborator Author

I'd like to first confirm if the + is the problem. But I will be able to check that in ~4 hours

@mpickering
Copy link
Collaborator

We could solve two problems in one go, if we modify the + that Andrea found in Backpack. It is a one-liner.

modified   Cabal-syntax/src/Distribution/Backpack.hs
@@ -150,7 +150,7 @@ mkDefUnitId :: ComponentId -> Map ModuleName Module -> DefUnitId
 mkDefUnitId cid insts =
   unsafeMkDefUnitId
     ( mkUnitId
-        (unComponentId cid ++ maybe "" ("+" ++) (hashModuleSubst insts))
+        (unComponentId cid ++ maybe "" ("_" ++) (hashModuleSubst insts))
     )
 
 -- impose invariant!

This seems an unrelated problem to the one solved in this MR, so I think it should be part of a separate MR, ticket etc.

@jasagredo
Copy link
Collaborator Author

Sounds reasonable too. We can merge this one and I can open a separate one later.

@Kleidukos
Copy link
Member

Perfect. @jasagredo I'll let you set the merge label :)

@jasagredo jasagredo added squash+merge me Tell Mergify Bot to squash-merge merge me Tell Mergify Bot to merge and removed squash+merge me Tell Mergify Bot to squash-merge labels Nov 23, 2023
@andreabedini
Copy link
Collaborator

A file path wrapper of some sort that prevents us from generating bogus paths does not sound like a bad idea. Otherwise we will find ourselves having to double check every string we create.

@mergify mergify bot added the merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days label Nov 26, 2023
Issue haskell#9334 shows that `%` characters on Windows result in invalid
paths, also `/` characters on Linux create invalid paths.

This changes from using base64 to using base16 with the same length
we use for unit-ids.
@mergify mergify bot merged commit dea70fb into haskell:master Nov 26, 2023
46 checks passed
@Kleidukos
Copy link
Member

Fantastic, thanks again @jasagredo! 🎉

@Mikolaj
Copy link
Member

Mikolaj commented Nov 27, 2023

A file path wrapper of some sort that prevents us from generating bogus paths does not sound like a bad idea. Otherwise we will find ourselves having to double check every string we create.

@andreabedini: would you open a ticket for that? I remember some old discussions on the channel, but I don't know if a ticket emerged from that. Also, recently @geekosaur looked into this, but I'm not sure how relevant the tickets were.

@Kleidukos: I'm assuming we are backporting this? Let me set the label.

@jasagredo: does it need any other PRs to backport together with or is it standalone? If the latter, could you comment <at>mergify backport 3.10?

@geekosaur
Copy link
Collaborator

I was looking at the QuickCheck instances

@jasagredo jasagredo deleted the js/use-base16-for-scripts branch November 27, 2023 22:14
@jasagredo
Copy link
Collaborator Author

Confirmed working as expected on cabal-install master in Windows.

@jasagredo jasagredo added the tested-on: windows QA has been successful on Windows label Nov 27, 2023
@geekosaur
Copy link
Collaborator

Sorry for the abbreviated message; I was called into the doctor's office early.

As I said, I was looking into the QuickCheck instances in Cabal-tests. I still don't know what to do with them without being quite invasive, because most things end up using the Arbitrary instance for String which isn't even safe on POSIX, much less Windows. I'm not sure how that relates to this directly.

@Kleidukos
Copy link
Member

@mergify backport 3.10

Copy link
Contributor

mergify bot commented Nov 29, 2023

backport 3.10

✅ Backports have been created

@jasagredo
Copy link
Collaborator Author

Ah sorry @Mikolaj I was busy with some other stuff lately so I forgot about this. Thanks @Kleidukos for taking care of mergify here!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
attention: needs-manual-qa PR is destined for manual QA merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days merge me Tell Mergify Bot to merge platform: windows tested-on: windows QA has been successful on Windows
Projects
Status: To Test
Development

Successfully merging this pull request may close these issues.

7 participants