Skip to content

Deduplicate hsc2hs args #11005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

sergv
Copy link
Collaborator

@sergv sergv commented Jun 19, 2025

Synopsis

TLDR it’s all too easy to make cabal pass too many command-line arguments to hsc2hs so that it will stop working at all.

Descirption

When extra-lib-dirs and/or extra-include-dirs are specified (on either command-line or via cabal.project or cabal.config) and project has hsc2hs files that need processing, the cabal will instruct hsc2hs to use specified directories by passing them via --cflag and --ldflag arguments.

The problem is that when, say, N different directories are specified, and current cabal file depends on K packages then each directory will be passed via --ldflag or --cflag in total K times because cabal will pass it once for each package in the graph. Namely, extra-*-dirs are added to all packages in graph (either by default which is probably wrong or with some effort on the user’s side despite cabal’s best judgement).

Therefore hsc2hs receives N * K command-line arguments with lots of duplication and thes passes them to gcc. If the hsc2hs itself if recent enough then it’ll receive arguments via response file. But it will pass arguments to gcc on command line so the limit on the number of arguments will be hit there.

In either case it seems reasonable to stop passing duplicate arguments regardless of where they came from which is what this PR adresses.

I have added a somewhat big test that needs to be this way to test N * K multiplication of command-line options and total number of options has to be big enough to not fit into command-line argument length limit which is nontrivial nowadays.

The fix

The fix is to first reorder hsc2hs arguments into two groups, one that goes into --cflag and another one that goes into --ldflag and deduplicate each separately. Relative order between --cflag and --ldflag was changed but it shouldn’t matter since they’re from different "namespaces". Order within each group was preserved. One commit just reorders groups and another adds deduplication.

A smaller patch is possible, e.g. https://github.com/haskell-infra/hackage-doc-builder-config/blob/master/cabal-hsc2hs-args-patch.diff, but it will go around deduplicating among arguments that start with different prefixes every time which is going to be extra work.

Real-world significance

NB this is not a theoretical toy, it has been preventing generation of documentation on Hackage no less. It’s going to take some investigation if you want to check that out, but in brief:


Template Α: This PR modifies behaviour or interface

Include the following checklist in your PR:

@sergv sergv force-pushed the deduplicate-hsc2hs-args branch from 0ac7b37 to 86c4392 Compare June 19, 2025 22:34
@geekosaur
Copy link
Collaborator

You might do better with a test that tells hsc2hs to call a replacement for gcc that checks for duplicated arguments. Not only is it difficult to hit limits, but on some platforms it's not possible (FreeBSD, for instance, not that GitHub currently supports CI there) and there's always the possibility that Linux eventually adopts its behavior.

@sergv sergv force-pushed the deduplicate-hsc2hs-args branch from 86c4392 to 0cd5fad Compare June 19, 2025 22:46
@mpickering
Copy link
Collaborator

It seems that this is more of an issue in hsc2hs?

Cabal already passes arguments to hsc2hs using response files (#3122), but the issue is the command line length when gcc is called? Can we fix hsc2hs in the longer term to use response files when calling gcc?

Not saying this isn't a good workaround for now but good to work out what the longer term correct thing to do is.

@sergv sergv force-pushed the deduplicate-hsc2hs-args branch 18 times, most recently from b31bb26 to c3cedc9 Compare June 24, 2025 22:37
@sergv
Copy link
Collaborator Author

sergv commented Jun 26, 2025

@geekosaur I did the change of introducing custom gcc that checks for duplicate arguments. The only thing is that it’s a bit too much work to do in Windows so there the successful compilation itself will do the check.

@mpickering It does seem like a good idea to fix hsc2hs to use response files. Somehow I found out that it does use them sometimes, but not when the command-line is too long. Still, this change seems desirable because with enough dependencies there can be appalling amount of duplication which is just redundant.

@sergv sergv added attention: needs-review PR worth reviving consider reviving this PR or remove the label with a note why not and removed PR worth reviving consider reviving this PR or remove the label with a note why not labels Jun 28, 2025
sergv added 4 commits June 28, 2025 18:53
If extra-lib-dirs and/or extra-include-dirs are specified via either
command line or project files, they’re going to be added for each
package in the dendency graph. With enough long enough directories
the command-line length limits for gcc may be hit and calls to
the C compiler made from hsc2hs will start failing.

Ideally hsc2hs should be using repsonse files, but needlessly
specifying myriad of command-line parameters is redundant anyway.
@sergv sergv force-pushed the deduplicate-hsc2hs-args branch from c3cedc9 to cd7e055 Compare June 28, 2025 17:54
Copy link
Collaborator

@geekosaur geekosaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: the first diff is the actual change, the last is the changelog entry, the rest are tests with lots of files to try to trigger the error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants