Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the broken --strict option to BuildPackages.sh, but only if --parallel is not also specified #4386

Merged
merged 2 commits into from
Apr 19, 2021

Conversation

wilfwilson
Copy link
Member

See #4384.

The addition of the --parallel option to the bin/BuildPackages.sh script in #3922 unfortunately broke the operation of the --strict option. This is because the compilation of each package was put into a background process, and its exit code was no longer being checked.

This restores the functionality of --strict when --parallel is not used. I am not sure how to catch the error codes
of parallel background processes – so, if would like help with this, please.

Text for release notes

Restore the broken --strict option to BuildPackages.sh, but only if --parallel is not also specified.

(End of text for release notes)

@wilfwilson wilfwilson added kind: bug Issues describing general bugs, and PRs fixing them release notes: to be added PRs introducing changes that should be (but have not yet been) mentioned in the release notes labels Apr 9, 2021
@wilfwilson
Copy link
Member Author

Evidence that this works is given by the failing macOS job now actually failing at the appropriate point.

But... urgh, unfortunately the Cygwin job is now also failing, because the profiling package is not compiling there. With the broken --strict option, that wasn't a problem, because that job was never actually using profiling (I presume)...

@wilfwilson wilfwilson force-pushed the ww/BuildPackages branch 2 times, most recently from 54228dc to 59c8eb6 Compare April 9, 2021 17:27
@ChrisJefferson
Copy link
Contributor

I started fixing this, but the fix grew kept growing, and I'm not convinced it's going to work correctly on all systems (as there can be subtle differences between bash versions). Therefore my suggestion is to just remove this bit of parallel code that builds packages in parallel.

We can leave in the--parallel option and have it still turn on "MAKEFLAGS=-j3", which still helps speed up packages like semigroups (which is where most of the time is spent), the only loss is parallelising running things like configure scripts.

Anyone who really wants to build packages in parallel can do it themselves still with something like this (which we only don't use as you have to install "parallel")

ls | parallel ../bin/BuildPackages.sh

@wilfwilson
Copy link
Member Author

I think that's a reasonable solution.

Would this also be a reasonable solution?: Accept this PR, with an additional check at the start of the script that at most one of --parallel and --strict is given, exiting with an error otherwise (and perhaps giving a message that directs the user to GNU parallel?). On their own, with this PR, the two options work as intended; it's just in combination that there's a problem.

Note that (at least my impression is that) the BuildPackages.sh script is somehow 'unofficial', so I think we're free to do what we want with it.

Somewhat related: is there any reason not to do MAKEFLAGS=-j3 by default?

@wilfwilson wilfwilson force-pushed the ww/BuildPackages branch 2 times, most recently from 4778041 to 8f260c9 Compare April 11, 2021 01:35
@wilfwilson
Copy link
Member Author

By the way, in my opinion we should just use GNU parallel in BuildPackages.sh --parallel if it makes things simpler:

Note that (at least my impression is that) the BuildPackages.sh script is somehow 'unofficial', so I think we're free to do what we want with it.

Sure, the --parallel option won't work on everyone's computer, but then the purpose of BuildPackages.sh isn't to work on everyone's computer, its purpose is to be useful!

@fingolfin
Copy link
Member

I'd be happy to require GNU parallel if the script properly checks for its presence and prints a helpful error message if it isn't there.

.github/workflows/CI.yml Outdated Show resolved Hide resolved
Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanations!

So, my understanding is that strict mode is currently broken if --parallel is used. Can we then at least check for both being used at the start of the script, and produce an error that this is currently not supported?

@fingolfin
Copy link
Member

fingolfin commented Apr 14, 2021

So this fails now in the cygwin64 job -- and this is issue existed before, and is revealed by this PR here (so yeah, it does its job). See e.g. https://github.com/gap-system/gap/pull/4373/checks#step:5:1067 for another PR where the same issue occurs, it just isn't reported.

This is the error (also reported as gap-packages/profiling#90):

==== Checking profiling-2.4.1
Running './configure' '/cygdrive/d/a/gap/gap' 
Using settings from /cygdrive/d/a/gap/gap/sysinfo.gap
Running 'gmake' 'clean' 
Makefile.gappkg:41: /cygdrive/d/a/gap/gap/sysinfo.gap: No such file or directory
gmake: *** No rule to make target '/cygdrive/d/a/gap/gap/sysinfo.gap'.  Stop.
Running './configure' '/cygdrive/d/a/gap/gap' 
Using settings from /cygdrive/d/a/gap/gap/sysinfo.gap
Running 'gmake' 
Makefile.gappkg:41: /cygdrive/d/a/gap/gap/sysinfo.gap: No such file or directory
gmake: *** No rule to make target '/cygdrive/d/a/gap/gap/sysinfo.gap'.  Stop.

This may be a hint at the "new(ish)" package build system based on Makefile.gappkg has issues on Windows / cygwin, but I've never seen reports about that before...

Anyway, this also means it would really be useful to have cygwin tests for packages, too, at least for "central" ones like io and profiling.

@wilfwilson
Copy link
Member Author

I've started to investigate Cygwin tests for IO: gap-packages/io#97

gap-system#4384

The addition of the --parallel option to the bin/BuildPackages.sh
script unfortunately broke the operation of the --strict option.

This restores the functionality of --strict when --parallel is not used.

I am not 1337 enough to work out how to do catch the error codes
of parallel background processes!
@wilfwilson
Copy link
Member Author

I have pushed a change which throws an error if --strict and --parallel are used together.

I also fixed the part where the script tests that it is not being run from in GAP's bin directory. This was doubly broken: it was using the function error before it had been defined, and it was doing this by testing for a file gapicon.bmp which (as far as I can tell) is no longer present in GAP, certainly not in bin anyway.

@fingolfin
Copy link
Member

@wilfwilson wow, thanks for catching that.

Regarding gap-packages/io#97, I left a few comments there, but overall io seems to work -- in any case, it builds fine, and so does it here. What does not build is the profiling package. I guess one could start by adapting your IO patch for profiling and then see if it builds there, and if not, use that as starting point for further debugging of the issue.

@wilfwilson
Copy link
Member Author

I was thinking the same on my bike ride this morning.

@fingolfin
Copy link
Member

The profiling build failure is weird. The file /cygdrive/d/a/gap/gap/sysinfo.gap definitely exists; indeed, configure verifies that it exists and is readable. But then gmake says "No such file or directory"....

Hmmm, which gmake is that, I wonder -- perhaps it's not the one from cygwin? That could maybe explain it... Indeed, looking at the cygwin package make, it contains make.exe but not gmake.exe (see https://cygwin.com/cgi-bin2/package-cat.cgi?file=x86_64%2Fmake%2Fmake-4.3-1&grep=make).

One way to deal with that would be to locate the "bad" gmake.exe and simply remove it.

@wilfwilson
Copy link
Member Author

@ChrisJefferson fixed the failing Cygwin job via gap-actions/setup-cygwin#7 – I've re-run the CI and the Cygwin job passed! So when the rest of the CI has passed, I will merge this, and open a feature request issue asking for --strict and --parallel to be supported simultaneously (perhaps with GNU parallel).

@wilfwilson wilfwilson linked an issue Apr 19, 2021 that may be closed by this pull request
@wilfwilson wilfwilson merged commit 27c4e1f into gap-system:master Apr 19, 2021
@wilfwilson wilfwilson deleted the ww/BuildPackages branch April 19, 2021 16:50
@fingolfin fingolfin changed the title Restore some functionality of the --strict option to the bin/BuildPackages.sh script Fix the broken --strict option to BuildPackages.sh, but only if --parallel is not also specified Aug 17, 2022
@fingolfin fingolfin added release notes: use title For PRs: the title of this PR is suitable for direct use in the release notes and removed release notes: to be added PRs introducing changes that should be (but have not yet been) mentioned in the release notes labels Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: bug Issues describing general bugs, and PRs fixing them release notes: use title For PRs: the title of this PR is suitable for direct use in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BuildPackages.sh --strict no longer works
3 participants