Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot load kernel extensions built against non-install GAP into an executable built against libgap #5232

Closed
dimpase opened this issue Dec 1, 2022 · 22 comments · Fixed by #5235

Comments

@dimpase
Copy link
Member

dimpase commented Dec 1, 2022

libgap is broken in the current master and in 4.12.1 (but still OK on 4.11.1) on macOS 13.0.1 if io is built, as can be seen by running
tst/testlibgap/basic: (which is built by make testlibgap, for instance)

% ./tst/testlibgap/basic -l .
# Initializing GAP...
 ┌───────┐   GAP 4.13dev-169-g28d5faa built on 2022-11-30 18:44:58+0000
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-apple-darwin22-default64-kv8
 Configuration:  gmp 6.2.1, GASMAN, readline
 Loading the library and packages ...
Error, LOAD_DYN: failed to load kernel module ./pkg/io//bin/x86_64-apple-darwin2\
2-default64-kv8/io.so, dlopen(./pkg/io//bin/x86_64-apple-darwin22-default\
64-kv8/io.so, 0x0005): Symbol not found: _AssGVar
  Referenced from: <691ED98B-03DB-3A1C-98A6-D5307CB4FCF6> /Users/dima/sof\
tware/gap/pkg/io/bin/x86_64-apple-darwin22-default64-kv8/io.so
  Expected in:     <6EA6D1DA-9BAB-322E-AEE4-887EFCA0E1EF> /Users/dima/sof\
tware/gap/tst/testlibgap/.libs/basic in
  LOAD_DYN( filename ) at ./lib/files.gd:595 called from 
<function "LoadDynamicModule">( <arguments> )
 called from read-eval loop at ./pkg/io//init.g:23
Error, was not in any namespace at ./lib/variable.g:269 called from
LEAVE_NAMESPACE(  ); at ./lib/package.gi:1327 called from
ReadPackage( pkgname, "init.g" ); at ./lib/package.gi:1688 called from
LoadPackage( pair[1], pair[2], false ) at ./lib/package.gi:1933 called from
func(  ); at ./lib/system.g:228 called from
<function "CallAndInstallPostRestore">( <arguments> )
 called from read-eval loop at ./lib/init.g:700
you can 'quit;' to quit to outer loop, or
you can 'return;' to continue
brk> 

On linux this works fine:

$ ./tst/testlibgap/basic -l .
# Initializing GAP...
 ┌───────┐   GAP 4.13dev-169-g28d5faa built on 2022-11-30 21:48:54+0000
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-pc-linux-gnu-default64-kv8
 Configuration:  gmp 6.2.1, GASMAN, readline
 Loading the library and packages ...
 Packages:   AClib 1.3.2, Alnuth 3.2.1, AtlasRep 2.1.6, AutPGrp 1.11, CRISP 1.4.5, Cryst 4.1.25, CrystCat 1.1.10, CTblLib 1.3.4, FactInt 1.6.3, FGA 1.4.0, Forms 1.2.9, GAPDoc 1.6.6, genss 1.6.8, IO 4.8.0, 
             IRREDSOL 1.4.4, LAGUNA 3.9.5, orb 4.9.0, Polenta 1.3.10, Polycyclic 2.16, PrimGrp 3.4.2, RadiRoot 2.9, recog 1.4.2, ResClasses 4.7.3, SmallGrp 1.5.1, Sophus 1.27, SpinSym 1.5.2, TomLib 1.2.9, 
             TransGrp 3.6.3, utils 0.78
 Try '??help' for help. See also '?copyright', '?cite' and '?authors'
gap> 

Originally posted by @dimpase in #5230 (comment)

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

also, it can be triggered non-interactively. With the following change

--- a/tst/testlibgap/basic.c
+++ b/tst/testlibgap/basic.c
@@ -6,6 +6,7 @@ int main(int argc, char ** argv)
 {
     printf("# Initializing GAP...\n");
     GAP_Initialize(argc, argv, 0, 0, 1);
+    test_eval("LoadPackage(\"io\");");
     test_eval("1+2+3;");
     test_eval("g:=FreeGroup(2);");
     test_eval("a:=g.1;");

by running make testlibgap on macOS 13 one gets the same error as above.

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

There is no issue with GAP 4.11.1.

Unless someone can tell straight away where the regression was introduced, I'll do a git bisect to find out later today.

@ChrisJefferson
Copy link
Contributor

I can't think of anything, and at first glance AssGVar doesn't look like a function we treat unusually.

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

bisecting now, at most 10 runs to go :-)

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

according to git bisect, the regression came from @fingolfin 's commit here;

9c343a4d91fb33900f0ac24010a99e7e7c800290 is the first bad commit
commit 9c343a4d91fb33900f0ac24010a99e7e7c800290
Author: Max Horn <max@quendi.de>
Date:   Fri Mar 4 12:22:45 2022 +0100

    buildsys: change gac to not use libtool

 Makefile.rules | 47 +++++++++++++++++++++++++++++++++------
 cnf/gac.in     | 69 +++++++++++++++++++++-------------------------------------
 configure.ac   |  1 -
 3 files changed, 65 insertions(+), 52 deletions(-)

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

reverting 9c343a4 on the master, i.e. 28d5faa, fixes the issue.

I haven't tried running a full testsuite.

dimpase added a commit to dimpase/gap that referenced this issue Dec 1, 2022
as a test, we load IO package - assuming it was built
so that its kernel module is created.

That was missing, and caused a regression reported in
gap-system#5232
@fingolfin
Copy link
Member

Fixed in #5235.

@fingolfin fingolfin changed the title libgap broken on macOS 13.0.1 if io is built (split out of #5230) Cannot load kernel extensions built against non-install GAP into an executable built against libgap Dec 1, 2022
@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

probably not related, but testing without the bad commit produces two errors in streams.tst

gap> Read( Filename( DirectoriesLibrary( "tst" ), "testinstall.g" ) );
Architecture: x86_64-apple-darwin22-default64-kv8
...
testing: /Users/dima/software/gap/tst/testinstall/streams.tst
########> Diff in /Users/dima/software/gap/tst/testinstall/streams.tst:
214
# Input is:
for i in [ 1 .. 300 ] do
   Add( streams, OutputTextFile( fname, false ) );
od;;
# Expected output:
Error, Too many open files (internal file descriptor limit reached)
# But found:
########
########> Diff in /Users/dima/software/gap/tst/testinstall/streams.tst:
218
# Input is:
Perform( streams, CloseStream );
# Expected output:
# But found:
Error, no method found! For debugging hints type ?Recovery from NoMe\
thodFound
Error, no 1st choice method found for `CloseStream' on 1 arguments
The 1st argument is 'fail' which might point to an earlier problem

########
     120 ms (115 ms GC) and 509KB allocated for streams.tst

@fingolfin
Copy link
Member

After looking this in details, I disagree that there is a bug here: you built a kernel extension against a non-installed gap which is not using libgap. You then tried to load it into a different executable (tst/testlibgap/basic) which is using libgap.

In the long run, I don't expect this to work at all. In the short run, I was able to find a workaround, which is in PR #5235.

But the initial assessment, that "libgap is broken in the current master" is simply not accurate. It works fine, also with kernel extensions, as long as you built the kernel extension against libgap (i.e., make install gap and build against that). We verify that this works as part of our CI, too.

If there is genuine need to load packages in tst/testlibgap/basic in the future, then we could make those tests more sophisticated, by giving them their own sysinfo.gap with suitable flags to ensure a GAP package built with it is built against the non-installed libgap, and the building the relevant packages with that. But that is a lot of effort for little noticeable gain, as it tests things in an "unnatural" scenario.

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

once again, how does gap use libgap? The only way I know about how an executable can "use" a dynamical library is to link it. But it does not. Here is the linkage of gap installed with make install to --prefix=/tmp

% otool -L gap
gap:
	/tmp/lib/libgap.8.dylib (compatibility version 9.0.0, current version 9.0.0)
	/usr/local/opt/gmp/lib/libgmp.10.dylib (compatibility version 15.0.0, current version 15.1.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/usr/local/opt/readline/lib/libreadline.8.dylib (compatibility version 8.2.0, current version 8.2.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)

there is no libgap there to talk about.

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

oh, wait, it does?!

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

I have never seen a package where the linkage of an installed executable changes this way after installation...

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

I still would like to see a test showing that libgap can load io.so in such a scenario - after all, gap executable has no role in such a scenario, so its linkage does not matter. Do you say you test for this in CI? Where?

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

If there is genuine need to load packages in tst/testlibgap/basic in the future

well, tst/testlibgap/basic is akin to our python3 which loads libgap extension libgap.cpython-311-darwin.so

% otool -L src/sage/libs/gap/libgap.cpython-311-darwin.so 
src/sage/libs/gap/libgap.cpython-311-darwin.so:
	/Users/dima/software/sage/local/lib/libgap.8.dylib (compatibility version 9.0.0, current version 9.0.0)
	/usr/local/opt/gmp/lib/libgmp.10.dylib (compatibility version 15.0.0, current version 15.1.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)

Are you saying this might break in the future because of... reasons? But that's precisely what libgap was designed for.
No need for gap executable around.

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

and as far as io.so, are you suggesting it must link to libgap at build time? But why, it's libgap that loads it, not the other way around. That's how it looks for us:

% find local -name io.so # so there is only one `io.so` around.
local/lib/gap/pkg/io/bin/x86_64-apple-darwin22-default64-kv8/io.so
%
% otool -L `find local -name io.so`
local/lib/gap/pkg/io/bin/x86_64-apple-darwin22-default64-kv8/io.so:
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)

@dimpase
Copy link
Member Author

dimpase commented Dec 1, 2022

once again, how does gap use libgap? The only way I know about how an executable can "use" a dynamical library is to link it. But it does not. Here is the linkage of gap installed with make install to --prefix=/tmp

% otool -L gap
gap:
	/tmp/lib/libgap.8.dylib (compatibility version 9.0.0, current version 9.0.0)
	/usr/local/opt/gmp/lib/libgmp.10.dylib (compatibility version 15.0.0, current version 15.1.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/usr/local/opt/readline/lib/libreadline.8.dylib (compatibility version 8.2.0, current version 8.2.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)

there is no libgap there to talk about.

Oops, I was on a wrong branch, on master. There indeed gap is linked to libgap. But this is not the case for v4.12.1
branch. There the installed to ${prefix}/bin gap is just a script executing ${prefix}/lib/gap/gapwith appropriate options, and the latter does not link tolibgap`, it is

% otool -L /tmp/tt/lib/gap/gap 
/tmp/tt/lib/gap/gap:
	/usr/local/opt/gmp/lib/libgmp.10.dylib (compatibility version 15.0.0, current version 15.1.0)
	/usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/usr/local/opt/readline/lib/libreadline.8.dylib (compatibility version 8.2.0, current version 8.2.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)

With this, it's really not clear how using installed gap can "bless" io.so built using it.
So #5235 should be backported to v4.12.1.
On master, we'll need to try, but that's another story.

@fingolfin
Copy link
Member

To clarify, what I say about libgap linking was done in #5193 and will appear in GAp 4.12.2, precisely to fix the problem you are seeing. The patch in #5235 is a limited workaround. We can also backport it to 4.12.2, but the better fix is #5193 (plus #5206 as a fixup).

@dimpase
Copy link
Member Author

dimpase commented Dec 2, 2022

I'd like to point out that it's normal for extensions to an interpreter to be interoperable with being used directly on the interpreter, as well as in a library that wraps it, to allow interpreter embedding. E.g. Python (cpython, to be precise) can be embedded into a program using libpython dylib. And of course all the extensions available to your instance of python are loadable in it embedded via its libpython. And, you know, python is never linked to libpython...

Are you sure you're not inventing a square wheel in #5193 ?

@dimpase
Copy link
Member Author

dimpase commented Dec 2, 2022

To clarify, what I say about libgap linking was done in #5193 and will appear in GAp 4.12.2, precisely to fix the problem you are seeing. The patch in #5235 is a limited workaround. We can also backport it to 4.12.2, but the better fix is #5193 (plus #5206 as a fixup).

I'd like to create a release GAP tarball from the master branch, to see that actually it does fix our Sagemath issue on macOS without #5235. How does one do it locally (I can only see instructions for releasing GAP using GH Actions, not what I want) ?

@fingolfin
Copy link
Member

I'd like to point out that it's normal for extensions to an interpreter to be interoperable with being used directly on the interpreter, as well as in a library that wraps it, to allow interpreter embedding. E.g. Python (cpython, to be precise) can be embedded into a program using libpython dylib. And of course all the extensions available to your instance of python are loadable in it embedded via its libpython. And, you know, python is never linked to libpython...

Actually, on macOS, it is (well, against Python.framework, which is the macOS equivalent); note how macOS is the same platform on which you had a problem with gap vs. libgap... Makes you wonder, doesn't it?

$ otool -L /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/bin/python3.9
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/bin/python3.9 (architecture x86_64):
	@executable_path/../Python3 (compatibility version 3.9.0, current version 3.9.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/bin/python3.9 (architecture arm64):
	@executable_path/../Python3 (compatibility version 3.9.0, current version 3.9.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)

$ otool -L /opt/homebrew/opt/python@3.10/bin/python3
/opt/homebrew/opt/python@3.10/bin/python3:
	/opt/homebrew/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/Python (compatibility version 3.10.0, current version 3.10.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.100.3)

# hey for fun, here is python bundled with an older sage version:
$ otool -L /Users/mhorn/Projekte/foreign/sage/local/bin/python3.6
/Users/mhorn/Projekte/foreign/sage/local/bin/python3.6:
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1259.22.0)
	/Users/mhorn/Projekte/foreign/sage/local/lib/libpython3.6m.dylib (compatibility version 3.6.0, current version 3.6.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)

See, you make a claim about what is "normal" and then give one data point, which is then only half-accurate cough. But even if you gave ten examples: how is that a valid technical argument? Lots of bad things are done by "almost every project doing X" for any given X. Don't you think it would be better be to ask why we (and Python) do certain things in a certain way? Also notice how python has a few orders of magnitude more person power and resources. They also have generally different constraints and priorities. Things that work great for Python may simply be infeasible for us. So yeah, they do things differently from us, but in first approximation, that means nothing.

So overall I find this point irrelevant and somewhat condescending ("sage devs telling stupid gap devs how the world works")

Are you sure you're not inventing a square wheel in #5193 ?

Yes, I am, thank you for asking.

And I'd be surprised if there wasn't software on your compute using the same technique, because GNU libtool uses it extensively...

To clarify, what I say about libgap linking was done in #5193 and will appear in GAp 4.12.2, precisely to fix the problem you are seeing. The patch in #5235 is a limited workaround. We can also backport it to 4.12.2, but the better fix is #5193 (plus #5206 as a fixup).

I'd like to create a release GAP tarball from the master branch, to see that actually it does fix our Sagemath issue on macOS without #5235. How does one do it locally (I can only see instructions for releasing GAP using GH Actions, not what I want) ?

See dev/releases/README.md, look for make_archives.py.

@dimpase
Copy link
Member Author

dimpase commented Dec 3, 2022

See, you make a claim about what is "normal"

I said that interoperability is normal. The last sentence of the quoted by you paragraph is also correct, contrary to what you wrote, because
on macOS libpython I am talking about is still there, named libpython<python-version>.dylib, and
python3 is not linked to it. Python.framework is just a confusing piece of pathname here, no?

% otool -L /Library/Frameworks/Python.framework/Versions/3.11/lib/libpython3.11.dylib 
/Library/Frameworks/Python.framework/Versions/3.11/lib/libpython3.11.dylib:
        /Library/Frameworks/Python.framework/Versions/3.11/Python (compatibility version 3.11.0, current version 3.11.0)
        /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1856.105.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)

This libpython3.11.dylib corresponds to the default Python3 on this box, and the latter is not linked to to it.

% otool -L `which python3`
/Library/Frameworks/Python.framework/Versions/3.11/bin/python3:
        /Library/Frameworks/Python.framework/Versions/3.11/Python (compatibility version 3.11.0, current version 3.11.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)

@dimpase
Copy link
Member Author

dimpase commented Dec 3, 2022

It's, however, fair to say that /Library/Frameworks/Python.framework/Versions/3.11/lib/libpython3.11.dylib is just a symbolic link to /Library/Frameworks/Python.framework/Versions/3.11/Python, so it that sense I was wrong in saying that python3 is not linked to libpython3.11.dylib. Apologies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants