Description
Recently I hit the following scenario: libnxz's deflateInit2_ calls zlib's init function sw_deflateInit2 (which maps to deflateInit2_ from libz.so), which internally calls deflateReset. Since libnxz has been loaded first, this maps to libnxz's deflateReset instead of zlib's.
$ cat ./bug.c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>
int main()
{
	Byte src[128], dst[128];
	z_stream strm;
	strm.zalloc = Z_NULL;
	strm.zfree = Z_NULL;
	strm.opaque = (voidpf)0;
	assert(deflateInit(&strm, Z_DEFAULT_COMPRESSION) == Z_OK);
	return 0;
}
$ gcc -Og -g -o bug bug.c -lz
$ LD_PRELOAD=./lib/.libs/libnxz.so gdb ./bug
> break deflateReset
> run
Call stack:
[0] from 0x00007ffff7eee3ac in deflateReset+8 at nx_deflate.c:2312 <-------- called libnxz (not good!)
[1] from 0x00007ffff7e69c58 in deflateInit2_+632 at deflate.c:362 <-------- was on zlib
[2] from 0x00007ffff7f05fd8 in sw_deflateInit2_+48 at sw_zlib.c:102
[3] from 0x00007ffff7eedfd8 in deflateInit2_+276 at nx_deflate.c:2269
[4] from 0x00007ffff7eee384 in deflateInit_+48 at nx_deflate.c:2251
[5] from 0x00000000100102a8 in main+60 at bug.c:14
I tried passing RTLD_DEEPBIND to dlopen to force the dlopen'd zlib to call only its own symbols:
diff --git a/lib/sw_zlib.c b/lib/sw_zlib.c
index e97b477..09c0173 100644
--- a/lib/sw_zlib.c
+++ b/lib/sw_zlib.c
@@ -285,7 +285,7 @@ int sw_zlib_init(void)
 {
 	char *error;
-	sw_handler = dlopen(ZLIB_PATH, RTLD_LAZY);
+	sw_handler = dlopen(ZLIB_PATH, RTLD_LAZY | RTLD_DEEPBIND);
 	if(sw_handler == NULL) {
 		prt_err(" %s\n", dlerror());
 		return Z_ERRNO;
But the result was the same. There's some loader magic going on that I couldn't figure out yet.
Turns out that when zlib's deflateInit2_ calls deflateReset through the PLT, the symbol is already resolved to libnxz's deflateReset.
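One way to double-check which object a symbol resolves to, without going through gdb, is a small helper like the one below. This is only a sketch: symbol_owner is a hypothetical name (it is not part of libnxz or zlib), and dlsym(RTLD_DEFAULT, ...) walks the default global search order, which only approximates what a PLT lookup made from inside libz would see. On older glibc it may need -ldl at link time.

```c
#define _GNU_SOURCE   /* dladdr() and RTLD_DEFAULT are GNU extensions on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libnxz: return the path of the shared
 * object that the default (global) lookup scope resolves `name` to, or
 * NULL if the symbol cannot be found or mapped back to an object.
 */
const char *symbol_owner(const char *name)
{
	Dl_info info;
	void *addr;

	assert(name != NULL);

	/* First definition found in the default search order. */
	addr = dlsym(RTLD_DEFAULT, name);
	if (addr == NULL)
		return NULL;

	/* Map the address back to the object that contains it. */
	if (dladdr(addr, &info) == 0)
		return NULL;

	return info.dli_fname;
}
```

Printing symbol_owner("deflateReset") from a program started with LD_PRELOAD=./lib/.libs/libnxz.so should, if the analysis above is right, report the libnxz path rather than libz.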
Now if I link the program against libnxz instead, things work as desired:
$ gcc -o bug -g -Og ./bug.c -L./lib/.libs -lnxz
$ LD_PRELOAD=./lib/.libs/libnxz.so gdb ./bug
> break deflateReset
> run
Call stack
[0] from 0x00007ffff7bb7908 in deflateReset+8 at deflate.c:526 <------------ called zlib's deflateReset. ok!!
[1] from 0x00007ffff7bb9c58 in deflateInit2_+632 at deflate.c:362
[2] from 0x00007ffff7f05fd8 in sw_deflateInit2_+48 at sw_zlib.c:102
[3] from 0x00007ffff7eedfd8 in deflateInit2_+276 at nx_deflate.c:2269
[4] from 0x00007ffff7eee384 in deflateInit_+48 at nx_deflate.c:2251
[5] from 0x00000000100102a8 in main+60 at ./bug.c:14
This second example behaves just like the first one if I remove the RTLD_DEEPBIND flag from the dlopen call.
I suspect that in the first case RTLD_DEEPBIND has no effect because libz.so was already loaded, as part of the program's dependency list, by the time we call dlopen.
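The "already loaded" condition can be probed from code with RTLD_NOLOAD, which asks dlopen for a handle only if the object is already resident. The helper below is a minimal sketch under that assumption; already_loaded is a hypothetical name, and whether a given name (for example libz.so versus libz.so.1) matches an object that is already mapped depends on how the loader identifies it.

```c
#define _GNU_SOURCE   /* RTLD_NOLOAD is a glibc extension */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libnxz: return 1 if `soname` is already
 * mapped into the process, 0 otherwise. It never loads the library itself.
 */
int already_loaded(const char *soname)
{
	void *handle;

	assert(soname != NULL);

	/* RTLD_NOLOAD: succeed only if the object is already resident. */
	handle = dlopen(soname, RTLD_NOLOAD | RTLD_LAZY);
	if (handle == NULL)
		return 0;

	/* Drop the extra reference taken by the successful dlopen. */
	dlclose(handle);
	return 1;
}
```

A check like this could run just before the real dlopen in sw_zlib_init to log whether RTLD_DEEPBIND is being requested for an object the loader has already bound.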
$ gcc -Og -g -o bug bug.c -lz
$ LD_PRELOAD=./lib/.libs/libnxz.so LD_DEBUG=libs ./bug
1679000: find library=libz.so.1 [0]; searching
1679000: search cache=/etc/ld.so.cache
1679000: trying file=/lib64/libz.so.1
1679000:
1679000: find library=libc.so.6 [0]; searching
1679000: search cache=/etc/ld.so.cache
1679000: trying file=/lib64/libc.so.6
1679000:
1679000:
1679000: calling init: /lib64/ld64.so.2
1679000:
1679000:
1679000: calling init: /lib64/libc.so.6
1679000:
1679000:
1679000: calling init: /lib64/libz.so.1 <------- libz.so loaded before libnxz.so
1679000:
1679000:
1679000: calling init: ./lib/.libs/libnxz.so
1679000:
1679000: find library=libz.so [0]; searching <------- triggered by dlopen from sw_zlib.c:sw_zlib_init
1679000: search cache=/etc/ld.so.cache
1679000: trying file=/lib64/libz.so
1679000: <------- libz.so not loaded again
1679000:
1679000: initialize program: ./bug
1679000:
1679000:
1679000: transferring control: ./bug
1679000:
1679000:
1679000: calling fini: ./bug [0]
1679000:
1679000:
1679000: calling fini: ./lib/.libs/libnxz.so [0]
1679000:
1679000:
1679000: calling fini: /lib64/libz.so.1 [0]
1679000:
When the program is linked against libnxz instead:
$ gcc -o bug -g -Og ./bug.c -L./lib/.libs -lnxz
$ LD_PRELOAD=./lib/.libs/libnxz.so LD_DEBUG=libs ./bug
1681332: find library=libc.so.6 [0]; searching
1681332: search cache=/etc/ld.so.cache
1681332: trying file=/lib64/libc.so.6
1681332:
1681332:
1681332: calling init: /lib64/ld64.so.2
1681332:
1681332:
1681332: calling init: /lib64/libc.so.6
1681332:
1681332: <------- libz.so not loaded because not in app's dependency list
1681332: calling init: ./lib/.libs/libnxz.so
1681332:
1681332: find library=libz.so [0]; searching <------- triggered by dlopen from sw_zlib.c:sw_zlib_init
1681332: search cache=/etc/ld.so.cache
1681332: trying file=/lib64/libz.so
1681332:
1681332:
1681332: calling init: /lib64/libz.so <------- libz.so loaded
1681332:
1681332:
1681332: initialize program: ./bug
1681332:
1681332:
1681332: transferring control: ./bug
1681332:
1681332:
1681332: calling fini: ./bug [0]
1681332:
1681332:
1681332: calling fini: ./lib/.libs/libnxz.so [0]
1681332:
1681332:
1681332: calling fini: /lib64/libz.so [0]
1681332:
The current behavior is not desirable because when we call into zlib through the sw_* functions we want pure zlib functionality. Having zlib jump back into libnxz while executing one of those calls may lead to unexpected behavior and, most likely, hard-to-debug bugs. I actually hit this while improving deflateReset to always reset both sw and nx state.