Skip to content

Conversation

@ivannp
Copy link

@ivannp ivannp commented Dec 10, 2018

No description provided.

@ivannp
Copy link
Author

ivannp commented Dec 10, 2018

It seems I have finally found the root cause of the crashes on exit on Linux. The main reason was the calls from libEnd to free the memory object. This is wrong, because the documentation suggests that the library destructor is called after the destructors of global objects. That's exactly what I was seeing in the core dump.

The pull request also tries to fix a bug in the Init.cpp where a -1 index access was possible. Please verify that the fix is correct - i.e. builds the correct tables.

To find these bugs, I used various sanitization flags, which I left commented in the makefiles for future reference.

Last, I added an API which uses JSON to return the DD tables. Using JSON based APIs is incredibly useful when calling the library from other languages. I realize this could be left out as a separate fix (and it does bring dependency on an external header-only library), so two questions here:

  • Do you want me to remove it from this PR?
  • If I should remove it, would you be interested in accepting it in a future PR?

To me, as a user of the library from Python, it's invaluable.

@ivannp ivannp closed this Dec 10, 2018
@ivannp ivannp reopened this Dec 10, 2018
@ivannp ivannp changed the title Master Fixes for core dump problems on Linux. Added new API. Dec 10, 2018
@df7cb
Copy link
Contributor

df7cb commented Jan 13, 2019

In Debian, we have the bug that examples/DealerPar crashes on 32bit i386, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=910939. I suspect a combination of fixes from this and other PRs might fix it.
Could someone please look into merging this into master, and make a new release? Otherwise dds won't be part of the next Debian release.
Thanks!

@pierretallotte
Copy link

@ivannp
I still have the issue with your commits. I created a Python script as I explained here and I run it with the shared library I create with the code in your master branch (I just removed optimization and lto options to add -ggdb -g3) and here is what I got:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, Memory::GetPtr (this=0x7ffff67c0520 <memory>, thrId=0) at Memory.cpp:113
113	    cout << "Memory::GetPtr: " << thrId << " vs. " << nThreads << " vs. " << memory.size() << endl;
(gdb) bt
#0  Memory::GetPtr (this=0x7ffff67c0520 <memory>, thrId=0) at Memory.cpp:113
#1  0x00007ffff654e2ce in CloseDebugFiles () at Init.cpp:398
#2  0x00007ffff6557985 in Memory::~Memory (this=0x7ffff67c0520 <memory>, __in_chrg=<optimized out>) at Memory.cpp:23
#3  0x00007ffff7829ff8 in __run_exit_handlers (status=0, listp=0x7ffff7bb45f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#4  0x00007ffff782a045 in __GI_exit (status=<optimized out>) at exit.c:104
#5  0x00007ffff7810837 in __libc_start_main (main=0x555555636f40 <main>, argc=2, argv=0x7fffffffddf8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdde8)
    at ../csu/libc-start.c:325
#6  0x0000555555717e0e in _start () at ../sysdeps/x86_64/elf/start.S:103
(gdb) n
Memory::GetPtr: 0 vs. 0 vs. 0
114	    exit(1);

Indeed, even when we call CloseDebugFiles in the destructor of Memory, we still have the error.

I don't know the code but isn't it a better idea to return NULL in Memory::GetPtr if thrId >= nThreads and to handle this case in CloseDebugFiles?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants