Double free or corruption issue when make runtest or train mnist model #5282

Closed
zagwin opened this issue Feb 14, 2017 · 11 comments

@zagwin

zagwin commented Feb 14, 2017

Issue summary

I can build caffe successfully: make all, make pycaffe, and make test all complete without error.
When I run make runtest, it stops immediately. When I train the mnist model, it stops early with the same error.

I didn't change anything; I just cloned the repository and ran make. I have struggled with this issue for a long time. Can anybody help me find out what is wrong? Thanks.

*** Error in `.build_debug/tools/caffe': double free or corruption (out): 0x0000000002119160 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f01f4ea87e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7fe0a)[0x7f01f4eb0e0a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f01f4eb498c]
/usr/lib/x86_64-linux-gnu/libprotobuf.so.9(_ZN6google8protobuf8internal28DestroyDefaultRepeatedFieldsEv+0x1f)[0x7f01f61be8af]
/usr/lib/x86_64-linux-gnu/libprotobuf.so.9(_ZN6google8protobuf23ShutdownProtobufLibraryEv+0x8b)[0x7f01f61bdb3b]
/usr/lib/x86_64-linux-gnu/libmirprotobuf.so.3(+0x20329)[0x7f01d04fd329]
/lib64/ld-linux-x86-64.so.2(+0x10c17)[0x7f01f85e8c17]
/lib/x86_64-linux-gnu/libc.so.6(+0x39ff8)[0x7f01f4e6aff8]
/lib/x86_64-linux-gnu/libc.so.6(+0x3a045)[0x7f01f4e6b045]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf7)[0x7f01f4e51837]
.build_debug/tools/caffe[0x426dd9]

I attached my Makefile.config at
Makefile.config.pdf

I also attached the full debug output for record.
debug output.pdf

Your system configuration

Operating system: Ubuntu 16.04 Desktop
Compiler: gcc
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5.1
BLAS: atlas
Python or MATLAB version (for pycaffe and matcaffe respectively): anaconda python 2.7

Best,
Weldon

@shelhamer
Member

Sorry, this seems to be a system issue. Please ask installation questions on the mailing list.

From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

Please do not post usage, installation, or modeling questions, or other requests for help to Issues.
Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.

@cailile

cailile commented Apr 24, 2017

Hi, I encountered this problem recently when running the caffe/ssd branch. The cause turned out to be that caffe was linked against both libprotobuf.so and libprotobuf-lite.so at the same time, which leads to allocated memory being freed twice. You can check whether you have this double-link problem by listing the libraries that the built caffe binary links to:

ldd caffe | grep proto

In my case, caffe was linked against libprotobuf.so.10, libprotobuf-lite.so.10, and libmirprotobuf.so.3 at the same time; the latter two were pulled in through opencv_highgui. After I removed OpenCV's highgui library from caffe's makefile and removed the functions that use it from the source files, the problem was gone.
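The check described above can be sketched as a small shell helper. This is only an illustration: the function names `list_protobuf_libs` and `check_protobuf_links` are made up for this sketch, and the binary path in the usage note is taken from the backtrace in the original report and may differ on your machine.

```shell
# Sketch: list the distinct protobuf-family libraries found in `ldd` output.
# Reads ldd output on stdin, so it also works on output saved to a file.
# Function names are hypothetical, not part of caffe or protobuf.
list_protobuf_libs() {
    # ldd lines look like: "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
    # Keep the first field whenever it names a protobuf-related library.
    awk '$1 ~ /protobuf/ { print $1 }' | sort -u
}

# Warn (and return non-zero) when more than one such library is linked.
check_protobuf_links() {
    pb_libs=$(list_protobuf_libs)
    # grep -c exits non-zero on zero matches; `|| true` keeps that harmless.
    pb_count=$(printf '%s\n' "$pb_libs" | grep -c . || true)
    if [ "$pb_count" -gt 1 ]; then
        echo "WARNING: $pb_count protobuf-related libraries linked:"
        printf '%s\n' "$pb_libs"
        return 1
    fi
    echo "OK: $pb_count protobuf-related library linked"
}
```

A possible invocation would be `ldd .build_debug/tools/caffe | check_protobuf_links`. A warning only shows that several protobuf variants are loaded together; it does not by itself prove they are the cause of the crash.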

Hope this helps and good luck!

@jmuncaster

@cailile thank you for your comment, I encountered this problem recently and you helped me to fix it. The GTK build of opencv_highgui was responsible for bringing in libprotobuf-lite.so. The fix that I did, which does not require changing the source code, was to rebuild OpenCV against Qt5 instead of GTK, and rebuild caffe. On Ubuntu 16.04 the qt5 package is "qt5-default" and the OpenCV cmake option is WITH_QT.
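To confirm which dependency brings in a protobuf variant, one option is to look at the direct `NEEDED` entries of each library the binary loads. The sketch below assumes `ldd` and `readelf` (from binutils) are installed; the function names are hypothetical and chosen only for this example.

```shell
# Sketch: find which shared libraries loaded by a binary directly require
# a protobuf-family library. Function names are hypothetical.

# Extract resolved library paths from `ldd` output given on stdin.
ldd_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# From `readelf -d` output on stdin, print protobuf-related NEEDED entries.
# Such lines look like: "0x... (NEEDED)  Shared library: [libfoo.so.1]"
needed_protobuf() {
    sed -n 's/.*(NEEDED).*\[\(.*protobuf.*\)\].*/\1/p'
}

# For each dependency of the given binary, report its direct protobuf needs.
find_direct_protobuf_users() {
    ldd "$1" | ldd_paths | while read -r dep_path; do
        dep_hits=$(readelf -d "$dep_path" 2>/dev/null | needed_protobuf | tr '\n' ' ')
        if [ -n "$dep_hits" ]; then
            echo "$dep_path needs: $dep_hits"
        fi
    done
}
```

Running `find_direct_protobuf_users .build_debug/tools/caffe` (path assumed from the original report) would print one line per library that directly requires a protobuf variant, which should show whether opencv_highgui or one of its GUI dependencies is responsible on a given system.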

@jontitalukdar

@cailile I have encountered the exact same problem while installing the caffe/ssd branch as mentioned here. However, the solution you described is a bit unclear, and it would really help if you could elaborate on how you solved it. Thanks a lot.

@cailile

cailile commented Jun 15, 2017 via email

@cailile

cailile commented Jun 15, 2017

@jontitalukdar Here are some more comments. The solution I currently use is to roll back to Ubuntu 14.04, because simply excluding opencv_highgui when building caffe only solves the problem on the caffe side. Later on, when I wanted to import both caffe and cv2 in Python, the problem came up again. I am not sure whether there is a way for libprotobuf and libprotobuf-lite to run together. @jmuncaster's solution is worth a try. If he had posted it earlier, I might not have had to roll back to Ubuntu 14.04 :)

@jontitalukdar

@cailile Thank you so much for your reply. You are absolutely correct: opencv_highgui causes problems when importing both caffe and cv2 within the same script. Moreover, I had installed opencv in a python virtual environment, which caused some further errors. Removing either of the two, libprotobuf or libprotobuf-lite, might cause unforeseen problems in the future.

So I tried rebuilding OpenCV using Qt5 instead of GTK, as proposed by @jmuncaster, and it worked!
I cleaned the original OpenCV build and then reinstalled it with Qt5.

make clean
mkdir build
cd build/
cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=/usr/local -DFORCE_VTK=ON -DWITH_TBB=ON -DWITH_V4L=ON -DWITH_QT=ON -DWITH_OPENGL=ON -DWITH_CUBLAS=ON -DCUDA_NVCC_FLAGS="-D_FORCE_INLINES" -DWITH_GDAL=ON -DWITH_XINE=ON -DBUILD_EXAMPLES=ON ..

I also added the OpenCV library path to the Caffe Makefile.config and then rebuilt ssd/caffe using make.

LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial /usr/local/share/OpenCV/3rdparty/lib/

It seems to have worked for me for now. I will keep a close watch in case any other discrepancies crop up.
Thank you so much for your help @cailile :)

@novate

novate commented Dec 22, 2017

@cailile Thank you so much for your solution. The double free or corruption problem is gone now. The side effect is that when caffe is built without highgui, we can't use things like a webcam or write detections out as video.
@jontitalukdar Here is a suggestion: when building OpenCV, I strongly recommend adding -D WITH_GTK=NO. Without it, the build on my computer automatically uses GTK whenever it finds the GTK packages, and I don't know why.
What's more, I couldn't install qt5-default (apt-get reported lots of unmet dependencies, I don't know why), so I compiled OpenCV against Qt4 instead, and it works.
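Since CMake may pick up GTK on its own, it can help to check what the OpenCV build tree actually recorded after configuring. The sketch below reads `CMakeCache.txt`, where CMake stores entries as `NAME:TYPE=VALUE`; the function name `show_gui_backend_flags` is hypothetical, and the default path assumes you run it from inside the OpenCV build directory.

```shell
# Sketch: print the GUI backend switches recorded in an OpenCV build tree.
# CMake cache entries have the form NAME:TYPE=VALUE, e.g. WITH_GTK:BOOL=OFF.
# The function name is hypothetical.
show_gui_backend_flags() {
    cache_file="${1:-CMakeCache.txt}"
    if [ ! -f "$cache_file" ]; then
        echo "no CMake cache at $cache_file" >&2
        return 1
    fi
    # Split on ':' and '=' and keep only the WITH_GTK and WITH_QT entries.
    awk -F'[:=]' '$1 == "WITH_GTK" || $1 == "WITH_QT" { print $1 "=" $3 }' "$cache_file"
}
```

After configuring with, for example, `-DWITH_QT=ON -DWITH_GTK=OFF`, calling `show_gui_backend_flags` in the build directory should list both entries with the values CMake stored. If `WITH_GTK` still shows as enabled, the GTK backend was probably selected despite the Qt option.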

@wishinger-li

@cailile Thanks for your suggestion, it worked on my computer. But I have another problem.
The same code ran smoothly when I used it three months ago. When I ran it again, it failed with an error.
So what changed during this period?

@panecho

panecho commented Aug 9, 2018

I solved it according to #5777.

@laker-sprus

Nice. This also works for the "./upgrade_net_proto_binary" abort problem.
