librav1e produces segfault #485
Hey, interesting. I think the first step would be to see whether the problem can be reproduced with an ffmpeg linked against glibc. If so, I would guess it's an ffmpeg or librav1e bug somehow; if not, we have to figure out what difference musl etc. makes. Are you able to share PXL_20240630_122849440.mp4, a small cut of it, or some other video that reproduces the problem? Btw, does
I have tested multiple input files (HEVC, x264, and Xvid), and all of them produce a crash. Encoding to x264 is OK, so I think the issue is with librav1e.
Running with gdb gives:
Do you know if it is possible to compile with debug symbols? (Not sure if it would be useful.)
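Once a build with debug symbols exists, one way to capture a backtrace from every thread (the crash may happen on a worker thread rather than the main one) is gdb's batch mode. The wrapper name `gdb_backtrace` is hypothetical; the gdb options used (`-batch`, `-ex`, `--args`) are standard, but treat this as a sketch, not the exact command used in this thread.

```sh
# Hypothetical helper: run a command under gdb in batch mode and, when the
# program stops (e.g. on SIGSEGV), print a backtrace for every thread.
# Usage: gdb_backtrace ./ffmpeg -i input.mp4 -c:v librav1e out.mp4
gdb_backtrace() {
  gdb -batch \
    -ex run \
    -ex 'thread apply all bt' \
    --args "$@"
}
```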
OK, could you try with Alpine's own ffmpeg, which also has librav1e?
And also with some glibc-based distro like Debian?
About debug symbols: yes, it's possible, remove
The Alpine and Debian containers work fine. I tried to recompile with the debug flags, but I don't get any more information.
OK then. Next I would try without librsvg, which is also Rust; there was some issue with duplicate symbols. I didn't manage to reproduce this locally with some files. Are you able to share a file that triggers it? That would make it a lot easier to help. Weird about the debug symbols, there must be something more going on then 🤔
Maybe also try with just librav1e. You could also compare with how Alpine does things: https://git.alpinelinux.org/aports/tree/community/rav1e/APKBUILD?h=3.19-stable
Thanks for the help! I'll do some more testing tomorrow. Also, I found an old post of yours about a similar issue: lu-zero/cargo-c#98
No problem!
👍 A tip is to try to minimize the Dockerfile as much as possible first and then start digging more into the details. That way there will be fewer unrelated moving parts, and it's much faster to iterate and try things. But again, if you have a test file I can use, that would be great.
I think that was about Rust itself crashing at build time?
You can download the example file I used here: https://photos.app.goo.gl/WVZ7D6giYhYFmbs36 However, I face the issue with multiple files, so I suppose it makes no difference. Also, the crash happens right at the beginning, so it's pretty easy to reproduce.
My bad, I did a quick search on "cargo cbuild" and "segfault". I didn't notice at first that you were the author of the issue, that's funny!
Thanks. Weirdly, it seems to work fine for me on a MacBook M3 (arm64). What CPU are you using? Could it be that librav1e or ffmpeg ends up using some instruction that is not available (feature detection at build time on the build host, etc.)? But then it usually crashes with SIGILL, hmm.
$ docker run -it --rm -v "$PWD:$PWD" -w "$PWD" docker.io/mwader/static-ffmpeg:latest -v debug -i PXL_20240630_122849440.mp4 -c:v librav1e 'PXL_20240630_122849440.av1.mp4'; echo $?
...
0
$ ffprobe -hide_banner -i PXL_20240630_122849440.av1.mp4
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'PXL_20240630_122849440.av1.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomav01iso2mp41
encoder : Lavf61.1.100
Duration: 00:00:09.78, start: 0.000000, bitrate: 11581 kb/s
Stream #0:0[0x1](und): Video: av1 (libdav1d) (Main) (av01 / 0x31307661), yuv420p(tv, smpte170m/bt470bg/bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 11495 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
encoder : Lavc61.3.100 librav1e
Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
😄
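The `echo $?` at the end of the `docker run` command above prints the exit status, which is how a crash can be told apart from a clean run. By shell convention, a status above 128 means the process was terminated by signal number (status - 128), so on Linux a segfault typically shows up as 139 (SIGSEGV, 11) and an illegal-instruction crash as 132 (SIGILL, 4). The helper below is a hypothetical sketch and assumes the container's exit status is passed through, as `docker run` normally does.

```sh
# Hypothetical helper: describe a shell exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
# Usage: docker run ... ; describe_exit $?
describe_exit() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "ok"
  elif [ "$code" -gt 128 ]; then
    # kill -l N prints the name of signal N without the SIG prefix (11 -> SEGV)
    echo "killed by SIG$(kill -l $((code - 128)))"
  else
    echo "failed with status $code"
  fi
}
```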
A few more details on my setup: the CPU is x86_64, and I'm running Debian 12 and Oracle Linux 9 (Red Hat 9) under VirtualBox.
Hmm, interesting. Could it be that the VM lacks support for SSE instructions etc.? But looking at the code at https://github.com/xiph/rav1e/blob/e34e772e47b01169b6f75a4589c056624ea886a4/src/cpu_features/x86.rs#L20 it seems like it does runtime detection, hmm. Maybe you can check the VM settings? If that does not help, I think we need to get a proper debug build and inspect things with gdb.
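To check the VM settings from inside the guest, one quick approach is to look at the `flags` line of `/proc/cpuinfo`, which lists the instruction set extensions the (virtual) CPU exposes. The helper and the flag names in the usage comment are illustrative assumptions, not the list rav1e actually checks. Since rav1e appears to detect features at runtime, a missing flag would be expected to disable the corresponding code paths rather than cause a crash.

```sh
# Hypothetical helper: report whether each FLAG appears in the "flags" line
# of a /proc/cpuinfo-style file (only the first CPU entry is inspected).
# Usage: has_cpu_flags /proc/cpuinfo sse2 ssse3 sse4_1 avx2
has_cpu_flags() {
  file=$1
  shift
  # "flags<TAB><TAB>: fpu vme ..." -> " fpu vme ..."
  flags=$(grep -m1 '^flags' "$file" | cut -d: -f2)
  for flag in "$@"; do
    # Pad with spaces so that e.g. "sse" does not match inside "sse2".
    case " $flags " in
      *" $flag "*) echo "$flag: yes" ;;
      *) echo "$flag: no" ;;
    esac
  done
}
```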
Hey, did you get anywhere with this?
Sorry for the delay. I managed to reproduce the issue on bare metal. The CPU is an AMD Ryzen 7 5800X. It would be great if someone else could reproduce it on different hardware.
OK! That is strange. If you have time, it would be great to try to minimize the Dockerfile. Maybe something like: remove everything except building rav1e and ffmpeg, verify it still crashes, and after that maybe try changing the build to be more like Alpine's https://git.alpinelinux.org/aports/tree/community/rav1e/APKBUILD?h=3.19-stable#n34 ? ... I see that they do use some newer cargo-c stuff. Not sure if
Btw, it might be worth looking through the rav1e issues to see if something looks similar? Things like:
One suggestion in the issues is to try with
I did some testing today: rav1e works fine if I only enable x264 and rav1e. I kept all the compilation flags as they were. I'm still not sure why it crashes with the full Dockerfile.
That is very interesting! Could you try re-adding librsvg and see if it starts to crash again? That is my main suspect: statically linking two Rust-based libraries causes some symbol conflict/mixing that is bad... But if so, why it would only affect a certain type of CPU is a bit of a mystery, but I've seen weirder things :)
I tried to add a rav1e sanity test and the CI job segfaulted in a similar way #490 🤔
I stripped the Dockerfile of everything except glib, harfbuzz, cairo, pango, librsvg, fdk_aac, x264, and rav1e, and this reproduces the issue!
Hey, yep! I also managed to reproduce it myself now on my old Intel MacBook, and it only seems to happen when linking with both rav1e and librsvg. The stack trace suggests it crashes inside the Rust rayon crate, somewhere here: https://github.com/rayon-rs/rayon/blob/main/rayon-core/src/registry.rs#L329-L338 ...My guess is that the crash is somehow related to two Rust runtimes being linked together (librsvg and rav1e both use rayon, but different versions, so I think it should be fine, but I'm not sure). But it's still weird that it works on arm64; maybe for some reason symbols resolve differently and it happens to work? Some progress at least! I will do more digging tomorrow or so.
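One way to probe the duplicate-symbol theory is to compare the global symbols defined by the two static archives before they are linked. Duplicate definitions across archives do not always produce a linker error, because an archive member is only pulled in when it resolves a still-undefined symbol, so one definition can silently win. The sketch below assumes unstripped archives and GNU-style `nm` output (`address type name`); the helper names and the archive file names in the usage comment are placeholders, not taken from the actual build.

```sh
# Hypothetical helpers: print the global, non-weak symbol names defined in
# BOTH of two files containing `nm` output.
# Usage:
#   nm --defined-only librav1e.a > rav1e.nm
#   nm --defined-only librsvg-2.a > rsvg.nm
#   common_symbols rav1e.nm rsvg.nm
global_symbols() {
  # Lines look like "0000000000000000 T name"; uppercase type letters are
  # global definitions. This skips U (undefined), W/V (weak), and archive
  # member headers such as "foo.o:" (which have a single field).
  awk 'NF == 3 && $2 ~ /^[ABCDGRST]$/ { print $3 }' "$1" | sort -u
}

common_symbols() {
  # Each list is already unique, so a name that appears twice in the
  # combined stream is defined in both files.
  { global_symbols "$1"; global_symbols "$2"; } | sort | uniq -d
}
```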
Update: I tried to recreate the two staticlib Rust crates that use rayon with the same dependencies, plus a C program that static-pie links them, but got no crash on either arm64 or amd64. I'll keep digging from time to time. BTW, for your use case, would using libsvtav1 be an option?
In the end, I used a different image with a non-static ffmpeg that works for me. I tried to recompile rav1e with rayon 1.0; that works, but ffmpeg still crashes. Could it be related to symbol mangling? I suppose different versions should have different names for proper linking?
👍
The rav1e CLI tool works but not ffmpeg?
Yep, I'm not sure what is going on, but I suspect there is some issue with mismatching Rust runtime symbols etc., e.g. the runtime is compiled a little bit differently between the libs and then gets mixed up. But it's a bit of a mystery why arm64 seems to work but not amd64... maybe just by chance.
Yes, rav1e works. The workflow is a bit different, because input files have to be in y4m format, but it works fine.
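Since the rav1e CLI only reads y4m, a common way to feed it other formats is to let ffmpeg decode to a y4m stream on stdout and pipe that into rav1e. This is a sketch assuming rav1e's CLI accepts `-` as the input path to read from stdin and writes its output file via `-o`; the function name is hypothetical, and audio is dropped.

```sh
# Hypothetical helper: decode any input ffmpeg understands to a y4m stream
# and encode it with the rav1e CLI. Audio is dropped (-an) because y4m is
# video-only. Usage: encode_with_rav1e_cli input.mp4 output.ivf
encode_with_rav1e_cli() {
  ffmpeg -nostdin -loglevel error -i "$1" -an -pix_fmt yuv420p \
    -f yuv4mpegpipe - |
    rav1e - -o "$2"
}
```

Because decoding and encoding run in separate processes here, the static ffmpeg binary never has librav1e and librsvg active in the same process during encoding.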
Trying to encode a file to AV1 with librav1e results in a segfault.
There is no output, but dmesg shows:
Copying the ffmpeg binary from the container to the host results in the same error.