Intermittent NaNs appearing in results from calling shared-library blis from Julia #12
Comments
Tyler took a look at your output and had an idea: is Julia calling BLIS via multiple threads? (The reference configuration is not thread-safe.)

Aha. Well not exactly multiple threads, but the master julia process launches separate worker julia processes to run the tests in parallel. The subprocesses should have separate memory spaces in which they load dynamic libraries, I think. Even when I run the tests in a single julia process, I still get similar-looking non-reproducible results and intermittent NaNs though. Is there a list of which configurations are or are not thread-safe anywhere? Or should I assume none of them are at this point?
Sorry, I should have been more descriptive. The reference configuration isn't thread-safe because it is single-threaded and does not enable OpenMP. (OpenMP is currently the only threading model we support, but POSIX threads could be supported in the future.) In order to achieve thread safety, a critical section in the BLIS memory allocator must be enabled. Currently, it's only enabled when OpenMP is enabled, via BLIS_ENABLE_OPENMP. Now, I'm not sure if the critical section works when BLIS is called from two different application threads. Tyler thinks that it may not do what we want it to do in that situation. This all may be unrelated, of course. But it would be interesting to enable OpenMP and see if that fixes some or all of the NaNs.
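Roughly, the idea is the following (an illustrative sketch only, not BLIS's actual allocator code; the pool structure and names here are made up):

```c
#include <stddef.h>

/* Hypothetical shared free list used by the allocator; illustrative only. */
static void* pool_blocks[64];
static int   pool_top = 0;

void* pool_acquire(void)
{
    void* blk = NULL;
#ifdef BLIS_ENABLE_OPENMP
    /* Only one thread at a time may touch the pool bookkeeping. */
    #pragma omp critical (blis_pool)
#endif
    {
        if (pool_top > 0)
            blk = pool_blocks[--pool_top];
    }
    return blk;
}
```

Without BLIS_ENABLE_OPENMP the critical section compiles away entirely, so two application threads hitting the allocator at once can race on the shared bookkeeping.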
Sure, happy to try. Where would that flag be located?
Just add #define BLIS_ENABLE_OPENMP to bli_config.h.
OK, I also had to uncomment a related line. Trying with 64-bit integers in BLIS resulted in a segfault; I'll see if I can get a backtrace.
This kind of error might also result from reading uninitialized memory.
Regarding the 64-bit integer segfaults, something seemed to be calling lapack.
I think you'll need to run it as …
Thanks, yeah, I realized that a few minutes after initially posting. So there's a lot going on here https://gist.github.com/tkelman/c55da37d849d01585739 that I don't quite know how to make sense of, though the calls to lapack complicate things.

@ViralBShah now might not be a good time, but in case you want to play with an alternate BLAS at all...
I think I've figured out what's happening. These first few tests in linalg are often going to lapack, which complicates matters; it was more useful to start with the tests that use blas directly. Julia offers both in-place and allocating versions of blas calls. The allocating versions do not explicitly initialize the contents of the output array before calling blas; they just send in whatever garbage is in that memory and use beta = 0. BLIS appears to be following the mathematical statement of the BLAS operations a bit more literally than the reference and other implementations I'm aware of, and is actually multiplying my input garbage by zero. When that garbage contains NaNs (or presumably +/- Inf as well, though that's less likely to occur randomly), the output ends up with corresponding NaNs in the same locations. Should BLIS be checking for beta == 0 and special-casing to an assign-only operation, instead of multiplying the input by 0? Many applications may have grown accustomed to relying on this behavior over the years without realizing it.
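To make the distinction concrete, here is a small self-contained sketch (mine, not BLIS or Julia code) of the two ways an implementation can treat beta:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    float c_garbage = NAN;    /* pretend this is uninitialized output memory */
    float alpha_ab  = 3.0f;   /* the freshly computed alpha * (A*B) contribution */
    float beta      = 0.0f;

    /* Literal reading of the BLAS spec: C := alpha*A*B + beta*C. */
    float literal = alpha_ab + beta * c_garbage;   /* 0 * NaN is NaN, so this is NaN */

    /* Common special case: when beta == 0, overwrite instead of update. */
    float special = (beta == 0.0f) ? alpha_ab
                                   : alpha_ab + beta * c_garbage;

    printf("literal update: %f\n", literal);
    printf("special-cased : %f\n", special);
    return 0;
}
```

The literal reading propagates whatever NaNs happen to be sitting in the output buffer; the special-cased version ignores them, which is what most callers have implicitly come to expect.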
@tkelman I definitely want to try this out, but only after the 0.3 release. It would be great to have a BLIS.jl Julia package for now. You could just replace all calls to …
@ViralBShah that's about what I thought you'd say right now, no worries. I'll see what I can do with Clang auto-wrapping and coming up with a package. With this patch to Julia https://gist.github.com/a16ce380d174b67561cd to initialize the allocating BLAS calls to 0, I see the BLAS tests passing reliably for Float32 and Float64. There still appears to be a problem with the imaginary part of the result of the complex dot product. The linalg tests also still see a few NaNs; lapack might be making some beta == 0 initialization assumptions similar to what unpatched Julia was making.
Generally speaking, BLIS should already be employing this write vs. update optimization for beta == 0 cases. Could you give me more details about how you are running BLIS? Are you configuring with the reference configuration? Which operation is being called that causes the NaNs to pop up? Looking at the reference gemm micro-kernel, I can't see any reason it would allow uninitialized memory to get in the way of a beta = 0 invocation. Maybe it's a different operation altogether (e.g. level-1 / level-2) that is causing the problem?
Yes, the only change here is …
Not sure whether gemm was ever causing trouble; I'll see how many test failures I can cause from direct BLAS calls. syrk definitely was leaving input NaNs in place with beta = 0.
I can cause a problem with a small standalone C program calling sgemm directly, compiled and run against libblis as noted in the sketch below.
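A minimal reproducer in the spirit of what's described here (4x4 single-precision sgemm, beta = 0, output buffer pre-filled with NaNs) would look roughly like the following; the build line, paths, and input values are assumptions rather than the exact test used:

```c
/* Hypothetical reproducer, not the original attachment.
 * Build and run (assuming a shared BLIS exporting the BLAS-compatible symbols):
 *   gcc -std=c99 -o test_sgemm test_sgemm.c -L/path/to/blis/lib -lblis -lm
 *   ./test_sgemm
 */
#include <math.h>
#include <stdio.h>

void sgemm_(const char* transa, const char* transb,
            const int* m, const int* n, const int* k,
            const float* alpha, const float* a, const int* lda,
            const float* b, const int* ldb,
            const float* beta, float* c, const int* ldc);

int main(void)
{
    int n = 4;
    float alpha = 1.0f, beta = 0.0f;
    float a[16], b[16], c[16];

    for (int i = 0; i < 16; i++) {
        a[i] = 1.0f;
        b[i] = 1.0f;
        c[i] = NAN;          /* simulate garbage left in the output buffer */
    }

    sgemm_("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);

    /* With beta == 0 every element should be exactly 4.0; any NaN is the bug. */
    for (int i = 0; i < 16; i++)
        printf("%g%c", c[i], (i % 4 == 3) ? '\n' : ' ');

    return 0;
}
```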
Thanks for isolating the problem, Tony. I have been able to reproduce the bug. It turns out there are two places where the "overwrite or update" logic must exist. The first is in the micro-kernel itself, which is the first place that comes to mind when I think of this issue. The second is in the macro-kernel, immediately after the micro-kernel is called for edge cases. There, the micro-kernel computes into a temporary buffer, and then the elements corresponding to the edge case elements that actually exist are copied to the output matrix. This second location only has "update" logic, hence the bug. However, I can't figure out why it is not also manifesting for double-precision. I'll add this fix to the top of my queue.
@tkelman Just a quick update: I figured out why the NaNs were only being produced by single-precision real, and not double-precision real gemm. It has to do with default micro-kernel sizes. The default sgemm micro-kernel size is 8x4, whereas for dgemm it is 4x4. The latter perfectly matches your 4x4 test case, and thus there is no edge case handling needed there (unlike for sgemm). It is in this edge case handling where the NaNs sneak through. (If you were to change the test matrix size in your test driver, for the double-precision case, they would likely show up.)
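Schematically, the edge-case path copies the micro-kernel's temporary MR x NR tile back into C, and before the fix that copy only had the "update" form. A rough paraphrase (not BLIS's actual source; names and layout are simplified):

```c
/* Schematic paraphrase of the macro-kernel's edge-case copy, not BLIS's
 * actual source. ct is the MR x NR temporary tile the micro-kernel computed
 * into; only m_edge x n_edge of it corresponds to real elements of C. */
enum { MR = 8, NR = 4 };   /* default sgemm micro-tile sizes mentioned above */

void edge_copy(int m_edge, int n_edge, float beta,
               const float* ct, float* c, int ldc)
{
    for (int j = 0; j < n_edge; j++)
        for (int i = 0; i < m_edge; i++) {
            if (beta == 0.0f)
                /* overwrite: whatever garbage is already in C is discarded */
                c[i + j * ldc] = ct[i + j * MR];
            else
                /* update: before the fix this was the only path, so with
                 * beta == 0 a NaN already in C survived as beta * NaN */
                c[i + j * ldc] = ct[i + j * MR] + beta * c[i + j * ldc];
        }
}
```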
Yep, good catch, thanks for the heads up. A 3x3 double case lets NaNs through too. For 9x9 single or double, just the last row and last column are NaNs.
@tkelman Should be fixed now. Honestly, I don't know how this bug went undetected for so long, aside from dumb luck that, up until now, none of the Infs or NaNs were inside the test matrices' edge cases.
Great, thanks Field. This is now passing Julia's blas tests for single and double. I'll see if I can reproduce any issues with complex in a standalone example and open a new issue if so.
Okay, I'm not sure what's at fault with the complex dot product; I don't think it's worth opening a new issue on it. The problem only occurs with … I can work around the issue by applying this patch to Julia http://gist.github.com/43cece304e7247307e47 to account for the extra inputs to BLIS' version of the function.
@tkelman Hmm, if the problem were about return value conventions, you would think it would affect … I did a quick test on my end and was not able to reproduce any issue using 4x1 vectors. Would it be easy to check exactly how wrong the imaginary result is? (Perhaps this is a subtle numerical issue, which of course single-precision floats are quite vulnerable to.)
It's far enough off that this isn't a matter of numerical differences.

So …
cc @andreasnoackjensen, does this stuff with complex return types from …?
@tkelman If netlib BLAS exhibits the problem too, then it sounds like it's on Julia's side. I'll let you take it from here! :)
The …
One possibility is to just use the CBLAS wrapper instead of calling it directly.
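For what it's worth, the CBLAS interface sidesteps the return-value convention entirely by writing the complex result through an output pointer. A minimal usage sketch (assuming a CBLAS-compatible cdotc is available to link against):

```c
#include <complex.h>
#include <stdio.h>

/* Standard CBLAS prototype: the complex result is written through the last
 * pointer, so no cross-compiler return-value convention is involved. */
void cblas_cdotc_sub(const int n, const void* x, const int incx,
                     const void* y, const int incy, void* dotc);

int main(void)
{
    float complex x[4] = { 1 + 2*I, 3 + 4*I, 5 + 6*I, 7 + 8*I };
    float complex y[4] = { 1 + 1*I, 1 + 1*I, 1 + 1*I, 1 + 1*I };
    float complex result = 0;

    cblas_cdotc_sub(4, x, 1, y, 1, &result);   /* conj(x) . y */

    printf("dotc = %g %+gi\n", crealf(result), cimagf(result));
    return 0;
}
```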
FWIW, I ran into this issue a lot back when Kazushige Goto maintained GotoBLAS. It basically boiled down to making sure that the Fortran components of both the BLAS and the application were compiled with either -ff2c or -fno-f2c. (The important thing was that they matched, though I preferred the more sane -fno-f2c option.) https://gcc.gnu.org/onlinedocs/gfortran/Code-Gen-Options.html I assume the issue with Julia is something similar.
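The reason the flags matter: with -ff2c, gfortran returns COMPLEX function results through a hidden pointer argument (and returns REAL results as double), while -fno-f2c returns the complex value directly, so a C caller built around the wrong assumption reads garbage. A rough sketch of the two prototypes a C caller would have to use (the second name is made up just to show the shape; the actual symbol is cdotc_ in both cases):

```c
#include <complex.h>

/* With -fno-f2c (the gfortran default): cdotc_ returns the complex value
 * directly, which is what a C caller declaring it this way expects. */
float complex cdotc_(const int* n, const float complex* x, const int* incx,
                     const float complex* y, const int* incy);

/* With -ff2c: the result instead comes back through a hidden first argument.
 * (Illustrative name only; the real symbol would still be cdotc_.) */
void cdotc_f2c_(float complex* result, const int* n,
                const float complex* x, const int* incx,
                const float complex* y, const int* incy);
```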
Thanks Viral. The Julia issue was JuliaLang/julia#5291, where we did switch to using cblas for these functions; the trouble is that BLIS doesn't recreate an exact copy of cblas. BLIS' C API is similar to cblas for this particular function, but more general.
We can't use f2c with Julia, as we use the C calling convention for calling gfortran-compiled libraries.
That's probably for the best. f2c calling conventions are horrible. (I only mentioned it because the real/imaginary return value issue seemed familiar.)
CPU: Core2 Duo E8400 (old machine)
OS: Ubuntu 14.04, x86-64

Compiled the BLIS reference configuration, setting BLIS_ENABLE_DYNAMIC_BUILD := yes. By itself, BLIS passes its own make test.

I'm calling into BLIS from Julia by the following steps:
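Roughly, the setup amounts to pointing a Julia source build at libblis through Make.user; a sketch along these lines (the library name and paths are assumptions, not necessarily the exact setup used):

```make
# Hypothetical Make.user for a Julia 0.3-era source build; the library
# name and path here are assumptions.
USE_SYSTEM_BLAS = 1
LIBBLAS = -L/path/to/blis/lib -lblis
LIBBLASNAME = libblis
USE_BLAS64 = 0
```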
This gives me a different failure each time I repeat make testall. Here are some examples, from the first couple of files in Julia's test suite (linalg1 and linalg2):

https://gist.github.com/47dbd5517c4a6f56fb2e
https://gist.github.com/51835795c8ada7c0f2a1
https://gist.github.com/552ec09e5e78d1cd3da7
https://gist.github.com/7cce0057bf9a3009a92a
https://gist.github.com/0f01364072634cc95be9
https://gist.github.com/9292fc3fe1a2e9afe09d

I'll see if I can translate a few of these test cases into C to figure out whether the problem is reproducible outside of Julia. I'll also try setting the BLIS integer size to 64-bit and deleting the USE_BLAS64 = 0 line, to see whether that changes anything.