-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sometimes sending to a Channel will segfault #4748
Comments
Another one... As the exact place of the segfault seems to be random, I have a theory that the GC attempts to free the pointer to my shared channel that I passed in in the thread args. Although I have no idea why, because it was allocated using sharedAlloc0....
|
Btw, compiling with -d:useSysAssert and -d:useGcAssert will make the code fail immediately. Although I'm not too sure if these errors are valid or not because I'm getting
|
I've had similar issues on OS X before I realized that you mustn't share channels as objects between threads. Make sure the channel is either in a global variable or passed as a You can find an example of how to properly pass Channel as an argument in #4689. |
Thanks man, I appreciate your help. I was aware of the issues you described and checked out your code example. I was already using channels allocated on the shared heap and I was passing only pointers to the thread proc. In my desperation I tried to create the threads like in your example (currently I'm just creating normal GC'd objects and I'm passing the pointer, that should work and it does). To my surprise, then the whole thing consistently starting crashing right at startup on all platforms, but to my understanding it should not make any difference at all Anyway, I'm totally confused now, if you could take a look at my code I'd appreciate it, maybe you can spot something that's wrong. https://github.com/johnnovak/nim-raytracer/blob/master/src/workerpool.nim To run the tests that sometimes crash on Windows:
|
It seems you do copy Thread object and that's what causes these problems. Try either changing |
Thanks for the suggestion, I have changed I suspect there's something wrong with the |
well without any code snippet I cannot fix anything. |
Hi Araq, you can have a look at the whole source here. Since you would have to debug it anyway, I don't think there's much point in trying to slim it down. I also have a suspicion that maybe I'm using too many features that aren't really compatible with each other... https://github.com/johnnovak/nim-raytracer/ You can also run the tests that will fail randomly on Windows:
Actually, I would really appreciate it if you could tell me whether I'm doing something stupid in there that is guaranteed to blow up... I'm especially suspicious about those two closures I used for the callback procs (just search for UPDATE: I just ran the tests on OS X at work and sometimes it segfaults inside the closures:
Still a bit strange that it crashed with a different error on different operating systems. Anyway, any advice is appreciated, I've been struggling with this for days now... Thanks! EDIT: fixed repo URL |
Oh yeah, right now the |
Ah, that explains the random crashes then. Thanks for the info, I'm looking forward to the fix. Hopefully that will fix all my problems. |
@Araq Just wondering, any updates on this? |
I wanted to retest on Windows, sorry. Can you retest? Lots of bugs wrt channels have been fixed for 0.15. |
I could only retest on OS X at the moment because I'm at work, but the issue still persists with the latest dev version. Just keep rerunning the tests as described in #4748 (comment) Some test runs will succeed, but sooner of later you'll get random errors as described in the above comment (basically, it still fails inside the closure). But as you hinted earlier, this might not even be a channel bug, more like a GC/thread bug:
So this definitely needs more work... EDIT: I've removed "on Windows" from the title because it's actually happening on all platforms (just a bit more likely on Windows). |
Oh no, that particular bug was fixed too. Ah, yes, channels do predate 'closure', that could explain everything. :-) |
So are you gonna fix it? :) EDIT: Btw, there's definitely some improvement with the latest dev version. I had to use |
Allocator and channels bugfixes mean this could now work. Please retest. |
Issue is still there:
It would be best if you tested this with the code I provided earlier, then you can replicate it exactly and debug it. |
@johnnovak Could you please duplicate the link you provided? There are multiple links above and I'm not sure which one is up to date. The error you show is different from what was there in the past. At the moment, judging by the stacktrace, the problem seems to be that you use |
@endragor You can find all the info on how to replicate this in this post: #4748 (comment) If you check out that post you'll see that message I got now is the same to the letter. But yeah, sometimes I was able to get different messages randomly, you're right in that... The workerpool does not only fail in the unit tests, but also in actual use sometimes. So I doubt that it's only an unit test issue. Anyway, it would be good if you could have a look at the code as well, maybe you'll be able to spot something. I spent weeks on this and I just gave up at this point... I figured the language is just broken so I'll wait until it gets fixed properly. |
@johnnovak Could you try to redesign your tests so that all calls to unittest module are only performed in the main thread? I've looked a bit more at it, and unittest is simply not designed to be invoked for multiple threads, and I think that makes sense. So best we can do is raise a proper error if unittest is invoked from non-main thread. That seems to be the only problem left, because there seems to be no more corruption crashes like you had in the past: channels.nim(202) send
channels.nim(166) rawSend
alloc.nim(650) alloc0
alloc.nim(644) alloc
alloc.nim(481) rawAlloc
alloc.nim(437) getSmallChunk
alloc.nim(428) getBigChunk
alloc.nim(399) splitChunk
SIGSEGV: Illegal storage access. (Attempt to read from nil?) (this is the stacktrace from the description of the issue). But you'll know for sure when unittest is only used from main thread. |
Oops, I think, I get the same problem on windows. Preparing the test...
|
@vegansk Have you had any success in isolation the issue? |
No feedback. Closing. |
I had a similar error involving sendImpl -> rawSend -> storeAux leading to |
This PR revives nim-lang#8611 since the channels module still does not have an example of usage. I attempted to fix the outstanding issue that caused the previous PR to be closed, namely implementing the example on top of threads instead of threadpool. I also added an additional example of using sharedAlloc0 to create channel pointers based on the discussion in nim-lang#4748. Please let me know if there is anything that needs to be changed!
This is happening quite regularly on Windows and the results are catastrophic... I'm actually using a shared channel for this, so I guess it's either a bug in the channel implementation or maybe a GC bug? I'm sending to my shared channel from multiple threads, my understanding is that that's OK as channels use locking internally. The same code has been working flawlessly on Linux and OS X so far.
I am happy to share my full source with the brave person who's willing to investigate this. Isolating the root cause would be extremely hard as I have really no idea where the problem is coming from, and changing anything might actually make it go away.
The text was updated successfully, but these errors were encountered: