
Native crash when producer is closed #7

Closed
@Pchelolo

Here's a relevant part of the crash report:

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000

VM Regions Near 0:
--> 
    __TEXT                 00000001022f8000-00000001032f9000 [ 16.0M] r-x/rwx SM=COW  /Users/USER/*

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libc++abi.dylib                 0x00007fff9c7afb1e __dynamic_cast + 34
1   node-librdkafka.node            0x00000001044a1bea RdKafka::Topic::create(RdKafka::Handle*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, RdKafka::Conf*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&) + 232 (TopicImpl.cpp:102)
2   node-librdkafka.node            0x000000010448fb0e NodeKafka::Topic::New(Nan::FunctionCallbackInfo<v8::Value> const&) + 1380 (topic.cc:33)
3   node-librdkafka.node            0x000000010448ff4d Nan::imp::FunctionCallbackWrapper(v8::FunctionCallbackInfo<v8::Value> const&) + 131 (nan_callbacks_12_inl.h:175)
4   node                            0x000000010245f8b5 v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) + 373
5   node                            0x00000001024b901f v8::internal::Builtin_HandleApiCallConstruct(int, v8::internal::Object**, v8::internal::Isolate*) + 1375
6   ???                             0x000004463bb0961b 0 + 4699695650331

Unfortunately I couldn't create a small test case to reproduce this reliably, but the cause seems quite clear. When produce and disconnect calls are placed with very unlucky timing, we get a race:

  1. EventThread: disconnect is called, and the ProducerDisconnect work is queued.
  2. EventThread: produce is called. Since _isConnected is only updated after the disconnect has completed, the produce call goes to maybeTopic, then into native code, and gets all the way to the native Topic constructor (NodeKafka::Topic::New in the trace above).
  3. The ProducerDisconnect work executes on a background thread and deletes m_client.
  4. EventThread: calls RdKafka::Topic::create -> boom, SIGSEGV

I think the easiest solution is to update the JS _isConnected property before the actual disconnect happens, so that all of these racy code paths are protected. However, it may be better to invest time in fixing the races at the native level. I'm not sure which path you'd prefer.
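
To make the JS-level option concrete, here is a minimal sketch of the idea, not the library's actual code. `FakeNativeClient` and this `Producer` class are hypothetical stand-ins (the real binding's class layout may differ), and `setImmediate` merely imitates the queued ProducerDisconnect work. The only point is the ordering: `_isConnected` is cleared synchronously inside `disconnect`, before the teardown is queued.

```javascript
// Hypothetical stand-in for the native handle (m_client); not node-librdkafka's real API.
class FakeNativeClient {
  constructor() {
    this.alive = true;
  }
  createTopic(name) {
    // In the real binding this would be the dereference that crashes with SIGSEGV.
    if (!this.alive) {
      throw new Error('native client already deleted');
    }
    return { name };
  }
  destroy() {
    this.alive = false;
  }
}

// Hypothetical producer wrapper illustrating the proposed ordering.
class Producer {
  constructor() {
    this._client = new FakeNativeClient();
    this._isConnected = true;
  }

  produce(topicName) {
    // Guard in JS so we never reach native code once disconnect has started.
    if (!this._isConnected) {
      throw new Error('Producer not connected');
    }
    return this._client.createTopic(topicName);
  }

  disconnect(cb) {
    // Flip the flag first, synchronously, before any teardown work is queued.
    this._isConnected = false;
    // Stand-in for the ProducerDisconnect work that later deletes the client.
    setImmediate(() => {
      this._client.destroy();
      cb(null);
    });
  }
}
```

With this ordering, a produce call issued after disconnect but before the teardown has run fails with an ordinary JS error instead of reaching the native layer. It doesn't remove the underlying native race, it only keeps the JS entry points away from it.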
