Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-[MQTTDecoder stream:handleEvent:] crashing with EXC_BAD_ACCESS KERN_INVALID_ADDRESS #325

Open
repomies opened this issue Jun 18, 2017 · 29 comments
Labels

Comments

@repomies
Copy link

Over the past days I've seen a few crashes like this for my app in Crashlytics. Any ideas what might be going wrong? Using MQTTClient 0.9.2.

Crashed: com.apple.main-thread
0  libobjc.A.dylib                0x18b3fc150 objc_msgSend + 16
1  MQTTClient                     0x1009f62b8 -[MQTTDecoder stream:handleEvent:] (MQTTDecoder.m:132)
2  CoreFoundation                 0x18c8d201c _signalEventSync + 212
3  CoreFoundation                 0x18c8d1f28 _cfstream_shared_signalEventSync + 440
4  CoreFoundation                 0x18c93542c __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
5  CoreFoundation                 0x18c934d9c __CFRunLoopDoSources0 + 540
6  CoreFoundation                 0x18c9329a8 __CFRunLoopRun + 744
7  CoreFoundation                 0x18c862da4 CFRunLoopRunSpecific + 424
8  GraphicsServices               0x18e2cc074 GSEventRunModal + 100
9  UIKit                          0x192b1d058 UIApplicationMain + 208
10 MyApp                          0x1001970e8 main (AppDelegate.swift:20)
11 libdyld.dylib                  0x18b87159c start + 4
@repomies
Copy link
Author

repomies commented Jul 2, 2017

The one time so far I've managed to reproduce this on an Xcode debug, I got the following as the tail of the console log. I added one more debug print on line 130 of MQTTDecoder.m to print the dataBuffer as a UTF-8 string (prefixed with [MQTTDecoder] data: in the log), and looks like the problematic data was cting.. My guess is the backend is doing something seriously wrong here, but it'd be nice if the iOS client didn't crash on it.

2017-07-02 13:53:16.261999+0300 MyApp[61535:16904187] [MQTTDecoder] received (54)=<31343938 39393237 38383a20 536f636b 65742065 72726f72 206f6e20 636c6965 6e74203c 756e6b6e 6f776e3e 2c206469 73636f6e 6e65>...
2017-07-02 13:53:16.262015+0300 MyApp[61535:16904187] [MQTTDecoder] data: 1498992788: Socket error on client <unknown>, disconne
2017-07-02 13:53:16.262034+0300 MyApp[61535:16904187] [MQTTSession] other state
2017-07-02 13:53:16.262048+0300 MyApp[61535:16904187] [MQTTDecoder] NSStreamEventHasBytesAvailable
2017-07-02 13:53:16.262257+0300 MyApp[61535:16904187] [MQTTDecoder] fixedHeader=0x63
2017-07-02 13:53:16.262724+0300 MyApp[61535:16904187] [MQTTDecoder] digit=0x74 0x74 0 1
2017-07-02 13:53:16.262829+0300 MyApp[61535:16904187] [MQTTDecoder] remainingLength=116
2017-07-02 13:53:16.262862+0300 MyApp[61535:16904187] [MQTTDecoder] read 116 116
2017-07-02 13:53:16.262975+0300 MyApp[61535:16904187] [MQTTDecoder] received (118)=<6374696e 672e0a31 34393839 39323738 383a204e 65772063 6f6e6e65 6374696f 6e206672 6f6d2031 37382e35 352e312e 3138206f 6e20706f 72742031 3838332e 0a313439 38393932 3738383a 204e6577 20636c69 656e7420 636f6e6e 65637465 64206672 6f6d2031 37382e35 352e312e 31382061 7320>...
2017-07-02 13:53:16.263031+0300 MyApp[61535:16904187] [MQTTDecoder] data: cting.
1498992788: New connection from 178.55.1.18 on port 1883.
1498992788: New client connected from 178.55.1.18 as
2017-07-02 13:53:16.263055+0300 MyApp[61535:16904187] [MQTTMessage] unexpected payload after packet identifier
2017-07-02 13:53:16.263221+0300 MyApp[61535:16904187] [MQTTSession] MQTT illegal message received
2017-07-02 13:53:16.263238+0300 MyApp[61535:16904187] [MQTTSession] closeInternal
2017-07-02 13:53:16.263308+0300 MyApp[61535:16904187] [MQTTCFSocketTransport] close

@ithaqua
Copy link

ithaqua commented Jul 6, 2017

I have met the same issue, any update?

@pablogeek
Copy link

Same issue, how is this going?

@ckrey
Copy link
Contributor

ckrey commented Jul 7, 2017

the cting. is just the continuation of the data sent before ...disconne

What broker are you using? It seems the broker is sending mosquitto debug messages to the mqtt connection. This is not MQTT protocol.

The protocol violation is detected. The crash is probably caused by data already received which shouldn't be processed anymore after detecting the protocol violation

@repomies
Copy link
Author

repomies commented Aug 1, 2017

Hi @ckrey and sorry for the late reply — the backend is using Mosquitto (https://github.com/toke/docker-mosquitto) as far as I know. I will try to find out more once the backend developers are back from vacations :)

Strangely this bug peaked in the analytics at the latter half of June, then was mostly gone for most of July, until another peak over the very last few days. App usage has been quite constant over this period so it's a bit puzzling what exactly is causing it.

@rockyftn
Copy link

rockyftn commented Aug 9, 2017

I have same issue on my side. Also i use Mosquito broker on other side. Do you have update or advice what to do?

@repomies
Copy link
Author

I might have figured out why this happened. I called close() on my MQTTSession right before it was deallocated, which somehow led to MQTTDecoder getting deallocated right after it called decoder:didReceiveMessage: and crashing on next line's self.dataBuffer = nil;. Waiting for closeWithDisconnectHandler to return the callback before deiniting the session seems to have fixed this as far as I can tell.

@repomies
Copy link
Author

Still saw the same crash at least once (though only on a non-debug build so far) despite the fix mentioned above. But it seems to happen much more rarely now, if anything.

@repomies
Copy link
Author

repomies commented Sep 3, 2017

Update: The "fix" I mentioned above didn't end up solving the problem, and if anything we actually saw the crashes peak over the past week or two. The root cause of the problem definitely seems to be some garbage (Mosquitto debug logs) being pushed into the pipeline by the backend (unfortunately I can't do anything about that). I tried a couple of other iOS MQTT client libraries and they actually dealt way worse with this problem, crashing right away when connecting, or never managing to parse any data.

However, it seems – and this still needs more data points to make sure but at 40% adoption and zero crashes so far, it sure looks promising – that my latest update got rid of the crashes simply by holding onto each MQTTSession instance for 30 more seconds after it disconnects before allowing it to be deallocated. The garbage in the data pipeline remains, but this way it doesn't seem to lead to the memory problems. (I'll post another update after I've seen whether the crashes truly are gone now.)

@repomies
Copy link
Author

repomies commented Sep 6, 2017

Yes, this time the crashes definitely seem to be gone for good. Still zero crashes reported for the newest app version after five days (we used to have at least a dozen every day).

@pablogeek
Copy link

I have the same crash in Fabric. Any updates on this? @repomies

@ckrey ckrey added the bug label Sep 26, 2017
@repomies
Copy link
Author

@pablogeek yes, looks like in my case it completely disappeared after I deployed the fix described above – just holding onto the MQTTSession instance for extra 30 seconds after disconnecting it (and probably that's much longer than needed) so that the MQTTDecoder instance doesn't end up deallocated when it's still trying to process an invalid message.

@sharonSilicus
Copy link

@repomies I am also facing the same issue. How do you hold the session after disconnection? A sample code would help. Thanks.

@ckrey
Copy link
Contributor

ckrey commented Oct 26, 2017

Can you describe what you are doing with MQTT? Are you subscribing to a topic with a lot of large retained messages and QoS0, which causes the broker to flood the client with data?
Or do you use QoS1/2 but do not have specified a reasonable max_inflight_messages.

I do not want to say the library is error-free, but under "normal" circumstances it does behave quite stable

@sharonSilicus
Copy link

@ckrey While the application is moving to background, I close the connection to the MQTT server its an Azure IoT Hub. While coming back to foreground, I try to connect to MQTT server again. Once in a couple of days the crash appears.

- (void)applicationDidEnterBackground:(UIApplication *)application {
    // Use this method to release shared resources, save user data, invalidate timers, and store enough application state information to restore your application to its current state in case it is terminated later.
//     If your application supports background execution, this method is called instead of applicationWillTerminate: when the user quits.
    MQTTSession *session = [[MQTTSessionManager sharedManager] session];
    [session closeAndWait:1];
}
- (void)applicationDidBecomeActive:(UIApplication *)application {
    // Restart any tasks that were paused (or not yet started) while the application was inactive. If the application was previously in the background, optionally refresh the user interface.
    
    MQTTSession *session = [[MQTTSessionManager sharedManager] session];

    if ([session status] == MQTTSessionStatusClosed ) {
     
        NSString *clientID = [ApplicationManager sharedManager].clientID;
        NSString *sas = [ApplicationManager sharedManager].sasKey;
        NSString *hostURL = [ApplicationManager sharedManager].hostURL;
        
        [[MQTTSessionManager sharedManager] setHost:hostURL];
        [[MQTTSessionManager sharedManager] setClientID:clientID];
        
        [[MQTTSessionManager sharedManager] connectWithUserName:[NSString stringWithFormat:@"%@/%@",hostURL,clientID] andPassword:sas];
    }
    
}

@jcavar
Copy link
Contributor

jcavar commented Oct 31, 2017

Relevant probably:
facebookincubator/SocketRocket#532

@jcavar
Copy link
Contributor

jcavar commented Nov 14, 2017

I believe this bug is fixed on master:
https://github.com/novastone-media/MQTT-Client-Framework/blob/master/MQTTClient/MQTTClient/MQTTCFSocketTransport.m#L39-L46

We had to move to using queues instead of run loop which is probably better anyway. There are some bits to resolve before making official release, but feel free to try it out from master.

@jcavar
Copy link
Contributor

jcavar commented Dec 14, 2017

Ok, 0.10.0 version is on CocoaPods and this issue should be fixed I believe. Please give it a try.

@jcavar jcavar closed this as completed Dec 14, 2017
jcavar added a commit that referenced this issue Apr 25, 2018
This is related to previous crash:
#325
@saberChen
Copy link

Using MQTTClient 0.12.0, still many crashes like this for my app, [MQTTDecoder stream:handleEvent:] @jcavar

@jcavar jcavar reopened this May 14, 2018
@jcavar
Copy link
Contributor

jcavar commented May 14, 2018

Do you have any idea how it happens?

@marciniwanicki
Copy link

I think the crash is related to a synchronization issue in close method (MQTTDecoder.m:88) leading to a deadlock. I believe most of the clients experience it on the main thread as it's the default configuration.

 if (self.queue != dispatch_get_specific(&QueueIdentityKey)) {
        dispatch_sync(self.queue, ^{ // deadlock
            [self internalClose];
        });
    }

Existing implementation seems to relay on the assumption that two different queues cannot perform blocks on the same thread. This assumption is not correct as GCD may skip switching threads when it's applicable (for performance reasons).

Here is a test that should reproduce it.

#import <XCTest/XCTest.h>
#import "MQTTDecoder.h"

@interface MQTTDecoderTests : XCTestCase

@end

@implementation MQTTDecoderTests

- (void)testCloseFromNonMainDispachQueueOnMainThread {
    MQTTDecoder *sut = [[MQTTDecoder alloc] init];
    dispatch_queue_t customQueue = dispatch_queue_create("mqttdecodertests.queue", nil);

    dispatch_sync(customQueue, ^{
        [sut close];
    });

    XCTAssertTrue(YES); // test passed if this line is reached
}

@end

The fix is not trivial as the MQTTDecoder using externally available queue. My naive fix would at least make some cases working, and throw exceptions if the code cannot be performed synchronously in a safe way.

- (void)close {
    BOOL dispatchedOnDecoderQueue = self.queue == dispatch_get_specific(&QueueIdentityKey);
    BOOL calledOnMainThreadAndDecoderUsingMainQueue = [NSThread isMainThread] && self.queue == dispatch_get_main_queue();
    BOOL canSafelyInlineInternalClose = dispatchedOnDecoderQueue || calledOnMainThreadAndDecoderUsingMainQueue;
    if (!canSafelyInlineInternalClose) {
        @throw [NSException exceptionWithName:NSInternalInconsistencyException
                                       reason:@"Cannot close MQTTDecoder synchronously"
                                     userInfo:nil];
    }

    [self internalClose];
}

Maybe we could improve the fix by replacing the exception with a code niling the delegate and dispatching internalClose asynchronously? I am not sure if that would have any bad implications.

@nnmanhit
Copy link

nnmanhit commented Aug 30, 2018

This is still happen because QueueIdentityKey = NULL.

I have fixed by this way, and it works.

@interface MQTTDecoder() {
    void *QueueIdentityKey;
}

@property (nonatomic) NSMutableArray<NSInputStream *> *streams;
@property (readonly) void *QueueIdentityKey;

@end
- (void *)QueueIdentityKey {
    return QueueIdentityKey;
}
- (void)setQueue:(dispatch_queue_t)queue {
    _queue = queue;
    
    // We're going to use dispatch_queue_set_specific() to "mark" our queue.
    // The dispatch_queue_set_specific() and dispatch_get_specific() functions take a "void *key" parameter.
    // Later we can use dispatch_get_specific() to determine if we're executing on our queue.
    // From the documentation:
    //
    // > Keys are only compared as pointers and are never dereferenced.
    // > Thus, you can use a pointer to a static variable for a specific subsystem or
    // > any other value that allows you to identify the value uniquely.
    //
    // So we're just going to use the memory address of an ivar.
    
    QueueIdentityKey = &QueueIdentityKey;
    dispatch_queue_set_specific(_queue, QueueIdentityKey, QueueIdentityKey, NULL);
}
- (void)close {
    // https://github.com/novastone-media/MQTT-Client-Framework/issues/325
    dispatch_block_t block = ^{
        [self internalClose];
    };
    
    if (dispatch_get_specific(QueueIdentityKey))
        block();
    else
        dispatch_sync(_queue, block);
}

@jcavar
Copy link
Contributor

jcavar commented May 31, 2019

We merged this a while ago I think. Is this issue still happening?

@repomies
Copy link
Author

I’m sorry I can’t answer for myself, since our app’s backend infrastructure unfortunately no longer exists.

@avichayVonage
Copy link

This issue also happened in my application lately, using version 0.15.2

@jcavar
Copy link
Contributor

jcavar commented Jun 22, 2019

Can you provide more information about how it happens (logs, conditions, steps)?

@jcavar
Copy link
Contributor

jcavar commented Oct 15, 2019

#526

robnadin pushed a commit to robnadin/MQTT-Client-Framework that referenced this issue Oct 17, 2019
@shashidj
Copy link

Crashed: com.apple.main-thread
0 libsystem_kernel.dylib 0x1a5cc7db8 close + 8
1 CoreFoundation 0x1a5e6671c CFSocketInvalidate + 820
2 CFNetwork 0x1a92734ec _CFStreamErrorFromCFError + 158756
3 CoreFoundation 0x1a5e708cc _CFStreamClose + 108
4 MQTTClient 0x105c7e134 -[MQTTCFSocketDecoder close] + 60
5 MQTTClient 0x105c7f6d4 -[MQTTCFSocketTransport internalClose] + 244
6 MQTTClient 0x105c8c718 -[MQTTSession closeInternal] + 332
7 MQTTClient 0x105c91130 -[MQTTSession error:error:] + 224
8 MQTTClient 0x105c90e80 -[MQTTSession connectionError:] + 60
9 MQTTClient 0x105c7f880 -[MQTTCFSocketTransport encoder:didFailWithError:] + 104
10 MQTTClient 0x105c7e90c -[MQTTCFSocketEncoder stream:handleEvent:] + 508
11 MQTTClient 0x105c988f8 -[MQTTSSLSecurityPolicyEncoder stream:handleEvent:] + 124
12 CoreFoundation 0x1a5e746ac _signalEventSync + 216
13 CoreFoundation 0x1a5e747b8 ___signalEventQueue_block_invoke + 28
14 libdispatch.dylib 0x1a5b7eec4 _dispatch_call_block_and_release + 32
15 libdispatch.dylib 0x1a5b8033c _dispatch_client_callout + 20
16 libdispatch.dylib 0x1a5b8c600 _dispatch_main_queue_callback_4CF + 832
17 CoreFoundation 0x1a5e5b41c CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE + 16
18 CoreFoundation 0x1a5e56034 __CFRunLoopRun + 1708
19 CoreFoundation 0x1a5e55660 CFRunLoopRunSpecific + 480
20 GraphicsServices 0x1b0266604 GSEventRunModal + 164
21 UIKitCore 0x1aa02a15c UIApplicationMain + 1944
22 HubbleConnected 0x1028d0988 main + 14 (main.m:14)
23 libdyld.dylib 0x1a5cd11ec start + 4

Still we are getting this crash in iOS

@pantaopt
Copy link

pantaopt commented Aug 6, 2020

I think should avoid doing some potentially risky things in the dealloc method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests