This repository has been archived by the owner on Sep 20, 2024. It is now read-only.

Slack RTM retry/reconnect not working #261

Closed
garymoon opened this issue Jun 4, 2016 · 15 comments

Comments

@garymoon

garymoon commented Jun 4, 2016

Hi,
I've found that the RTM reconnect logic isn't reached when the RTM connection dies for one reason or another.

Version: 0.1.2
OS: Ubuntu 15.04
Node: v6.1.0

To test I have done the following:

  • Set retry to 5 in my controller config
  • console.log'd the retryEnabled var in Slackbot_worker.js to ensure it was set
  • console.log'd at the start of reconnect()
  • tcpkill'd the RTM connection
  • Confirmed nothing was logged to the console

This follows a series of less drastic actions to confirm it wasn't working.

It seems the only path to the retry logic is at the bottom of bot.closeRTM, when an err object is passed. That code is not reached from the error or close events of bot.rtm, which is the path followed when the connection dies unexpectedly. I'm unsure whether it is desirable to call reconnect() from close, so I haven't submitted a PR; I'm happy to once it's clarified what is wanted. As a user I would probably also expect the retry logic to run on a team_migration_started event, though I suspect that would be immediately followed by a close event, as per the Slack documentation.
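To make the control flow described above concrete, here is a minimal sketch. It is not Botkit's actual code: makeWorker, reconnectOnClose and the calls array are hypothetical names invented for illustration, and an EventEmitter stands in for the websocket (bot.rtm). It only shows that a close handler which never goes through closeRTM with an error never reaches reconnect().

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the worker; NOT Botkit's implementation.
function makeWorker({ retryEnabled, reconnectOnClose }) {
  const rtm = new EventEmitter(); // stands in for the websocket (bot.rtm)
  const calls = [];               // records which code paths ran

  function reconnect() {
    calls.push('reconnect');
  }

  function closeRTM(err) {
    calls.push('closeRTM');
    // The only path to the retry logic, as described in the report.
    if (err && retryEnabled) {
      reconnect();
    }
  }

  rtm.on('close', (code) => {
    calls.push('close event');
    // Assumption: one possible fix is to route unexpected closes
    // through closeRTM with an error so the retry logic is reached.
    if (reconnectOnClose) {
      closeRTM(new Error('RTM closed unexpectedly: ' + code));
    }
  });

  return { rtm, closeRTM, calls };
}
```

With reconnectOnClose set to false, emitting close on rtm records only the close event, which matches the silence seen after tcpkill. Whether routing close through closeRTM is the desired behaviour is exactly the open question in this report.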

Thank you for botkit, it's great stuff!

Cheers,
Gary.

@jayjanssen
Contributor

Hi, I have also been having issues with my Slack connection just going dark. Following your suggestion of tcpkilling the connection, I found a few issues that I tried to fix here:

#298

I'm not sure if it addresses your specific issue or not, but once I was logging properly (#299), I was able to see slackbot do reconnection attempts with tcpkill.

@DustinVenegas

Ah, this looks like the issue I'm having upon hitting a debug breakpoint. I don't know enough about @jayjanssen's PR but it definitely touches what I'm seeing go wrong with reconnects.

@rdohms

rdohms commented Nov 7, 2016

I'm using v0.4 that includes the fix above, but also found this in logs:

error: Error: Stale RTM connection, closing RTM

After this, no messages reach Slack and the bot is offline, so the issue is still there.

@garymoon
Author

garymoon commented Nov 7, 2016

For anyone looking for a reliable solution, I have auto-reconnect disabled and use the following code to keep the bot alive. It hasn't failed once since my last reply here.

var controller = ...;
var bot = ...;

function start_rtm() {
        bot.startRTM(function(err, bot, payload) {
                if (err) {
                        console.log('Failed to start RTM');
                        // Try again in one minute
                        return setTimeout(start_rtm, 60000);
                }
                console.log('RTM started!');
        });
}

controller.on('rtm_close', function(bot, err) {
        start_rtm();
});

start_rtm();

Hope it helps!

Edit: There's no need to handle team_migration_started, it's almost immediately followed by an rtm_close, and the reconnection logic will (within a minute or two, since the first reconnect attempt usually fails) connect you to the new server.

@rickul

rickul commented Nov 9, 2016

@garymoon How do I disable auto-reconnect? bot.config.retry = false?

@garymoon
Author

garymoon commented Nov 9, 2016

@rickul It's disabled by default, I just meant I don't have it enabled when I use the code above.

@rickul

rickul commented Nov 9, 2016

Thanks! Your code works fine 👍

@peterswimm
Contributor

Closed as answered

@garymoon
Author

Probably best to remove the feature if there's no interest in fixing it @peterswimm.

@peterswimm
Contributor

@garymoon with the shift from rtm to events you may see a deprecation of this feature, but if anyone has a PR that improves functionality of auto-reconnect we will definitely consider it.

@garymoon
Author

@peterswimm My understanding is that events and RTM are intended to complement one another; indeed, each has several features not supported by the other.

It seems that silent disconnects are causing some confusion for those using RTM with botkit (myself included). With some guidance on what would be expected of a PR to resolve this issue (assuming transplanting my solution above into the library itself is undesirable) I believe I'd be able to submit a working PR.

@xuanvinh2005

xuanvinh2005 commented Dec 1, 2016

I tried @garymoon's code and ran into an issue: at some point it started twice and two websockets were open. I hadn't enabled "retry"; it looks like it goes through this code:

bot.rtm.on('close', function(code, message) {
    botkit.log.notice('RTM close event: ' + code + ' : ' + message);
    if (pingIntervalId) {
        clearInterval(pingIntervalId);
    }
    ...
    if (code === 1006) {
        botkit.log.error('Abnormal websocket close event, attempting to reconnect');
        reconnect();
    }
});

It calls the reconnect function without checking retry, and @garymoon's code restarts RTM on the same occasion via rtm_close. I'm not sure whether I'm correct, or how to fix this. I also don't know why lastPong was not updated.
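One way to keep user code from starting RTM twice is to wrap the start call in a small guard. The sketch below is an assumption-laden illustration, not part of Botkit: makeGuardedStarter, startFn and schedule are hypothetical names, and startFn(cb) stands in for a call such as bot.startRTM(cb). It only de-duplicates start requests made through the guard; it cannot stop a reconnect() that the library triggers internally.

```javascript
// Hypothetical guard; `startFn(cb)` stands in for bot.startRTM(cb).
// `schedule` defaults to setTimeout and is injectable for testing.
function makeGuardedStarter(startFn, retryDelayMs, schedule) {
  schedule = schedule || setTimeout;
  let starting = false;  // an attempt is in flight or a retry is pending
  let connected = false; // the last attempt succeeded and no close was seen

  function start() {
    if (starting || connected) {
      return false; // ignore duplicate start requests
    }
    starting = true;
    startFn(function (err) {
      if (err) {
        // Stay in the "starting" state until the retry actually fires.
        schedule(function () {
          starting = false;
          start();
        }, retryDelayMs);
        return;
      }
      starting = false;
      connected = true;
    });
    return true;
  }

  function onClose() {
    connected = false;
    return start();
  }

  return { start: start, onClose: onClose };
}

// Possible wiring (assumes `bot` and `controller` exist as in the thread):
//   var starter = makeGuardedStarter(function (cb) { bot.startRTM(cb); }, 60000);
//   controller.on('rtm_close', function () { starter.onClose(); });
//   starter.start();
```

If the library also reconnects by itself on close code 1006, a second socket can still appear, so this guard is best combined with retry left disabled or with a fix inside the library.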

@inn0v8

inn0v8 commented Dec 8, 2016

@xuanvinh2005 After struggling with this, I found this bug and created this PR

#532

Very simple change from bot to botkit. Reconnect has been working flawlessly since. Also using the following code to notify me in slack of reconnects, drops, and other slack RTM messages.

controller.on('rtm_close',function(bot) {
  console.log('\n\n*** '+moment().tz('America/Los_Angeles').format()+'  ** The RTM api just closed');

  try {
    controller.findTeamById('<SLACK_TEAM_ID_HERE>', function(err, team) {
      var alertBot = controller.spawn(team);
      alertBot.api.chat.postMessage(
        {
          channel : '<SLACK_CHANNEL_ID_HERE>',
          text: 'GUYS!!! I just got an RTM Close event trigger for this team --> '+bot.team_info.id+'  '+bot.team_info.name+' .',
          attachments: [],
        }, function (err, response) {
      });
    });
  } catch (err) {
    console.log('\n\n*** '+moment().tz('America/Los_Angeles').format()+' Tried to send alert for an rtm_close event but failed with this error: '+err);
  }
});

controller.on('rtm_reconnect_failed',function(bot) {
  console.log('\n\n*** '+moment().tz('America/Los_Angeles').format()+'  ** Unable to automatically reconnect to rtm after a closed connection.');


  try {
    controller.findTeamById('<SLACK_TEAM_ID_HERE>', function(err, team) {
      var alertBot = controller.spawn(team);
      alertBot.api.chat.postMessage(
        {
          channel : '<SLACK_CHANNEL_ID_HERE>',
          text: 'GUYS!!! I just got an RTM Reconnect Failed trigger for this team --> *'+bot.team_info.id+'*  name: *'+bot.team_info.name+'*. The team either uninstalled the app or there was an issue which requires a restart of the bot.',
          attachments: [],
        }, function (err, response) {
      });
    });
  } catch (err) {
    console.log('\n\n*** '+moment().tz('America/Los_Angeles').format()+' Tried to send alert for an rtm_reconnect_failed event but failed with this error: '+err);
  }
});

@xuanvinh2005

As for retry, my workaround is to set retry when calling spawn instead of in the botkit config:
var bot = controller.spawn({ token: token, retry: 10 });

However, my issue is the same as @rdohms's: error: Error: Stale RTM connection, closing RTM
The bot died without reconnecting. I tried increasing the ping/pong check time from 1200 to a bigger value, and it seems to be fine.
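For readers unfamiliar with the "stale connection" check mentioned above, a ping/pong staleness test can be sketched roughly as follows. This is a guess at the general technique, not Botkit's code: isStale and maxSilenceMs are hypothetical names, and the threshold used in the example calls is arbitrary. The only detail taken from this thread is that lastPong is reset to 0 when the connection is closed.

```javascript
// Hypothetical helper illustrating a ping/pong staleness check.
// lastPong: timestamp (ms) of the last pong received, or 0 if none yet.
// now: current timestamp (ms). maxSilenceMs: allowed gap between pongs.
function isStale(lastPong, now, maxSilenceMs) {
  // With lastPong === 0 no pong has been seen yet, so nothing can be judged.
  return lastPong !== 0 && now - lastPong > maxSilenceMs;
}

// Example calls with an arbitrary 12-second threshold:
console.log(isStale(0, 50000, 12000));    // false: no pong recorded yet
console.log(isStale(1000, 5000, 12000));  // false: last pong 4 s ago
console.log(isStale(1000, 20000, 12000)); // true: last pong 19 s ago
```

A larger maxSilenceMs tolerates slower pongs at the cost of noticing a dead connection later, which is consistent with the observation that raising the check time made the error go away.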

@inn0v8

inn0v8 commented Dec 8, 2016

I had the same issue, but since I fixed the bug in Slackbot_worker.js it has been working fine, with no issues in the last month.

Just to make sure, I logged the value of retryEnabled in the closeRTM function in Slackbot_worker.js. With the above change it was always true for the stale connection error, and I haven't had any issues since. Hope this helps.

    bot.closeRTM = function(err) {
        console.log('\n\n **** THE closeRTM FUNCTION WAS JUST CALLED AND THIS IS THE ERROR --> '+err+' AND RETRY IS --> '+retryEnabled);
        if (bot.rtm) {
            bot.rtm.removeAllListeners();
            bot.rtm.close();
        }

        if (pingIntervalId) {
            clearInterval(pingIntervalId);
        }

        lastPong = 0;
        botkit.trigger('rtm_close', [bot, err]);

        // only retry, if enabled, when there was an error
        if (err && retryEnabled) {
            reconnect();
        }
    };


8 participants