Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rabbitmq crashing with config_ranch = true #668

Closed
jni- opened this issue Nov 22, 2017 · 2 comments
Closed

Rabbitmq crashing with config_ranch = true #668

jni- opened this issue Nov 22, 2017 · 2 comments

Comments

@jni-
Copy link

jni- commented Nov 22, 2017

DISCLAIMER: this is a wild guess, I don't understand half of what I'm about to say. Don't take anything for granted.

We updated to v8.0.0 of this plugin and our server stopped accepting connections, producing 36Gb of crash logs in a few hours. The error looked like this :

=CRASH REPORT==== 22-Nov-2017::14:31:40 ===                         
  crasher:                                                                 
    initial call: rabbit_reader:init/2                                     
    pid: <0.6421.0>                        
    registered_name: []                                                    
    exception error: bad argument                                        
      in function  size/1                                                  
         called as size([65,77,81,80,0,0,9,1])                             
      in call from rabbit_reader:mainloop/4 (src/rabbit_reader.erl, line 318)
      in call from rabbit_reader:run/1 (src/rabbit_reader.erl, line 281)
      in call from rabbit_reader:start_connection/5 (src/rabbit_reader.erl, line 255)
    ancestors: [<0.6419.0>,rabbit_tcp_client_sup,rabbit_sup,<0.137.0>]     
    messages: [{'EXIT',#Port<0.16074>,normal}]                      
    links: [<0.6419.0>]                                                    
    dictionary: [{process_name,                                            
                      {rabbit_reader,          
                          <<"client:64288 -> server:5672">>}}]   
    trap_exit: true                                                      
    status: running                                                 
    heap_size: 610
    stack_size: 27
    reductions: 700
  neighbours:

=SUPERVISOR REPORT==== 22-Nov-2017::14:31:40 ===
     Supervisor: {<0.6419.0>,rabbit_connection_sup}
     Context:    child_terminated
     Reason:     badarg
     Offender:   [{pid,<0.6421.0>},
                  {name,reader},
                  {mfargs,{rabbit_reader,start_link,[<0.6420.0>]}},
                  {restart_type,intrinsic},
                  {shutdown,4294967295},
                  {child_type,worker}]

I tracked down the error to the generated config. We now had this in rabbitmq.config :

    {tcp_listen_options, [
         {backlog,       128},
         {nodelay,       true},
         {linger,        {true, 0}},
         {exit_on_close, false}
    ]},

Which was fixed by adding binary in it (like we used to have) :

    {tcp_listen_options, [
         binary,
         {backlog,       128},
         {nodelay,       true},
         {linger,        {true, 0}},
         {exit_on_close, false}
    ]},

From the changes in the PR below, I suspect this is happening because we have rabbitmq 3.3.5 (yes... I know. On erlang R16B03 moreover.), hence disabling ranch. Now config_ranch is true by default, regardless of the version. When we updated the plugin, our server went down on the next puppet apply.

I'm not sure what the good solution is. Always include binary regardless of "ranch" (I could open a PR), or put config_ranch = false on our side. In any case, I think this should either be better documented or be version-dependent. Again this could all stem from me not knowing what I'm currently doing.

Relates to this PR : #621
And to these lines :

{tcp_listen_options, [
<%- unless @config_ranch -%>
binary,

Thank you!

@wyardley
Copy link
Contributor

@jni- Yes, I think your analysis is correct.
If you notice, the decision we made in that PR (and the reason we labeled it breaking) was to err on the side of consistency / idempotency vs. being smart about version. Since it would take potentially at least one run for the rabbitmq_version fact to exist, the earlier implementation of this was causing the module to be non-idempotent.

So while this is unfortunate, for folks using the older versions, it does make the module idempotent in its default configuration under most (all) valid configs, where it will load a later RabbitMQ version. So it was an intentional decision to leave it broken with the default setting for older versions of RMQ. The module still works, but that parameter will need to be set.

Sorry about all the logs etc., though!

@jni-
Copy link
Author

jni- commented Nov 22, 2017

@wyardley excellent, thank you for the quick reply! I understand that dragging down current code with old versions is a pain, and you should absolutely not do it hehe I'll fix this on our side.

I think the breaking change message could be improved a little. May I suggest log messages that are user-oriented instead of effect-oriented or code-oriented. I understand the idempotency concerns, but I think there is a difference between the change that is "Fixes idempotency by adding a parameter config_ranch that is not version dependent" and the breaking change that is "If using rabbitmq < 3.6, you will need to set config_ranch = false to have the same behavior as the previous version". I don't know if this is possible, just a thought!

In any case, you can close this issue, thanks for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants