Add sleep 10 before cu.commit #140

jpvrenen · 2016-05-24T13:58:53Z

No description provided.

vnitinv · 2016-05-25T09:19:59Z

@stacywsmith @dgarros Can you please review this pull request.

dgarros · 2016-05-26T05:30:11Z

@jpvrenen I understand we are facing an issue with Dual RE system but it looks like this patch will affect all type of junos devices and increase the commit time by 10s,
Is my understanding correct ?

Set to yes in a Dual RE scenario when 'commit sync' is enabled in the config. (set system commit synchronize)

jpvrenen · 2016-05-26T08:06:10Z

@dgarros Yes from my point of view this seem to happen in a DualRE scenario, when a 'valid' lock can exist that needs time to 'end' before 'commit'. but only when 'commit sync' is enabled in the configuration, pushing to single RE systems don't have this issue.

I have added a 'dual_re' option/parameter.

dgarros · 2016-06-06T05:57:22Z

Hi @jpvrenen,

Sorry it's taking so long but to be honest I'm not sure if this is the right approach to solve the problem.
What if in some cases even a 2 RE system require more than 10s or less than 10s.

I was thinking maybe we should just have an option to add a delay before doing the commit
Something like

wait_before_commit:
        description:
             - Indicate how many seconds the module should wait before commiting the changes
         required: false
         default: 0

Any other ideas are welcome

@vnitinv @stacywsmith any thoughts ?

jpvrenen · 2016-06-10T13:57:35Z

Hi @dgarros ,
Yes, thought about that as well as another possible solution only thing that it introduces is 'how' would one determine how many seconds delay should be used for this option to be effective. I must admit it is more flexible to add the 'wait_before_commit' as an option.

Ideally it would be best to 'know' if some (valid) lock condition still exists because of the 'commit check' command before running the actual 'commit'.

stacywsmith · 2016-06-10T16:23:20Z

@jpvrenen @dgarros @vnitinv

I've been looking into this. I feel it's a Junos bug, and I'll continue to look into it further and file a Junos PR. That said, this problem exists on deployed code and it would be best to have the Ansible module deal with it more gracefully.

I suggest that we wrap the "cu.commit(**opts)" in a try/except block. If an error is raised and the error message contains "remote lock-configuration failed on" then set "opts['force_sync'] = True" and retry the "cu.commit(**opts)". If the commit still fails with sync_force, then just raise the exception.

Any thoughts on this approach? If others agree with the general approach, I'll get this coded and submit a pull request.

jpvrenen · 2016-06-13T07:37:44Z

@dgarros @vnitinv @stacywsmith

Hi,

Using try/except sounds like a good idea, however instead of using the force option why not use the timeout option instead?

Basically you will be 'forcing' a configuration that has a lock condition which exitsts becaue of the 'commit check' issued in the first place. This condition will resolve automatically after the commit check has completed normally on the second RE. Issueing the 'force' option will also cause any lock conditions (that don't resolve automatically) to be ignored.

Please see below from the Juniper documentation about the 'force' option.

http://www.juniper.net/documentation/en_US/junos15.1/topics/concept/junos-software-configuration-routing-engines-synchronization-overview.html
" When you issue the commit synchronize command with the force option from one Routing Engine, the configuration sessions on the other Routing Engine will be terminated and its configuration synchronized with that on the Routing Engine from which you issued the command."

&&

" Note: We recommend that you use the force option only if you are unable to resolve the issues that caused the commit synchronize command to fail."

stacywsmith · 2016-06-13T22:25:30Z

@jpvrenen

It appears the time it takes this lock condition to resolve is related to the size of the configuration being checked/committed. I don't think a single static value can cover a very large configuration without sleeping for some large amount of time that is wasteful in most other conditions.

Maybe we can do this in a loop that uses an exponential back off until the commit succeeds or some timeout occurs (but even then the ultimate timeout will be somewhat arbitrary).

jpvrenen · 2016-06-14T13:04:10Z

@stacywsmith
Well now you mention it I was looking more closely at the responses see below:

May 23 10:07:16 [4601] Outgoing:
May 23 10:07:16 [4601] Outgoing:
May 23 10:07:16 [4601] Outgoing:
May 23 10:07:16 [4601] Outgoing: ]]>]]>

There appears to be only 1 'ok' after both RE config checks are already completed. Then right after the 'ok' the 'commit' is given:
May 23 10:07:16 [4601] Incoming: <?xml etc etc /nc:rpc]]>]]>

practically no delay:
May 23 10:07:17 [4601] Outgoing: remote lock-configuration failed on re1

What I am getting at here is that theoretically the actual commit is only given 'after' the commit check has completed on both route engines, but it would apparently take just a second or so longer for the DB lock on the second RE to be released. Following this reasoning only 1 'small' timeout could potentially suffice independent of configuration size, since it will not be related.

I just did a small test to wait for 1s instead of 10s before issuing 'commit' and it went right through no lock error!

new insight, commit check is completed on both route engines when commit is given (1 ok is given for both RE), apparently a short amount of time is needed for the lock to clear on the second RE.

vnitinv · 2016-07-19T11:13:21Z

@dgarros @stacywsmith are we ok with 1 sec sleep? If yes, then for this 1 sec sleep should we have 1 more params dual_re (as added by @jpvrenen )

dgarros · 2016-07-22T04:53:59Z

@vnitinv

I think we should follow @stacywsmith suggestions but instead of using "force" option, let's use the the delay option with let's say 3 tries. First 1s, second 2s, third 4s ... if still not, raise exception.

Part 1

I suggest that we wrap the "cu.commit(opts)" in a try/except block. If an error is raised and the error message contains "remote lock-configuration failed on" then set "opts['force_sync'] = True" and retry the "cu.commit(opts)". If the commit still fails with sync_force, then just raise the exception.

Part 2

Maybe we can do this in a loop that uses an exponential back off until the commit succeeds or some timeout occurs (but even then the ultimate timeout will be somewhat arbitrary).

jpvrenen · 2016-07-24T05:56:28Z

@vnitinv @dgarros
Using try/except will throw an warning/error every time a configuration is pushed, I cant recall (at this point) if a rollback is then performed so I am not sure if configuration would be retained between the delayed commit(s). Also I don't think it is a good thing performing an action when you already know it will cause a warning.

I have thought about some other solutions:
1 find out (dynamically) if the system contains more that 1 route engine and add a delay based on that, this will potentially add some overall time to the program.
2 add a 'fixed' delay between check and commit
3 add a time_between_check_and_commit parameter in seconds (between 0 - 4s), optional so the user can have some control over this.

Instead of adding a default value of 1s the 'check_commit_wait' parameter indicates a timeout is needed between check and commit and can be set between 1 and 4s.

Only check if parameter is used.

jpvrenen · 2016-08-17T15:20:00Z

Changed the 'dual_re' parameter into a 'check_commit_wait' parameter value can be between 1 - 4 s.

No default value

jpvrenen · 2016-08-23T08:53:54Z

@vnitinv @dgarros
'dual_re' parameter has been replaced by 'check_commit_wait' parameter, which gives an optional 1 - 4 s delay between the 'commit check' and 'commit' command.

stacywsmith · 2016-08-24T23:07:04Z

@vnitinv I believe I'm now OK with this pull request. If you agree, can we go ahead and get this merged?

stacywsmith · 2016-08-24T23:10:06Z

@jpvrenen It looks like you ran into this issue on Junos 14.2R4. Can you confirm that's the release you are using?

I've also been tracking this as a Junos bug. We believe we may have identified the root cause, but I'd like to confirm the issue we've found is truly the issue you're hitting.

jpvrenen · 2016-08-25T08:29:45Z

@stacywsmith Yes I confirm that we are using Junos 14.2R4.9

jNicker · 2016-09-01T21:01:43Z

library/junos_install_config

                    opts['confirm'] = args['confirm']

+                if args['check_commit_wait']:
+				   check_commit_wait = int(args['check_commit_wait'])


There are tabs here breaking the script

@jNicker I merged pull request #156 to fix this issue. Thanks for the submission!

Add sleep 10 before cu.commit

0eee56f

Added option dual_re

2af06c6

Set to yes in a Dual RE scenario when 'commit sync' is enabled in the config. (set system commit synchronize)

Shorter

9886bb7

Sleep 1 instead of 10

90884cb

new insight, commit check is completed on both route engines when commit is given (1 ok is given for both RE), apparently a short amount of time is needed for the lock to clear on the second RE.

jpvrenen added 2 commits August 17, 2016 17:13

Changed 'dual_re' to 'check_commit_wait'

7748ebd

Instead of adding a default value of 1s the 'check_commit_wait' parameter indicates a timeout is needed between check and commit and can be set between 1 and 4s.

Only check if parameter is used.

91121d9

Only check if parameter is used.

No default value

9bffa71

No default value

vnitinv merged commit b3f85f5 into Juniper:master Aug 25, 2016

jNicker reviewed Sep 1, 2016
View reviewed changes

Uh oh!

Add sleep 10 before cu.commit #140

Add sleep 10 before cu.commit #140

Uh oh!

Conversation

jpvrenen commented May 24, 2016

Uh oh!

vnitinv commented May 25, 2016

Uh oh!

dgarros commented May 26, 2016

Uh oh!

jpvrenen commented May 26, 2016

Uh oh!

dgarros commented Jun 6, 2016

Uh oh!

jpvrenen commented Jun 10, 2016

Uh oh!

stacywsmith commented Jun 10, 2016

Uh oh!

jpvrenen commented Jun 13, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stacywsmith commented Jun 13, 2016

Uh oh!

jpvrenen commented Jun 14, 2016

Uh oh!

vnitinv commented Jul 19, 2016

Uh oh!

dgarros commented Jul 22, 2016

Uh oh!

jpvrenen commented Jul 24, 2016

Uh oh!

jpvrenen commented Aug 17, 2016

Uh oh!

jpvrenen commented Aug 23, 2016

Uh oh!

stacywsmith commented Aug 24, 2016

Uh oh!

stacywsmith commented Aug 24, 2016

Uh oh!

jpvrenen commented Aug 25, 2016

Uh oh!

jNicker Sep 1, 2016

Choose a reason for hiding this comment

Uh oh!

stacywsmith Sep 8, 2016

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

jpvrenen commented Jun 13, 2016 •

edited

Loading