-
Couldn't load subscription status.
- Fork 167
Add sleep 10 before cu.commit #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@stacywsmith @dgarros Can you please review this pull request. |
|
@jpvrenen I understand we are facing an issue with Dual RE system but it looks like this patch will affect all type of junos devices and increase the commit time by 10s, |
Set to yes in a Dual RE scenario when 'commit sync' is enabled in the config. (set system commit synchronize)
|
@dgarros Yes from my point of view this seem to happen in a DualRE scenario, when a 'valid' lock can exist that needs time to 'end' before 'commit'. but only when 'commit sync' is enabled in the configuration, pushing to single RE systems don't have this issue. I have added a 'dual_re' option/parameter. |
|
Hi @jpvrenen, Sorry it's taking so long but to be honest I'm not sure if this is the right approach to solve the problem. I was thinking maybe we should just have an option to add a delay before doing the commit wait_before_commit:
description:
- Indicate how many seconds the module should wait before commiting the changes
required: false
default: 0Any other ideas are welcome @vnitinv @stacywsmith any thoughts ? |
|
Hi @dgarros , Ideally it would be best to 'know' if some (valid) lock condition still exists because of the 'commit check' command before running the actual 'commit'. |
|
I've been looking into this. I feel it's a Junos bug, and I'll continue to look into it further and file a Junos PR. That said, this problem exists on deployed code and it would be best to have the Ansible module deal with it more gracefully. I suggest that we wrap the "cu.commit(**opts)" in a try/except block. If an error is raised and the error message contains "remote lock-configuration failed on" then set "opts['force_sync'] = True" and retry the "cu.commit(**opts)". If the commit still fails with sync_force, then just raise the exception. Any thoughts on this approach? If others agree with the general approach, I'll get this coded and submit a pull request. |
|
@dgarros @vnitinv @stacywsmith Hi, Using try/except sounds like a good idea, however instead of using the force option why not use the timeout option instead? Basically you will be 'forcing' a configuration that has a lock condition which exitsts becaue of the 'commit check' issued in the first place. This condition will resolve automatically after the commit check has completed normally on the second RE. Issueing the 'force' option will also cause any lock conditions (that don't resolve automatically) to be ignored. Please see below from the Juniper documentation about the 'force' option. http://www.juniper.net/documentation/en_US/junos15.1/topics/concept/junos-software-configuration-routing-engines-synchronization-overview.html && " Note: We recommend that you use the force option only if you are unable to resolve the issues that caused the commit synchronize command to fail." |
|
It appears the time it takes this lock condition to resolve is related to the size of the configuration being checked/committed. I don't think a single static value can cover a very large configuration without sleeping for some large amount of time that is wasteful in most other conditions. Maybe we can do this in a loop that uses an exponential back off until the commit succeeds or some timeout occurs (but even then the ultimate timeout will be somewhat arbitrary). |
|
@stacywsmith May 23 10:07:16 [4601] Outgoing: There appears to be only 1 'ok' after both RE config checks are already completed. Then right after the 'ok' the 'commit' is given: practically no delay: What I am getting at here is that theoretically the actual commit is only given 'after' the commit check has completed on both route engines, but it would apparently take just a second or so longer for the DB lock on the second RE to be released. Following this reasoning only 1 'small' timeout could potentially suffice independent of configuration size, since it will not be related. I just did a small test to wait for 1s instead of 10s before issuing 'commit' and it went right through no lock error! |
new insight, commit check is completed on both route engines when commit is given (1 ok is given for both RE), apparently a short amount of time is needed for the lock to clear on the second RE.
|
@dgarros @stacywsmith are we ok with 1 sec sleep? If yes, then for this 1 sec sleep should we have 1 more params dual_re (as added by @jpvrenen ) |
|
I think we should follow @stacywsmith suggestions but instead of using "force" option, let's use the the delay option with let's say 3 tries. First 1s, second 2s, third 4s ... if still not, raise exception. Part 1 Part 2 |
|
@vnitinv @dgarros I have thought about some other solutions: |
Instead of adding a default value of 1s the 'check_commit_wait' parameter indicates a timeout is needed between check and commit and can be set between 1 and 4s.
Only check if parameter is used.
|
Changed the 'dual_re' parameter into a 'check_commit_wait' parameter value can be between 1 - 4 s. |
No default value
|
@vnitinv I believe I'm now OK with this pull request. If you agree, can we go ahead and get this merged? |
|
@jpvrenen It looks like you ran into this issue on Junos 14.2R4. Can you confirm that's the release you are using? I've also been tracking this as a Junos bug. We believe we may have identified the root cause, but I'd like to confirm the issue we've found is truly the issue you're hitting. |
|
@stacywsmith Yes I confirm that we are using Junos 14.2R4.9 |
| opts['confirm'] = args['confirm'] | ||
|
|
||
| if args['check_commit_wait']: | ||
| check_commit_wait = int(args['check_commit_wait']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are tabs here breaking the script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.