Description
Scenario:
BGP receives a prefix, say P1, which resolves (recursively) via a nexthop.
This nexthop, say NHR, is configured directly into ZEBRA by our code.
In a scaled scenario, the following sequence is observed:
- NHR route is present (resolved/valid)
- NHR is deleted (withdrawn from ZEBRA)
- NHR is re-advertised to ZEBRA
- P1 is advertised
Result: P1 is resolved/valid in BGP but inactive in ZEBRA.
ANALYSIS:
Behaviour for each step:
In step 1 (NHR deleted): after early processing, the route node is added to the "META QUEUE", where it waits for main processing.
In step 2 (NHR re-advertised): after early processing, ZEBRA tries to add the route to the "META QUEUE", but it is not queued again because the "rn" (route node) is already present.
While the NHR update still awaits main processing, step 3 kicks in.
In step 3 (P1 advertised): P1 is received; BGP processes it and marks it valid (as the NHR route was already available). BGP sends the update to ZEBRA.
ZEBRA receives and processes it, but now determines that NHR is inactive (its entry carries the REMOVED flag), so it marks P1 inactive.
After some time, the META QUEUE processes the NHR input and finally marks NHR active. However, this is not passed back to clients (BGP), because the nexthop parameters have not changed. ZEBRA does, however, send a ROUTE_UPDATE to the kernel.
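The race can be reproduced in a small model. The following is a minimal Python sketch, not FRR code: `RouteNode`, `ZebraModel` and all of their methods are hypothetical names. It assumes, following the analysis above, that a node already on the meta queue is not re-queued, that resolving P1 checks the NHR entry's REMOVED flag at the moment P1 arrives, and that NHT clients are notified only when nexthop parameters change. It also assumes (an inference, not something confirmed in zebra's code) that routes resolved through NHR, such as P1, are not re-evaluated when NHR's node is finally processed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RouteNode:
    """One prefix in a hypothetical, heavily simplified zebra RIB model."""
    prefix: str
    active: bool = False                    # selected/installed in zebra
    removed: bool = False                   # REMOVED flag set by early delete processing
    queued: bool = False                    # node already sits on the meta queue
    pending_nexthop: Optional[str] = None   # re-add waiting for main processing
    nexthop: Optional[str] = None           # nexthop params last reported to NHT clients


@dataclass
class ZebraModel:
    rib: dict = field(default_factory=dict)
    meta_queue: list = field(default_factory=list)
    nht_notifications: list = field(default_factory=list)
    kernel_updates: list = field(default_factory=list)

    def _meta_queue_add(self, rn: RouteNode) -> None:
        # A node that is already queued is not queued again (step 2).
        if not rn.queued:
            rn.queued = True
            self.meta_queue.append(rn)

    def install_resolved(self, prefix: str, nexthop: str) -> None:
        # Initial state: NHR present and valid, clients already informed.
        self.rib[prefix] = RouteNode(prefix, active=True, nexthop=nexthop)

    def route_delete(self, prefix: str) -> None:
        rn = self.rib[prefix]
        rn.removed = True                   # early processing only flags the entry
        self._meta_queue_add(rn)            # step 1

    def route_add(self, prefix: str, nexthop: str) -> None:
        rn = self.rib.setdefault(prefix, RouteNode(prefix))
        rn.pending_nexthop = nexthop        # new entry is not selected yet
        self._meta_queue_add(rn)            # step 2: skipped, rn already queued

    def route_add_recursive(self, prefix: str, via: str) -> RouteNode:
        # Step 3: resolution looks at the NHR node as it is right now.
        nhr = self.rib.get(via)
        resolvable = nhr is not None and nhr.active and not nhr.removed
        rn = RouteNode(prefix, active=resolvable, nexthop=via)
        self.rib[prefix] = rn
        return rn

    def process_meta_queue(self) -> None:
        while self.meta_queue:
            rn = self.meta_queue.pop(0)
            rn.queued = False
            if rn.pending_nexthop is not None:
                new_nh, rn.pending_nexthop = rn.pending_nexthop, None
                rn.removed = False
                rn.active = True
                # Clients hear about it only if the nexthop params differ.
                if new_nh != rn.nexthop:
                    self.nht_notifications.append(rn.prefix)
                rn.nexthop = new_nh
            elif rn.removed:
                rn.active = False
                rn.nexthop = None
                self.nht_notifications.append(rn.prefix)
            self.kernel_updates.append(rn.prefix)   # ROUTE_UPDATE to the kernel
            # Assumed: routes resolved through rn (P1) are not re-evaluated here.


zebra = ZebraModel()
zebra.install_resolved("NHR", "nh1")            # NHR present and valid
zebra.route_delete("NHR")                       # step 1
zebra.route_add("NHR", "nh1")                   # step 2
p1 = zebra.route_add_recursive("P1", "NHR")     # step 3
print(p1.active)                                # False
zebra.process_meta_queue()
print(zebra.rib["NHR"].active, p1.active)       # True False
print(zebra.nht_notifications)                  # []
print(zebra.kernel_updates)                     # ['NHR']
```

In the model, NHR ends up active and is pushed to the kernel, yet no client notification is generated and P1 stays inactive, which matches the observed end state.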
To recover from this state, NHR needs to be withdrawn and re-advertised.
We are trying to fix this on our end; let me know if this is already known or fixed (I searched the existing issues but couldn't find an exact match).
Version
stable/9.0
How to reproduce
See the scenario in the description.
Expected behavior
P1 should be active in ZEBRA as well.
Actual behavior
P1 remains inactive in ZEBRA.
Additional context
No response
Checklist
- I have searched the open issues for this bug.
- I have not included sensitive information in this report.