Replies: 1 comment
-
|
Go ahead and submit a PR with this change. Everything looks reasonable. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Question: is
bgp_writes_on()needed in the error path ofbgp_connect_success()?I'm looking at the error path in
bgp_connect_success()whenbgp_getsockname()fails.The code calls
bgp_notify_send()followed bybgp_writes_on()before returningBGP_FSM_FAILURE:The same pattern exists in
bgp_connect_success_w_delayopen()at line 2347.My understanding (please correct me if I'm wrong)
As far as I can tell,
bgp_notify_send()writes the NOTIFICATION synchronously:bgp_notify_send_internal()cleans the output buffer(
stream_fifo_clean(connection->obuf)atbgp_packet.c:981), pushes theNOTIFICATION packet, and then calls
bgp_write_notify()which pops it fromobufand does a directwrite()to the socket (bgp_packet.c:748-759). Ifthat's correct, by the time
bgp_notify_send()returnsconnection->obufshould be empty, and the subsequent
bgp_writes_on()would have nothing toflush.
I also noticed that every other call site of
bgp_writes_on()in the codebaseseems to follow the pattern of first enqueueing a packet in
obufand thencalling
bgp_writes_on()to flush it. These two calls in the error pathsappear to be the only exceptions — unless I'm missing something.
Potential issue
If the
bgp_writes_on()call is indeed unnecessary, it could also beproblematic: it schedules
connection->t_writeon the I/O pthread(
bgp_pth_io), and the subsequentbgp_stop()(triggered by theBGP_FSM_FAILUREreturn) cancels it withevent_cancel_async(). Since thecancellation targets a different thread, it may not complete immediately. In
theory, if a
BGP_Startevent arrives in that window (e.g., from an NHTupdate),
bgp_start()could hitassert(!connection->t_write)atbgp_fsm.c:2513.We hit what looks like this scenario in our deployment, but I wanted to confirm
my reading of the code before proposing any change.
Context
These two
bgp_writes_on()calls seem to have been introduced in commit424ab01d0f("bgpd: implement buffered reads") as part of the I/O modelrefactoring. In the success path of
bgp_connect_success()the calls werereplaced by
bgp_reads_on()+bgp_open_send(), but in the error pathbgp_writes_on()remained. It looks like it might have been an oversight,but I'm not sure if there's a reason for it that I'm not seeing.
Would this removal be safe?
Any insight would be appreciated. Thanks!
Beta Was this translation helpful? Give feedback.
All reactions