Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pimd: fix crash by wheel timer when del a vif #11959

Closed
wants to merge 2 commits into from

Conversation

lyq140
Copy link
Contributor

@lyq140 lyq140 commented Sep 17, 2022

pimd: fix crash by wheel timer when del a vif

reproduce (4 steps):

  1. topology: R1 -- Host
    one router with a interface ens38, connected to one host (just for make it simple).

  2. enable pim sm on ens38; config ens38 as RP; send a traffic packet from host to router.
    so there will be an (S, G) entry. state as follows:

ron-virtual-machine#
ron-virtual-machine# show ip mroute
IP Multicast Routing Table
Flags: S - Sparse, C - Connected, P - Pruned
R - SGRpt Pruned, F - Register flag, T - SPT-bit set
Source Group Flags Proto Input Output TTL Uptime
192.168.7.100 239.255.255.250 SFTP none ens38 none 0 --:--:--

ron-virtual-machine# show ip pim upstream
Iif Source Group State Uptime JoinTimer RSTimer KATimer RefCnt
ens38 192.168.7.100 239.255.255.250 NotJ,RegP 00:21:49 --:--:-- 00:00:09 00:02:45 1

ron-virtual-machine#
ron-virtual-machine# show ip pim interface
Interface State Address PIM Nbrs PIM DR FHR IfChannels
ens38 up 192.168.7.101 0 local 0 0
pimreg up 0.0.0.0 0 local 0 0

ron-virtual-machine# show ip pim rp-info
RP address group/prefix-list OIF I am RP Source Group-Type
192.168.7.101 224.0.0.0/4 ens38 yes Static ASM

  1. del the vif of ens38 with 'no ip pim sm'. state as follows:

ron-virtual-machine(config)# interface ens38
ron-virtual-machine(config-if)# no ip pim sm
ron-virtual-machine(config-if)# end
ron-virtual-machine# show ip pim interface
Interface State Address PIM Nbrs PIM DR FHR IfChannels
pimreg up 0.0.0.0 0 local 0 0

ron-virtual-machine# show ip mroute
IP Multicast Routing Table
Flags: S - Sparse, C - Connected, P - Pruned
R - SGRpt Pruned, F - Register flag, T - SPT-bit set
Source Group Flags Proto Input Output TTL Uptime
192.168.7.100 239.255.255.250 SFTP none <iif?> none 0 --:--:--

ron-virtual-machine# show ip pim upstream
Iif Source Group State Uptime JoinTimer RSTimer KATimer RefCnt
ens38 192.168.7.100 239.255.255.250 NotJ,RegP 00:23:03 --:--:-- 00:00:14 00:02:32 1

  1. waiting for wheel timer coming, then crashed

Program received signal SIGSEGV, Segmentation fault.
0x0000558852140ac4 in pim_upstream_kat_start_ok (up=0x55885286d560) at pimd/pim_oil.h:130
130 return &c_oil->oil.mfcc_parent;
(gdb) info frame
Stack level 0, frame at 0x7ffeb4b5e130:
rip = 0x558852140ac4 in pim_upstream_kat_start_ok (pimd/pim_oil.h:130); saved rip = 0x7f50c960ad8f
inlined into frame 1
source language c.
Arglist at unknown address.
Locals at unknown address, Previous frame's sp in rsp
(gdb) backtrace
#0 0x0000558852140ac4 in pim_upstream_kat_start_ok (up=0x55885286d560) at pimd/pim_oil.h:130
#1 pim_upstream_sg_running_proc (up=0x55885286d560) at pimd/pim_upstream.c:2029
#2 0x00007f50c960ad8f in wheel_timer_thread_helper (t=) at lib/wheel.c:55
#3 0x00007f50c9601431 in thread_call (thread=0x558852733690) at lib/thread.c:2008
#4 0x00007f50c9601709 in _thread_execute (xref=0x7f50c96b5e80 <_xref.11032>, m=0x558852601c80, func=, arg=0x55885272f060, val=0)
at lib/thread.c:2100
#5 0x00007f50c9601431 in thread_call (thread=thread@entry=0x7ffeb4b5e300) at lib/thread.c:2008
#6 0x00007f50c95baab8 in frr_run (master=0x558852601c80) at lib/libfrr.c:1198
#7 0x00005588521116c3 in main (argc=8, argv=, envp=) at pimd/pim_main.c:176

@lyq140 lyq140 changed the title Patch pim doc pimd: fix crash by wheel timer when del a vif Sep 17, 2022
@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented Sep 17, 2022

Continuous Integration Result: SUCCESSFUL

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-7464/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Checkout code: Successful with additional warnings
Report for pim_neighbor.c | 2 issues
===============================================
< WARNING: Comparisons should place the constant on the right side of the test
< #660: FILE: /tmp/f1-4992/pim_neighbor.c:660:

Copy link
Member

@donaldsharp donaldsharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me your reasoning why you want RP instead of RPF?

@@ -705,14 +705,14 @@ void pim_upstream_switch(struct pim_instance *pim, struct pim_upstream *up,

if (pim_addr_is_any(up->upstream_addr)) {
if (PIM_DEBUG_PIM_EVENTS)
zlog_debug("%s: RPF not configured for %s", __func__,
zlog_debug("%s: RP not configured for %s", __func__,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe RPF is correct. The up pointer can be a S,G or a *,G. In which case it's always the RPF that we want to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upstream_addr is the RPF'd nexthop to either the S,G or *,G therefore RPF is correct here.

Copy link
Contributor Author

@lyq140 lyq140 Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I suppose this is a typo.
upstream_addr is the address of S or RP, not nexthop. So it should be "RP not configured",
and there are other similar logs, like in function 'pim_rpf_update'.

@donaldsharp
Copy link
Member

let's get warnings cleaned up as well.

@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented Sep 24, 2022

Continuous Integration Result: SUCCESSFUL

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-7553/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

1. up->upstream_addr is either the Source or the RP,
   so if it is zero, must be with RP not configured.
2. up->rpf is the unicast-routing path to up->upstream_addr,
   so if it is null, must be unicast-routing not reachable.
3. cc.lastused is AND logic.
   Note: the command 'ip pim keep-alive-timer', is with 31
   as the min value before, like version stable8.0, so it
   appears a assistant judgement here('greater than 30').

Signed-off-by: ron <lyq140hf2006@163.com>
when delete a vif, like input command 'no ip pim sm',
we need a scan. Currently we dealed all general neighbors,
but missed the interface directly connected to the source.
It caused a crash when wheel timer comes.
The source only send ipmc traffic, but no pim-hello packet.
It can be considered as a special neighbor,
so add dealing this situation.

Signed-off-by: ron <lyq140hf2006@163.com>
up->sg_str);
return;
}

if (!up->rpf.source_nexthop.interface) {
if (PIM_DEBUG_PIM_EVENTS)
zlog_debug("%s: RP not reachable for %s", __func__,
zlog_debug("%s: RPF not reachable for %s", __func__,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again if the interface is NULL then the RPF failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, RPF is about the nexthop,
and there are other similar logs, like in function 'pim_upstream_send_join'.

@lyq140
Copy link
Contributor Author

lyq140 commented Sep 27, 2022

ci:rerun

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-7589/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

@github-actions
Copy link

This PR is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this PR closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants