lib: add rcu apis in frr grpc pthread #6286

Closed

chiragshah6
Member

@chiragshah6 chiragshah6 commented Apr 23, 2020

The FRR gRPC pthread needs to integrate the RCU APIs:
rcu_read_lock()
rcu_thread_prepare()
rcu_thread_unprepare()
rcu_thread_start()
rcu_read_unlock()

PR 5451 introduced RCU lock infrastructure to zlog. When the gRPC northbound plugin starts a thread in a daemon's context, the daemon crashes in zlog's RCU read lock.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7c8f535 in __GI_abort () at abort.c:79
#2  0x00007ffff7c8f40f in __assert_fail_base (fmt=0x7ffff7df1ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7ffff7f5f083 "rt", file=0x7ffff7f58db5 "lib/frrcu.c", line=305, function=<optimized out>) at assert.c:92
#3  0x00007ffff7c9d102 in __GI___assert_fail (assertion=assertion@entry=0x7ffff7f5f083 "rt", file=file@entry=0x7ffff7f58db5 "lib/frrcu.c", line=line@entry=305, function=function@entry=0x7ffff7f58f50 <__PRETTY_FUNCTION__.5792> "rcu_read_lock") at assert.c:101
#4  0x00007ffff7ef340a in rcu_read_lock () at lib/frrcu.c:305
#5  0x00007ffff7f49de7 in vzlog_notls (prio=5, fmt=<optimized out>, ap=0x7ffff6990a88) at lib/zlog.c:365
#6  0x00007ffff784cb36 in zlog () from /usr/local/lib/frr/modules/grpc.so
#7  0x00007ffff784cdbd in grpc_pthread_start(void*) () from /usr/local/lib/frr/modules/grpc.so
#8  0x00007ffff7e35fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9  0x00007ffff7d664cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>

@mjstapp
Contributor

mjstapp commented Apr 24, 2020

the description for this is a bit ... thin. I'm curious: just what data can this pthread possibly touch, in frr daemons? and so ... what would rcu help it do?

@qlyoung
Member

qlyoung commented Apr 27, 2020

@mjstapp the GRPC thread spawns its own pthreads internally and is unsynchronized relative to the rest of FRR. As I understand it this patch is simply boilerplate to prevent a crash in zlog code which itself uses RCU primitives now.

Contributor

@eqvinox eqvinox left a comment

ACK on the RCU bits, but the frr_pthread part looks a bit ... contorted? ... to me. @qlyoung ?

@eqvinox
Contributor

eqvinox commented Apr 28, 2020

> the description for this is a bit ... thin. I'm curious: just what data can this pthread possibly touch, in frr daemons? and so ... what would rcu help it do?

@mjstapp indeed as Quentin says the zlog code does an rcu_read_lock/unlock() pair internally, which will trip an assertion if the thread has no RCU state. That's the price of lock-free logging...
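
A minimal, self-contained model of the failure described above may help; it is not FRR's actual code. Every name in it (`model_rcu_thread`, `model_thread_register`, `model_read_lock`, `model_log`, ...) is hypothetical. It assumes only what this thread states: the read lock looks up per-thread state and asserts that it exists (the `rt` assertion in the backtrace). So that the sketch can be exercised without aborting, the model's lock returns `false` where the real code would assert.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread RCU bookkeeping. */
struct model_rcu_thread {
	unsigned depth; /* nesting depth of read-side critical sections */
};

/* Each thread has its own pointer; NULL means "never registered". */
static _Thread_local struct model_rcu_thread *model_self;

/* Register the calling thread; the caller owns the storage. */
static void model_thread_register(struct model_rcu_thread *rt)
{
	assert(rt != NULL);
	rt->depth = 0;
	model_self = rt;
}

/* The real code asserts here; the model reports failure instead. */
static bool model_read_lock(void)
{
	if (model_self == NULL)
		return false;
	model_self->depth++;
	return true;
}

static void model_read_unlock(void)
{
	if (model_self != NULL && model_self->depth > 0)
		model_self->depth--;
}

/* Logging takes the read lock internally, so it inherits the requirement. */
static bool model_log(const char *msg)
{
	(void)msg;
	if (!model_read_lock())
		return false;
	model_read_unlock();
	return true;
}

/* Thread body that logs without registering first (the raw pthread case). */
static void *unregistered_thread(void *arg)
{
	*(bool *)arg = model_log("hello from raw pthread");
	return NULL;
}

/* Thread body that registers before logging. */
static void *registered_thread(void *arg)
{
	struct model_rcu_thread rt;

	model_thread_register(&rt);
	*(bool *)arg = model_log("hello from registered thread");
	model_self = NULL; /* rt is about to go out of scope */
	return NULL;
}

/* Run one thread body to completion and return what its log call reported. */
static bool run_thread(void *(*fn)(void *))
{
	pthread_t tid;
	bool ok = false;

	if (pthread_create(&tid, NULL, fn, &ok) != 0)
		return false;
	pthread_join(tid, NULL);
	return ok;
}
```

Calling `run_thread(unregistered_thread)` reports failure while `run_thread(registered_thread)` succeeds, which mirrors why a thread created with a bare `pthread_create()` crashes on its first log call.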

Contributor

@mjstapp mjstapp left a comment

Still a bit confused: if you used frr_pthread_start, wouldn't that work? that should run your start function, and it should also deal with the rcu pthread-specifics internally.

@rwestphal
Member

rwestphal commented Apr 28, 2020

> Still a bit confused: if you used frr_pthread_start, wouldn't that work? that should run your start function, and it should also deal with the rcu pthread-specifics internally.

Do you mean frr_pthread_run()? If yes, I wonder the same thing... Either that or not use frr_pthread_new() at all.

That said, the gRPC plugin is working again with this PR.

@qlyoung
Member

qlyoung commented Apr 28, 2020

Alright, I understand you now @mjstapp. Went and refreshed my memory on the RCU facilities in the pthread wrappers.

> Still a bit confused: if you used frr_pthread_start, wouldn't that work? that should run your start function, and it should also deal with the rcu pthread-specifics internally.

Yep, we should be doing this @chiragshah6

@mjstapp
Contributor

mjstapp commented Apr 28, 2020

> Do you mean frr_pthread_run()? If yes, I wonder the same thing... [...]

Yes, I was just thinking that it would be better just to use the frr_pthread facility, and not haul the internal details out into the northbound code? I think you can still drive a pthread into (horrible) grpc Wait() that way.
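
The suggestion here amounts to a trampoline: the thread facility's own entry point performs the per-thread setup and only then calls the user's start function, so the plugin never sees those details. A rough sketch in plain pthreads follows. `wrapped_thread`, `wrapped_thread_run` and the `setup_done` flag are invented for illustration and are not the `frr_pthread` API; the flag merely stands in for whatever per-thread registration the real facility does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Set by the trampoline; stands in for per-thread registration. */
static _Thread_local bool setup_done;

/* Hypothetical wrapper object, loosely analogous to a thread handle. */
struct wrapped_thread {
	pthread_t tid;
	void *(*start)(void *); /* user's start function */
	void *arg;              /* argument forwarded to it */
};

/* Real entry point: do the per-thread setup, then hand over to the user. */
static void *wrapped_trampoline(void *p)
{
	struct wrapped_thread *wt = p;
	void *ret;

	setup_done = true;  /* setup happens before any user code runs */
	ret = wt->start(wt->arg);
	setup_done = false; /* matching teardown once the user returns */
	return ret;
}

/* Start a thread through the wrapper; returns 0 on success. */
static int wrapped_thread_run(struct wrapped_thread *wt,
			      void *(*start)(void *), void *arg)
{
	assert(wt != NULL && start != NULL);
	wt->start = start;
	wt->arg = arg;
	return pthread_create(&wt->tid, NULL, wrapped_trampoline, wt);
}

/* Example user function: it only observes that setup already happened. */
static void *user_start(void *arg)
{
	*(bool *)arg = setup_done;
	return NULL;
}

/* Helper for trying it out: run user_start via the wrapper and report. */
static bool demo_wrapped(void)
{
	struct wrapped_thread wt;
	bool saw_setup = false;

	if (wrapped_thread_run(&wt, user_start, &saw_setup) != 0)
		return false;
	pthread_join(wt.tid, NULL);
	return saw_setup;
}
```

The user function contains no setup code of its own, which is the property being asked for: a blocking call such as a server wait loop could sit in `user_start` unchanged.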

@chiragshah6 chiragshah6 requested review from mjstapp and eqvinox April 29, 2020 21:41
Member

@rwestphal rwestphal left a comment

Overall looks good to me, but I'm getting this error when killing a daemon:

# ripd -M grpc
^Cripd: lib/frrcu.c:481: rcu_shutdown: Assertion `rcu_threads_count(&rcu_threads) == 1' failed.
Aborted (core dumped)

Then I found the comment below in the RCU code:

        /* rcu_shutdown can only be called singlethreaded, and it does a
         * pthread_join, so it should be impossible that anything ended up
         * on the queue after RCUA_END
         */

Which probably means we need to cancel the gRPC pthreads somehow if we want to get rid of that warning. I just don't know if the gRPC lib offers an API for that.

EDIT: this little change fixed the problem (not sure if it's correct though):

--- a/lib/frr_pthread.c
+++ b/lib/frr_pthread.c
@@ -116,6 +116,7 @@ static void frr_pthread_destroy_nolock(struct frr_pthread *fpt)
        pthread_mutex_destroy(&fpt->mtx);
        pthread_mutex_destroy(fpt->running_cond_mtx);
        pthread_cond_destroy(fpt->running_cond);
+       rcu_thread_unprepare(fpt->rcu_thread);
        XFREE(MTYPE_FRR_PTHREAD, fpt->name);
        XFREE(MTYPE_PTHREAD_PRIM, fpt->running_cond_mtx);
        XFREE(MTYPE_PTHREAD_PRIM, fpt->running_cond);

                 flog_err(EC_LIB_SYSTEM_CALL, "%s: error creating pthread: %s",
                          __func__, safe_strerror(errno));
                 return -1;
         }
-        pthread_detach(grpc_pthread);
+        pthread_detach(fpt->thread);
Member

We shouldn't need pthread_detach() anymore since we're calling frr_pthread_destroy() on frr_grpc_finish().

Member

frr_pthread_destroy just frees the allocated structure, it doesn't do anything with the internal pthread resources, which still technically need to be freed. But currently we don't clean those up anyway; the only time we kill this thread is when the program exits, so I don't think it makes much difference whether it's called or not.
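
As background for the point about thread resources: in POSIX threads, a joinable thread's resources are released only once it has been joined or detached. The sketch below shows both options with plain pthreads. The helper names are invented for illustration, and none of this is FRR code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body: hand the argument back as the exit value. */
static void *echo_thread(void *arg)
{
	return arg;
}

/* Option 1: keep the thread joinable and reap it with pthread_join(). */
static void *run_and_join(void *arg)
{
	pthread_t tid;
	void *ret = NULL;
	int rc;

	rc = pthread_create(&tid, NULL, echo_thread, arg);
	assert(rc == 0);
	rc = pthread_join(tid, &ret); /* releases the thread's resources */
	assert(rc == 0);
	(void)rc;
	return ret;
}

/* Option 2: create the thread already detached; it cannot be joined later. */
static int run_detached(void *arg)
{
	pthread_attr_t attr;
	pthread_t tid;
	int rc;

	rc = pthread_attr_init(&attr);
	if (rc != 0)
		return rc;
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	rc = pthread_create(&tid, &attr, echo_thread, arg);
	pthread_attr_destroy(&attr);
	return rc; /* resources are released automatically on thread exit */
}
```

For a thread that only ends when the process exits, as described above, the choice between the two has little practical effect.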

lib/northbound_grpc.cpp (outdated)
start grpc thread with frr_pthread library
callbacks to integrate with rcu infrastructure.

If a thread is created using native pthread callbacks
and if zlog is used then it leads to crash.

Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
@LabN-CI
Collaborator

LabN-CI commented Apr 30, 2020

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table

Result           SUCCESS git merge/6286 ad231d6           Complete
Date             04/29/2020                               04/01/2020
Start            22:41:17                                 22:41:12
Finish           23:07:06                                 23:05:38
Run-Time         25:49                                    24:26
Total            1815                                     1815
Pass             1815                                     1815
Fail             0                                        0
Valgrind-Errors  0                                        0
Valgrind-Loss    0                                        0
Details          vncregress-2020-04-29-22:41:17.txt       autoscript-2020-04-01-22:41:12.txt
Log              autoscript-2020-04-29-22:42:14.log.bz2   autoscript-2020-04-01-22:41:12.log.bz2
Memory           499 493 427                              462 479 418

For details, please contact louberger

@NetDEF-CI
Collaborator

NetDEF-CI commented Apr 30, 2020

Continuous Integration Result: FAILED

See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12094/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Successful

Basic Tests: Failed

Topology tests on Ubuntu 18.04 amd64: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1804-12094/test

Topology Tests failed for Topology tests on Ubuntu 18.04 amd64:

2020-04-30 03:13:20,911 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/bgp.py", line 159, in create_router_bgp
    tgen, router, data_all_bgp, "bgp", build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 240, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 397, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: line 6: % Unknown command[27]: neighbor 10.0.0.13 remote-as 0 


2020-04-30 03:13:21,154 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/bgp.py", line 159, in create_router_bgp
    tgen, router, data_all_bgp, "bgp", build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 240, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 397, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % No BGP process is configured
line 2: Failure to communicate[13] to bgpd, line: no router bgp 



*** defaultIntf: warning: r1 has no interfaces
2020-04-30 03:40:23,673 ERROR: '_bgp_has_routes' failed after 37.76 seconds
2020-04-30 03:48:11,802 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 1509, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 240, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 397, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:-1 



2020-04-30 03:48:11,927 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 1509, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 240, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 397, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:65536 



2020-04-30 03:48:12,052 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 1509, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 240, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 397, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:4294967296 



2020-04-30 03:48:12,174 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 1509, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 240, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 397, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:-1:1 



2020-04-30 03:50:09,787 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 1509, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 240, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TOPOU1804/topotests/lib/common_config.py", line 397, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: line 2: % Command incomplete[4]: bgp large-community-list standard Test1 permit  



2020-04-30 03:57:03,234 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-04-30 03:57:05,290 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-04-30 03:57:11,453 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1


r2: zebra crashed. Core file found - Backtrace follows:
[New LWP 30165]
[New LWP 30166]
[New LWP 30167]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
[Current thread is 1 (Thread 0x7f1fa66017c0 (LWP 30165))]
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f1fa57ed801 in __GI_abort () at abort.c:79
#2  0x00007f1fa62455af in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#3  <signal handler called>
#4  0x00007f1fa58c1bf9 in __GI___poll (fds=0x5648ef099440, nfds=6, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#5  0x00007f1fa6253310 in thread_fetch () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#6  0x00007f1fa621bfb3 in frr_run () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#7  0x00005648edddbeae in main ()
2020-04-30 04:03:33,299 ERROR: assert failed at "test_ldp_vpls_topo1/test_memory_leak": 
r2: zebra crashed. Core file found - Backtrace follows:
[New LWP 30165]
[New LWP 30166]
[New LWP 30167]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
[Current thread is 1 (Thread 0x7f1fa66017c0 (LWP 30165))]
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f1fa57ed801 in __GI_abort () at abort.c:79
#2  0x00007f1fa62455af in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#3  <signal handler called>
#4  0x00007f1fa58c1bf9 in __GI___poll (fds=0x5648ef099440, nfds=6, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#5  0x00007f1fa6253310 in thread_fetch () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#6  0x00007f1fa621bfb3 in frr_run () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#7  0x00005648edddbeae in main ()

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12094/artifact/TOPOU1804/ErrorLog/log_topotests.txt

Successful on other platforms/tests
  • Ubuntu 12.04 deb pkg check
  • Static analyzer (clang)
  • Fedora 29 rpm pkg check
  • Ubuntu 20.04 deb pkg check
  • IPv6 protocols on Ubuntu 14.04
  • Ubuntu 18.04 deb pkg check
  • Addresssanitizer topotests part 1
  • Ubuntu 16.04 deb pkg check
  • Addresssanitizer topotests part 2
  • Topotest tests on Ubuntu 16.04 i386
  • CentOS 7 rpm pkg check
  • Debian 8 deb pkg check
  • Debian 10 deb pkg check
  • Addresssanitizer topotests part 3
  • Debian 9 deb pkg check
  • Topology tests on Ubuntu 16.04 amd64
  • IPv4 protocols on Ubuntu 14.04
  • IPv4 ldp protocol on Ubuntu 16.04
  • Ubuntu 14.04 deb pkg check

Warnings Generated during build:

Debian 10 amd64 build: Successful with additional warnings

Debian Package lintian failed for Debian 10 amd64 build:
(see full package build log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12094/artifact/DEB10BUILD/ErrorLog/log_lintian.txt)

W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1
W: frr-rpki-rtrlib: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1
W: frr-doc: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1
W: frr-snmp: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1
W: frr-pythontools: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1

@NetDEF-CI
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12094/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Debian 10 amd64 build: Successful with additional warnings

Debian Package lintian failed for Debian 10 amd64 build:
(see full package build log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12094/artifact/DEB10BUILD/ErrorLog/log_lintian.txt)

W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1
W: frr-rpki-rtrlib: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1
W: frr-doc: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1
W: frr-snmp: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1
W: frr-pythontools: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200430-01-gad231d60e-0 (missing) -> 7.4-dev-20200430-01-gad231d60e-0~deb10u1

@eqvinox
Contributor

eqvinox commented May 5, 2020

> EDIT: this little change fixed the problem (not sure if it's correct though):

--- a/lib/frr_pthread.c
+++ b/lib/frr_pthread.c
@@ -116,6 +116,7 @@ static void frr_pthread_destroy_nolock(struct frr_pthread *fpt)
        pthread_mutex_destroy(&fpt->mtx);
        pthread_mutex_destroy(fpt->running_cond_mtx);
        pthread_cond_destroy(fpt->running_cond);
+       rcu_thread_unprepare(fpt->rcu_thread);
        XFREE(MTYPE_FRR_PTHREAD, fpt->name);
        XFREE(MTYPE_PTHREAD_PRIM, fpt->running_cond_mtx);
        XFREE(MTYPE_PTHREAD_PRIM, fpt->running_cond);

This breaks when the thread actually does exit since on exit the cleanup is already done, so now it's done twice...

@eqvinox eqvinox closed this May 5, 2020
@eqvinox eqvinox reopened this May 5, 2020
@eqvinox
Contributor

eqvinox commented May 5, 2020

accidentally hit the close button, sorry

Member

@qlyoung qlyoung left a comment

While I'm fine with the changes here, since they at least avoid the crash on startup, they aren't sufficient to "fix" the plugin with respect to RCU, not even to the extent of restoring the previous "yolo mode" with no synchronization. This is because grpcpp actually spawns a new thread to handle each RPC, with no initialization callback or the like exposed to client code. Consequently anything using RCU APIs (i.e. any zlog call) in any of the callback handlers will also cause a crash due to uninitialized RCU state.

The plugin will need to be converted to use grpcpp's async threading model.
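
One way to picture the direction proposed here: rather than letting library-created threads run the handlers, requests are queued and drained by a thread the daemon controls, which can do its per-thread setup once. The sketch below is a rough plain-C analogy using a mutex-protected queue. It is not grpcpp's async API, all names are hypothetical, and the `worker_ready` flag only stands in for per-thread RCU state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Stands in for per-thread state: only the worker thread sets it. */
static _Thread_local bool worker_ready;

struct request_queue {
	pthread_mutex_t mtx;
	pthread_cond_t cond;
	int items[QUEUE_CAP];
	size_t head, count;
	bool closed;
	int handled_ok; /* requests handled on the prepared thread */
};

static void queue_init(struct request_queue *q)
{
	pthread_mutex_init(&q->mtx, NULL);
	pthread_cond_init(&q->cond, NULL);
	q->head = q->count = 0;
	q->closed = false;
	q->handled_ok = 0;
}

/* Callable from any thread (e.g. a library-owned one): only enqueues. */
static bool queue_push(struct request_queue *q, int req)
{
	bool ok = false;

	pthread_mutex_lock(&q->mtx);
	if (!q->closed && q->count < QUEUE_CAP) {
		q->items[(q->head + q->count) % QUEUE_CAP] = req;
		q->count++;
		ok = true;
		pthread_cond_signal(&q->cond);
	}
	pthread_mutex_unlock(&q->mtx);
	return ok;
}

/* No more requests will arrive; the worker exits once the queue is empty. */
static void queue_close(struct request_queue *q)
{
	pthread_mutex_lock(&q->mtx);
	q->closed = true;
	pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->mtx);
}

/* The one thread that runs handlers; it does its setup exactly once. */
static void *worker_main(void *arg)
{
	struct request_queue *q = arg;

	worker_ready = true;
	pthread_mutex_lock(&q->mtx);
	for (;;) {
		while (q->count == 0 && !q->closed)
			pthread_cond_wait(&q->cond, &q->mtx);
		if (q->count == 0)
			break; /* closed and fully drained */
		int req = q->items[q->head];
		q->head = (q->head + 1) % QUEUE_CAP;
		q->count--;
		(void)req;
		/* "Handle" the request (under the lock, for brevity). */
		if (worker_ready)
			q->handled_ok++;
	}
	pthread_mutex_unlock(&q->mtx);
	return NULL;
}

/* Helper for trying it out: push n requests, drain them, report the count. */
static int demo_queue(int n)
{
	struct request_queue q;
	pthread_t tid;

	assert(n >= 0 && n <= QUEUE_CAP);
	queue_init(&q);
	if (pthread_create(&tid, NULL, worker_main, &q) != 0)
		return -1;
	for (int i = 0; i < n; i++)
		queue_push(&q, i);
	queue_close(&q);
	pthread_join(tid, NULL);
	pthread_mutex_destroy(&q.mtx);
	pthread_cond_destroy(&q.cond);
	return q.handled_ok;
}
```

The design point is that producers never run handler code themselves, so it does not matter whether they carry any per-thread state; only the single consumer needs it.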

@qlyoung
Copy link
Member

qlyoung commented May 8, 2020

Closing in favor of #6368

8 participants