-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: add rcu apis in frr grpc pthread #6286
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
the description for this is a bit ... thin. I'm curious: just what data can this pthread possibly touch, in frr daemons? and so ... what would rcu help it do? |
@mjstapp the GRPC thread spawns its own pthreads internally and is unsynchronized relative to the rest of FRR. As I understand it this patch is simply boilerplate to prevent a crash in zlog code which itself uses RCU primitives now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK on the RCU bits, but the frr_pthread
part looks a bit ... contorted? ... to me. @qlyoung ?
@mjstapp indeed as Quentin says the zlog code does an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still a bit confused: if you used frr_pthread_start
, wouldn't that work? that should run your start function, and it should also deal with the rcu pthread-specifics internally.
Do you mean That said, the gRPC plugin is working again with this PR. |
Alright, I understand you now @mjstapp. Went and refreshed my memory on the RCU facilities in the pthread wrappers.
Yep, we should be doing this @chiragshah6 |
Yes, I was just thinking that it would be better just to use the frr_pthread facility, and not haul the internal details out into the northbound code? I think you can still drive a pthread into (horrible) grpc Wait() that way. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me, but I'm getting this error when killing a daemon:
# ripd -M grpc
^Cripd: lib/frrcu.c:481: rcu_shutdown: Assertion `rcu_threads_count(&rcu_threads) == 1' failed.
Aborted (core dumped)
Then I found the comment below in the RCU code:
/* rcu_shutdown can only be called singlethreaded, and it does a
* pthread_join, so it should be impossible that anything ended up
* on the queue after RCUA_END
*/
Which probably means we need to cancel the gRPC pthreads somehow if we want to get rid of that warning. I just don't know if the gRPC lib offers an API for that.
EDIT: this little change fixed the problem (not sure if it's correct though):
--- a/lib/frr_pthread.c
+++ b/lib/frr_pthread.c
@@ -116,6 +116,7 @@ static void frr_pthread_destroy_nolock(struct frr_pthread *fpt)
pthread_mutex_destroy(&fpt->mtx);
pthread_mutex_destroy(fpt->running_cond_mtx);
pthread_cond_destroy(fpt->running_cond);
+ rcu_thread_unprepare(fpt->rcu_thread);
XFREE(MTYPE_FRR_PTHREAD, fpt->name);
XFREE(MTYPE_PTHREAD_PRIM, fpt->running_cond_mtx);
XFREE(MTYPE_PTHREAD_PRIM, fpt->running_cond);
flog_err(EC_LIB_SYSTEM_CALL, "%s: error creating pthread: %s", | ||
__func__, safe_strerror(errno)); | ||
return -1; | ||
} | ||
pthread_detach(grpc_pthread); | ||
pthread_detach(fpt->thread); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need pthread_detach()
anymore since we're calling frr_pthread_destroy()
on frr_grpc_finish()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frr_pthread_destroy
just frees the allocated structure, it doesn't do anything with the internal pthread resources, which still technically need to be freed. But currently we don't clean those up anyway; the only time we kill this thread is when the program exits, so I don't think it makes much difference whether it's called or not.
start grpc thread with frr_pthread library callbacks to integrate with rcu infrastructure. If a thread is created using native pthread callbacks and if zlog is used then it leads to crash. Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopology tests on Ubuntu 18.04 amd64: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1804-12094/test Topology Tests failed for Topology tests on Ubuntu 18.04 amd64:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12094/artifact/TOPOU1804/ErrorLog/log_topotests.txt Successful on other platforms/tests
Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12094/ This is a comment from an automated CI system. Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
This breaks when the thread actually does exit since on exit the cleanup is already done, so now it's done twice... |
accidentally hit the close button, sorry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'm fine with the changes here, since they at least avoid the crash on startup, they aren't sufficient to "fix" the plugin with respect to RCU, even inasmuch as restoring the previous "yolo mode" with no synchronization. This is because grpcpp actually spawns a new thread to handle each RPC, with no initialization callback or the like exposed to client code. Consequently anything using RCU APIs (ie any zlog
) in any of the callback handlers will also cause a crash due to uninitialized RCU stuff.
The plugin will need to be converted to use grpcpp's async threading model.
Closing in favor of #6368 |
frr grpc pthread need to integrate rcu apis.
rcu_read_lock()
rcu_thread_prepare()
rcu_thread_unprepare()
rcu_thread_start()
rcu_read_unlock()
PR 5451 introduce rcu lock infrastructure to
zlog
. When grpc northbound plugin starts a thread in daemon's context, daemon sees a crash inzlog's
rcu relad lock.Signed-off-by: Chirag Shah chirag@cumulusnetworks.com