-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: Resolve signal handling race in event loop #7045
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution to FRR!
Click for style suggestions
To apply these suggestions:
curl -s https://gist.githubusercontent.com/polychaeta/7d67bd2cf2dac8cbcf087ad518ec0966/raw/a7acc885ef18649c38e7b7f2d23bcbbb579d5572/cr_7045_1599080057.diff | git apply
diff --git a/lib/thread.c b/lib/thread.c
index c5afcbb24..029e2a8e3 100644
--- a/lib/thread.c
+++ b/lib/thread.c
@@ -1500,7 +1500,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
break;
}
-#if 0 /* TODO */
+#if 0 /* TODO */
if (m->handle_signals && (tw == NULL)) {
int i;
@@ -1512,7 +1512,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
sleep(1);
}
}
-#endif /* TODO */
+#endif /* TODO */
/*
* Copy pollfd array + # active pollfds in it. Not necessary to
@@ -1529,11 +1529,11 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
}
pthread_mutex_lock(&m->mtx);
-// if (m->handle_signals && num < 0) {
-// zlog_debug("Thread fetch sees num %d, eintr_p %s",
-// num, (eintr_p ? "TRUE" : "FALSE"));
-// zlog_tls_buffer_flush();
-// }
+ // if (m->handle_signals && num < 0) {
+ // zlog_debug("Thread fetch sees num %d,
+ //eintr_p %s", num, (eintr_p ? "TRUE" : "FALSE"));
+ // zlog_tls_buffer_flush();
+ // }
/* Handle any errors received in poll() */
if (num < 0) {
If you are a new contributor to FRR, please see our contributing guidelines.
After making changes, you do not need to create a new PR. You should perform an amend or interactive rebase followed by a force push.
Outdated results 💚Basic BGPD CI results: SUCCESS, 0 tests failed
For details, please contact louberger |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: FailedUbuntu 18.04 arm7 build: Failed (click for details)Make failed for Ubuntu 18.04 arm7 build:
Ubuntu 18.04 ppc64le build: Failed (click for details)Make failed for Ubuntu 18.04 ppc64le build:
Ubuntu 18.04 ppc64le build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13999/artifact/U1804PPC64LEBUILD/config.status/config.status Ubuntu 20.04 amd64 build: Failed (click for details)Ubuntu 20.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13999/artifact/U2004AMD64BUILD/config.status/config.statusMake failed for Ubuntu 20.04 amd64 build:
Ubuntu 18.04 arm8 build: Failed (click for details)Make failed for Ubuntu 18.04 arm8 build:
OpenBSD 6 amd64 build: Failed (click for details)Make failed for OpenBSD 6 amd64 build:
OpenBSD 6 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13999/artifact/CI011BUILD/config.status/config.status Ubuntu 18.04 amd64 build: Failed (click for details)Make failed for Ubuntu 18.04 amd64 build:
Ubuntu 18.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13999/artifact/U1804AMD64/config.status/config.status Successful on other platforms/tests
Warnings Generated during build:Checkout code: Successful with additional warningsUbuntu 18.04 arm7 build: Failed (click for details)Make failed for Ubuntu 18.04 arm7 build:
Ubuntu 18.04 ppc64le build: Failed (click for details)Make failed for Ubuntu 18.04 ppc64le build:
Ubuntu 18.04 ppc64le build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13999/artifact/U1804PPC64LEBUILD/config.status/config.status Ubuntu 20.04 amd64 build: Failed (click for details)Ubuntu 20.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13999/artifact/U2004AMD64BUILD/config.status/config.statusMake failed for Ubuntu 20.04 amd64 build:
Ubuntu 18.04 arm8 build: Failed (click for details)Make failed for Ubuntu 18.04 arm8 build:
OpenBSD 6 amd64 build: Failed (click for details)Make failed for OpenBSD 6 amd64 build:
OpenBSD 6 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13999/artifact/CI011BUILD/config.status/config.status Ubuntu 18.04 amd64 build: Failed (click for details)Make failed for Ubuntu 18.04 amd64 build:
Ubuntu 18.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13999/artifact/U1804AMD64/config.status/config.status
Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution to FRR!
Click for style suggestions
To apply these suggestions:
curl -s https://gist.githubusercontent.com/polychaeta/5a04b168a26ec8b74f3f0ed5c67cb25d/raw/56dbebb0ff0654ea77fd984f064f05cbfe53962a/cr_7045_1599082125.diff | git apply
diff --git a/lib/thread.c b/lib/thread.c
index 01fba780f..dff34232d 100644
--- a/lib/thread.c
+++ b/lib/thread.c
@@ -1500,7 +1500,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
break;
}
-#if 0 /* TODO */
+#if 0 /* TODO */
if (m->handle_signals && (tw == NULL)) {
int i;
@@ -1512,7 +1512,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
sleep(1);
}
}
-#endif /* TODO */
+#endif /* TODO */
/*
* Copy pollfd array + # active pollfds in it. Not necessary to
@@ -1535,7 +1535,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
num, (eintr_p ? "TRUE" : "FALSE"));
zlog_tls_buffer_flush();
}
-#endif /* TODO */
+#endif /* TODO */
/* Handle any errors received in poll() */
if (num < 0) {
If you are a new contributor to FRR, please see our contributing guidelines.
After making changes, you do not need to create a new PR. You should perform an amend or interactive rebase followed by a force push.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution to FRR!
Click for style suggestions
To apply these suggestions:
curl -s https://gist.githubusercontent.com/polychaeta/36edc134cca1429ea3c13b5183d01d98/raw/1fb297682b506cdcff6d60ebb12c711e3bcaaedb/cr_7045_1599082218.diff | git apply
diff --git a/lib/thread.c b/lib/thread.c
index f50cbc858..f0947d069 100644
--- a/lib/thread.c
+++ b/lib/thread.c
@@ -1500,7 +1500,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
break;
}
-#if 0 /* TODO */
+#if 0 /* TODO */
if (m->handle_signals && (tw == NULL)) {
int i;
@@ -1512,7 +1512,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
sleep(1);
}
}
-#endif /* TODO */
+#endif /* TODO */
/*
* Copy pollfd array + # active pollfds in it. Not necessary to
@@ -1535,7 +1535,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
num, (eintr_p ? "TRUE" : "FALSE"));
zlog_tls_buffer_flush();
}
-#endif /* TODO */
+#endif /* TODO */
/* Handle any errors received in poll() */
if (num < 0) {
If you are a new contributor to FRR, please see our contributing guidelines.
After making changes, you do not need to create a new PR. You should perform an amend or interactive rebase followed by a force push.
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: FailedOpenBSD 6 amd64 build: Failed (click for details)Make failed for OpenBSD 6 amd64 build:
OpenBSD 6 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-14002/artifact/CI011BUILD/config.status/config.status Successful on other platforms/tests
Warnings Generated during build:Checkout code: Successful with additional warningsOpenBSD 6 amd64 build: Failed (click for details)Make failed for OpenBSD 6 amd64 build:
OpenBSD 6 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-14002/artifact/CI011BUILD/config.status/config.status
Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
Outdated results 💚Basic BGPD CI results: SUCCESS, 0 tests failed
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-14003/ This is a comment from an automated CI system. Warnings Generated during build:Checkout code: Successful with additional warnings
Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
CLANG Static Analyzer Summary
Fixed warnings:
New warnings:
Static Analysis warning summary compared to base:
2 Static Analyzer issues remaining.See details at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution to FRR!
Click for style suggestions
To apply these suggestions:
curl -s https://gist.githubusercontent.com/polychaeta/3a7b81d4dee6b008c7f62f298d665fc8/raw/6a96393efc52d0fb23da96c2c1ca038d9b93dc33/cr_7045_1599139344.diff | git apply
diff --git a/lib/thread.c b/lib/thread.c
index 3f5821ea6..3e1a894d8 100644
--- a/lib/thread.c
+++ b/lib/thread.c
@@ -1500,7 +1500,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
break;
}
-#if 0 /* TODO */
+#if 0 /* TODO */
if (m->handle_signals && (tw == NULL)) {
int i;
@@ -1512,7 +1512,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
sleep(1);
}
}
-#endif /* TODO */
+#endif /* TODO */
/*
* Copy pollfd array + # active pollfds in it. Not necessary to
@@ -1535,7 +1535,7 @@ struct thread *thread_fetch(struct thread_master *m, struct thread *fetch)
num, (eintr_p ? "TRUE" : "FALSE"));
zlog_tls_buffer_flush();
}
-#endif /* TODO */
+#endif /* TODO */
/* Handle any errors received in poll() */
if (num < 0) {
If you are a new contributor to FRR, please see our contributing guidelines.
After making changes, you do not need to create a new PR. You should perform an amend or interactive rebase followed by a force push.
Rebased to newer master, clean up a style warning |
Outdated results 💚Basic BGPD CI results: SUCCESS, 0 tests failed
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-14011/ This is a comment from an automated CI system. Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
CLANG Static Analyzer Summary
No Changes in Static Analysis warnings compared to base1 Static Analyzer issues remaining.See details at |
8fdf6ec
to
8a7551d
Compare
Rebase and clean up code for review. |
Outdated results 💚Basic BGPD CI results: SUCCESS, 0 tests failed
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-14287/ This is a comment from an automated CI system. Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
Outdated results 💚Basic BGPD CI results: SUCCESS, 0 tests failed
For details, please contact louberger |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopo tests part 0 on Ubuntu 16.04 i386: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOI386-14560/test Topology Tests failed for Topo tests part 0 on Ubuntu 16.04 i386:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-14560/artifact/TOPOI386/ErrorLog/log_topotests.txt Successful on other platforms/tests
Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
CI:rerun |
lib/thread.c
Outdated
|
||
found = 0; | ||
buf[0] = '\0'; | ||
for (i = 0; i < 32; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be SIGRTMAX instead of 32? and I start at 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I considered the real-time world ... out of scope for now? what I was wanting to see, and what seemed most useful for frr processes, were the handful of signals that the daemons try to handle with callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright why stop at 32? What was that decision point? What does 32 mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
31 is the last non-realtime signal - so ... that seemed like the right place to stop. are you unhappy with the literal value there - if that were < SIGRTMIN
, would that be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really could care less what define we use for 32, but I do know that just blindly using 32 does not help with code readability. Why did you choose 32? There has to be a define provided by the signal system to indicate what you are trying to do here and why we are limiting the signal processing to the first 32.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried using SIGRTMIN - let's see how that looks
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-14575/ This is a comment from an automated CI system. Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
Add an api that blocks application-handled signals (SIGINT, SIGTERM, e.g.) then tests whether any signals have been received. This helps to manage a race between signal reception and the poll call in the main event loop. Signed-off-by: Mark Stapp <mjs@voltanet.io>
Updated to current master |
Outdated results 💚Basic BGPD CI results: SUCCESS, 0 tests failed
For details, please contact louberger |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopo tests part 0 on Ubuntu 16.04 i386: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOI386-14990/test Topology Tests failed for Topo tests part 0 on Ubuntu 16.04 i386:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-14990/artifact/TOPOI386/ErrorLog/log_topotests.txt Successful on other platforms/tests
Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
CI:rerun |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopo tests part 0 on Ubuntu 16.04 i386: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOI386-14997/test Topology Tests failed for Topo tests part 0 on Ubuntu 16.04 i386:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-14997/artifact/TOPOI386/ErrorLog/log_topotests.txt Successful on other platforms/tests
Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
CI:rerun |
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-15001/ This is a comment from an automated CI system. Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
Pushed a version of the debug api that uses SIGRTMIN, and adds a comment about only looking at non-realtime signals. |
Outdated results 💚Basic BGPD CI results: SUCCESS, 0 tests failed
For details, please contact louberger |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: FailedOpenBSD 6 amd64 build: Failed (click for details)Make failed for OpenBSD 6 amd64 build:
OpenBSD 6 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-15027/artifact/CI011BUILD/config.status/config.status Successful on other platforms/tests
Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
Add an api that debugs the signals in a sigset. Signed-off-by: Mark Stapp <mjs@voltanet.io>
Manage the main pthread's signal mask to avoid a signal-handling race. Before entering poll, check for pending signals that the application needs to handle. Use ppoll() to re-enable those signals during the poll call. Signed-off-by: Mark Stapp <mjs@voltanet.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution to FRR!
Click for style suggestions
To apply these suggestions:
curl -s https://gist.githubusercontent.com/polychaeta/e27eec97fb83d08ee848ac438c8e70bb/raw/3de22abafaa2400645ec89fc65baece87662d463/cr_7045_1603917138.diff | git apply
diff --git a/lib/thread.c b/lib/thread.c
index 8da5aa61d..c78c12afa 100644
--- a/lib/thread.c
+++ b/lib/thread.c
@@ -1764,9 +1764,9 @@ void debug_signals(const sigset_t *sigs)
* need to pick a reasonable value.
*/
#if defined SIGRTMIN
-# define LAST_SIGNAL SIGRTMIN
+#define LAST_SIGNAL SIGRTMIN
#else
-# define LAST_SIGNAL 32
+#define LAST_SIGNAL 32
#endif
If you are a new contributor to FRR, please see our contributing guidelines.
After making changes, you do not need to create a new PR. You should perform an amend or interactive rebase followed by a force push.
oof - the most useful value that isn't 32 isn't in openbsd, trying again... |
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-15028/ This is a comment from an automated CI system. Warnings Generated during build:Debian 10 amd64 build: Successful with additional warningsDebian Package lintian failed for Debian 10 amd64 build:
|
First pass at resolving the signal-handling race we've had trouble with. Before entering poll, we block the application-handled signals (usually SIGTERM, SIGINT, SIGHUP, etc) then check whether any were received. We use ppoll() to restore the normal signal mask. There's still some debug/test code in the commits, and I'll clean that up, but I want to see whether this builds and runs...