blas_exec exits when pthread_create fails #668

Closed
@jklowden

blas_exec calls blas_thread_init, which unceremoniously (and a little mysteriously) exits if pthread_create fails. The patch below clarifies the message on standard error and raises SIGINT to give the caller a chance to handle the condition (say, by requesting fewer threads). I also took the liberty of redefining blas_thread_server to return void * (as required by pthread_create), thereby eliminating some unnecessary and potentially dangerous casts.
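To illustrate the cast-free form, here is a minimal sketch that is not taken from the OpenBLAS sources. The names `worker` and `run_worker` are hypothetical. It shows a start routine with the `void *(*)(void *)` type that pthread_create expects, and an error report built from the return value, which pthread_create uses instead of errno.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical worker with the exact type pthread_create expects,
   so no cast of the function pointer is needed. */
static void *worker(void *arg) {
    intptr_t cpu = (intptr_t)arg;   /* thread index passed through the pointer */
    return (void *)(cpu * 2);       /* small result passed back the same way */
}

/* Starts one worker and joins it.
   Returns 0 on success, or the pthread error number on failure. */
static int run_worker(intptr_t cpu, intptr_t *result) {
    pthread_t tid;
    void *ret = NULL;

    int rc = pthread_create(&tid, NULL, worker, (void *)cpu);
    if (rc != 0) {
        /* pthread_create returns the error number; it does not set errno. */
        fprintf(stderr, "pthread_create: %s\n", strerror(rc));
        return rc;
    }

    rc = pthread_join(tid, &ret);
    if (rc != 0) {
        fprintf(stderr, "pthread_join: %s\n", strerror(rc));
        return rc;
    }

    *result = (intptr_t)ret;
    return 0;
}
```

Because `worker` already has the right type, the compiler can check the call; with the `(void *)` cast on the function pointer, a mismatched signature would go unnoticed.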

Ideally, a library function encountering an OS error would return an error code instead of calling exit or raise. In this case, though, the calls to blas_thread_init all ignore the return code. It wasn't clear to me that blas_exec has documented semantics, either (I didn't find documentation for it), and it currently returns no error, so it seems likely that no caller checks for one. Raising a signal gives the caller who cares something to catch instead of letting the program crash.
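As a rough sketch of what "something to catch" could look like on the caller's side, assuming the patched behaviour: `fake_blas_thread_init` below is a hypothetical stand-in for the library path that fails, and `call_with_sigint_guard` is likewise an invented name, not an OpenBLAS API.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t thread_init_failed = 0;

static void on_sigint(int signo) {
    (void)signo;
    thread_init_failed = 1;
}

/* Hypothetical stand-in for a library call that raises SIGINT
   when it cannot create its threads. */
static void fake_blas_thread_init(int simulate_failure) {
    if (simulate_failure) {
        raise(SIGINT);
    }
}

/* Runs the call with a temporary SIGINT handler installed.
   Returns 1 if SIGINT arrived during the call, 0 if not,
   and -1 if the handler could not be installed. */
static int call_with_sigint_guard(int simulate_failure) {
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, &old) != 0) {
        return -1;
    }

    thread_init_failed = 0;
    fake_blas_thread_init(simulate_failure);

    /* Restore whatever disposition was in place before. */
    sigaction(SIGINT, &old, NULL);
    return thread_init_failed ? 1 : 0;
}
```

A caller that sees a return of 1 could retry with fewer threads. One limitation of this approach is that the handler cannot distinguish the library's raise from a user pressing Ctrl-C during the call.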

diff --git a/driver/others/blas_server.c b/driver/others/blas_server.c
index 1fd848c..87f75cd 100644
--- a/driver/others/blas_server.c
+++ b/driver/others/blas_server.c
@@ -70,9 +70,11 @@ USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 /*********************************************************************/

 #include "common.h"
-#ifdef OS_LINUX
+#if defined(OS_LINUX) || defined(OS_NETBSD)
 #include <dlfcn.h>
+#include <signal.h>
 #include <sys/resource.h>
+#include <sys/time.h>
 #endif

 #ifndef likely
@@ -265,7 +267,7 @@ int get_node(void);

 static int increased_threads = 0;

-static int blas_thread_server(void *arg){
+static void* blas_thread_server(void *arg){

   /* Thread identifier */
   BLASLONG  cpu = (BLASLONG)arg;
@@ -458,7 +460,7 @@ static int blas_thread_server(void *arg){

   //pthread_exit(NULL);

-  return 0;
+  return NULL;
 }

 #ifdef MONITOR
@@ -565,14 +567,23 @@ int blas_thread_init(void){

 #ifdef NEED_STACKATTR
       ret=pthread_create(&blas_threads[i], &attr,
-                    (void *)&blas_thread_server, (void *)i);
+                    &blas_thread_server, (void *)i);
 #else
       ret=pthread_create(&blas_threads[i], NULL,
-                    (void *)&blas_thread_server, (void *)i);
+                    &blas_thread_server, (void *)i);
 #endif
       if(ret!=0){
-       fprintf(STDERR,"OpenBLAS: pthread_creat error in blas_thread_init function. Error code:%d\n",ret);
-       exit(1);
+       struct rlimit rlim;
+       const char *msg = strerror(ret);
+       fprintf(STDERR, "OpenBLAS blas_thread_init: pthread_create: %s\n", msg);
+       if(0 == getrlimit(RLIMIT_NPROC, &rlim)) {
+         fprintf(STDERR, "OpenBLAS blas_thread_init: RLIMIT_NPROC "
+                 "%lu current, %lu max\n", (unsigned long)rlim.rlim_cur, (unsigned long)rlim.rlim_max);
+       }
+       if(0 != raise(SIGINT)) {
+         fprintf(STDERR, "OpenBLAS blas_thread_init: calling exit(3)\n");
+         exit(EXIT_FAILURE);
+       }
       }
     }

@@ -832,10 +843,10 @@ void goto_set_num_threads(int num_threads) {

 #ifdef NEED_STACKATTR
       pthread_create(&blas_threads[i], &attr,
-                    (void *)&blas_thread_server, (void *)i);
+                    &blas_thread_server, (void *)i);
 #else
       pthread_create(&blas_threads[i], NULL,
-                    (void *)&blas_thread_server, (void *)i);
+                    &blas_thread_server, (void *)i);
 #endif
     }
