Race in worker_loop #994

@talex5

worker_loop starts by blocking all signals, because an OCaml signal handler that runs in a worker thread will crash the process:

static void *worker_loop(void *data) {
  lwt_unix_job job = (lwt_unix_job)data;
#if defined(HAVE_PTHREAD)
  /* Block all signals, otherwise ocaml handlers defined with the
     module Sys may be executed in this thread, oops... */
  sigset_t mask;
  sigfillset(&mask);
  pthread_sigmask(SIG_SETMASK, &mask, NULL);
#endif
  ...

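However, the new thread starts with the signal mask it inherited from its creator, so there is a slight chance that a signal will be delivered to it before pthread_sigmask is called. Here's a test case: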

let pid = Unix.getpid ()

let some_worker_job () =
  Lwt_unix.gethostbyname "www.google.com"

let _ =
  Sys.(set_signal sigusr1) @@ Signal_handle (fun _ -> print_endline "Got USR1");
  Lwt_main.run begin
    let job = some_worker_job () in
    print_endline "Mask USR1 in main thread to make sure worker gets it";
    ignore (Thread.(sigmask SIG_BLOCK) [Sys.sigusr1] : _ list);
    print_endline "Sending USR1";
    assert (Unix.system ("kill -USR1 " ^ string_of_int pid) = Unix.WEXITED 0);
    job
  end

If I add a small delay at the start of worker_loop, it reliably segfaults:

static void *worker_loop(void *data) {
  lwt_unix_job job = (lwt_unix_job)data;

  for (int64_t i = 1; i < 10000000; i++) {
    __asm__ __volatile__("");
  }

  ...

Starting new thread
Mask USR1 in main thread to make sure worker gets it
Sending USR1
fish: Job 1, './_build/default/test/unix/sign…' terminated by signal SIGSEGV (Address boundary error)

I spotted this while trying to track down some segfaults in my program. I'm not sure if this is the cause of those, as the race looks hard to hit without the delay.

Using pthread_attr_setsigmask_np to set an initial mask might fix it.
