Skip to content

Slow service start with Ruby 3 #84

Open
@daipom

Description

@daipom

Description

Service startup is noticeably slower with Ruby 3.
I'm talking about the time to reach the Running state. In other words, the time to finish StartServiceCtrlDispatcher.
With Ruby 2, it takes about 5 seconds, but with Ruby 3, it can take more than 20 seconds.
The application implementation is irrelevant.

Windows service needs to finish StartServiceCtrlDispatcher in 30 seconds by default.
With Ruby 3 or later, this problem can cause timeout errors.

EventID 7000

The testservice service failed to start due to the following error: 
The service did not respond to the start or control request in a timely fashion.

EventID 7009

A timeout was reached (30000 milliseconds) while waiting for the testservice service to connect.

This problem occurs when both of the following conditions are met.

  • Ruby 3 (Ruby 3.0.0 ~ 3.3.0)
  • Mulitple CPU
    • This problem does not occur with single CPU (such as EC2 t2.small type)

Gem Version

2.3.2

Windows Version

I confirm this problem in the following:

  • Windows 10 Home
  • Windows Server 2019 Base EC2 Instance (t2.medium, t2.large, c5.large)

Replication Case

Here is a graph of the number of seconds to start a sample service for each Ruby version:

2024-02-15_20h41_38

(I will explain 322 with patch at the end)

Measured using a minimum implementation service:

require "win32/daemon"

class TestService < Win32::Daemon
  def service_main
    while running?
      sleep 10
    end
  end
end

TestService.new.mainloop

Environment:

  • Windows 10 Home 64bit
  • 4 core CPU, 8 logical processors
    • Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99GHz
  • 16GB RAM

Steps:

  • Register the sample service above (C:\test\run.rb)
require "win32/service"

# Specify ruby bin path
ruby_path = ""

Win32::Service.create(
  service_name: "testservice",
  binary_path_name: "\"#{ruby_path}\" C:\\test\\run.rb",
  display_name: "testservice",
)
  • Measure the time with the following PowerShell script
while($true)
{
    $datetime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    echo "$datetime, Start launching the service"
    Start-Service testservice
    $datetime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    echo "$datetime, Finish launching the service"
    Start-Sleep -seconds 10
    Stop-Service testservice
    Start-Sleep -seconds 10
}
  • After measuring, delete the service.
require "win32/service"

Win32::Service.delete("testservice")

Additional Information

  • I confirm the same tendency in the AWS EC2 instance Windows Server 2019 Base t2.medium, t2.large, c5.large
  • Interestingly, this problem does not occur in t2.small
  • I think it occurs in environments with multiple CPUs
  • Maybe some implementation about threading has changed with Ruby 3 series, and this seems to be influenced by that

Possible causes and solutions

Maybe the problem is on Ruby's side, but I think it could be avoided by implementation on this side.

In the current implementation, it switches between multiple Ruby threads and non-Ruby threads.
This complex process may be unsuitable for Ruby 3.

https://github.com/chef/win32-service/blob/win32-service-2.3.2/lib/win32/daemon.rb#L257

CreateThread makes a non-Ruby thread, and it calls Ruby Proc, and it makes a Ruby thread, and ...
Also, StartServiceCtrlDispatcher creates a non-Ruby thread, and it makes a new Ruby thread again and executes Service_Main Ruby Proc...

Although this is still PoC, I found that using Ruby thread from the beginning, as shown below, was very fast.
(It is 322 with patch data in the above graph).

lib\win32\daemon.rb
@@ -254,26 +254,21 @@
         raise SystemCallError.new("CreateEvent", FFI.errno)
       end
 
-      hThread = CreateThread(nil, 0, ThreadProc, Service_Main, 0, nil)
-
-      if hThread == 0
-        raise SystemCallError.new("CreateThread", FFI.errno)
-      end
-
-      events = FFI::MemoryPointer.new(:pointer, 2)
-      events.put_pointer(0, FFI::Pointer.new(hThread))
-      events.put_pointer(FFI::Pointer.size, FFI::Pointer.new(@@hStartEvent))
-
-      while (index = WaitForMultipleObjects(2, events, 0, 1000)) == WAIT_TIMEOUT
-      end
-
-      if index == WAIT_FAILED
-        raise SystemCallError.new("WaitForMultipleObjects", FFI.errno)
-      end
-
-      # The thread exited, so the show is off.
-      if index == WAIT_OBJECT_0
-        raise "Service_Main thread exited abnormally"
+      hThread = Thread.new(Service_Main) do |service_proc|
+        ste = FFI::MemoryPointer.new(SERVICE_TABLE_ENTRY, 2)
+
+        s = SERVICE_TABLE_ENTRY.new(ste[0])
+        s[:lpServiceName] = FFI::MemoryPointer.from_string("")
+        s[:lpServiceProc] = service_proc
+
+        s = SERVICE_TABLE_ENTRY.new(ste[1])
+        s[:lpServiceName] = nil
+        s[:lpServiceProc] = nil
+
+        StartServiceCtrlDispatcher(ste)
+      end
+
+      while WaitForSingleObject(@@hStartEvent, 1000) == WAIT_TIMEOUT
       end
 
       thr = Thread.new do
@@ -322,6 +317,7 @@
       end
 
       thr.join
+      hThread.join
     end
 
     # Returns the state of the service (as an constant integer) which can be any

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions