Return WIN32_NO_SOCKETS for miniperl.exe #22679

Open: wants to merge 2 commits into base: blead

@bulk88
Contributor

bulk88 commented Oct 18, 2024

8a548d1

Removed the optimization for miniperl.exe. Build speed is important for new code. This brings back the macro for miniperl.exe only. Measurable time savings for me; see the commit message.


Maybe; it's a small build perf optimization.

  • This set of changes requires a perldelta entry, and it is included.
  • This set of changes requires a perldelta entry, and I need help writing it.
  • This set of changes does not require a perldelta entry.

ws2_32.dll loads msvcrt.dll and nsi.dll into the process unconditionally,
and we don't need 2 CRTs in 1 process. Process startup time is important
for core building. As for release perl, a future patch will come, but at
least speeding up the build is good.

This patch series made a benchmarkable drop in startup time. See numbers
below.

The macro was tested with full perl541.all, and ws2_32.dll was totally
removed/not linked in. My older code from some years ago still loaded
ws2_32.dll on full perl, even though sockets were not usable from PP. Now
that is fixed, but I doubt a no-sockets production perl is popular. The
optimization/feature now works on full perl too, but there are very few
use cases for it on full perl.

Before

C:\sources\perl5>timeit miniperl -e "for(0..200){`miniperl -e\"1\"`}"
Exit code      : 0
Elapsed time   : 3.33
Kernel time    : 0.37 (11.3%)
User time      : 0.11 (3.3%)
page fault #   : 9905
Working set    : 6520 KB
Paged pool     : 105 KB
Non-paged pool : 8 KB
Page file size : 6680 KB

After

C:\sources\perl5>timeit miniperl -e "for(0..200){`miniperl -e\"1\"`}"
Exit code      : 0
Elapsed time   : 3.11
Kernel time    : 0.23 (7.5%)
User time      : 0.05 (1.5%)
page fault #   : 9852
Working set    : 6308 KB
Paged pool     : 104 KB
Non-paged pool : 7 KB
Page file size : 6616 KB
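
The two elapsed-time figures above can be turned into a rough per-process number. The following is a minimal sketch, not part of the patch; the function names are hypothetical. It assumes the whole elapsed-time difference is attributable to the child miniperl processes, and it relies on the fact that the Perl range 0..200 is inclusive, so the loop spawns 201 children.

```c
/* Hypothetical helpers for reading the timeit numbers above. */

/* Number of iterations of a Perl-style inclusive range first..last. */
int inclusive_range_count(int first, int last)
{
    return last - first + 1;
}

/* Average saving per child process in milliseconds, given the total
 * elapsed seconds before and after and the number of children spawned.
 * Assumption: the entire difference comes from the children. */
double per_process_saving_ms(double before_s, double after_s, int children)
{
    return (before_s - after_s) * 1000.0 / (double)children;
}
```

With the figures above, per_process_saving_ms(3.33, 3.11, inclusive_range_count(0, 200)) comes to roughly 1.1 ms per miniperl process.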
@tonycoz
Contributor

tonycoz commented Nov 7, 2024

Assuming 200 miniperl invocations in a build, it saves 0.22 seconds while making the code more complex.

The delayed sockets initialization is/was thread unsafe from what I can see.

I don't think it's worth the added complexity.

@bulk88
Contributor Author

bulk88 commented Apr 5, 2025

> Assuming 200 miniperl invocations in a build, it saves 0.22 seconds while making the code more complex.

miniperl doesn't need access to Winsock at any point: it never goes on the WWW and it has no use outside of blead/CC libperl time. At the time of the first revision of this PR/patch/branch, this was patch 1 of 2, or 1 of 3, to bring back all of the delay-Winsock features. But I now plan a different way of bringing back the "no Winsock in miniperl, delay Winsock in full perl" feature than the way in revision 1.

> The delayed sockets initialization is/was thread unsafe from what I can see.

The C function WSAStartup inside ws2_32.dll was made by MS to be 100% thread race/reentry proof. There is an InterlockedCompareExchange() on a 0 or 1 global var, followed by an InitializeCriticalSection(), then an EnterCriticalSection(). Calling it twice, or 2 threads colliding on 2 cores, was recognized by MS from day 1.

The case of 2 DLLs from unrelated 3rd party authors, loaded into one random Win32 process, both executing WSAStartup multiple times because they are unaware of each other, probably happens constantly in normal production code and normal Win32 GUI or TUI apps. MS knows this and did protect against it inside ws2_32.dll.

If 2 perl ithreads both execute WSAStartup, nothing bad happens. Nobody ever reported a bug during the many years when the Winsock delay feature worked.
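
To illustrate the pattern described above (a compare-exchange on a global flag, then lock creation, then entering the lock), here is a portable sketch. It is not the ws2_32.dll implementation, which is not public. All names are hypothetical: C11 atomic_compare_exchange_strong stands in for InterlockedCompareExchange(), and a pthread mutex stands in for a CRITICAL_SECTION. The three-state flag is an assumption of this sketch so that a losing thread waits until the lock exists.

```c
#include <pthread.h>
#include <stdatomic.h>

/* 0 = lock not created, 1 = being created, 2 = ready (sketch assumption) */
static atomic_int lock_state;
static pthread_mutex_t startup_lock;
static int startup_refs;    /* number of startup calls, guarded by startup_lock */
static int real_init_runs;  /* how often the expensive one-time init ran */

/* Hypothetical stand-in for a reentrant, race-proof startup call. */
int demo_startup(void)
{
    int expected = 0;

    if (atomic_compare_exchange_strong(&lock_state, &expected, 1)) {
        /* Exactly one caller wins the exchange and creates the lock. */
        pthread_mutex_init(&startup_lock, NULL);
        atomic_store(&lock_state, 2);
    } else {
        /* Everyone else spins until the winner has finished creating it. */
        while (atomic_load(&lock_state) != 2)
            ;
    }

    pthread_mutex_lock(&startup_lock);
    if (startup_refs++ == 0)
        real_init_runs++;   /* only the first caller does the real work */
    pthread_mutex_unlock(&startup_lock);
    return 0;
}
```

Calling demo_startup() twice, or from two threads at once, runs the expensive branch only once while still counting both callers.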

> I don't think it's worth the added complexity.

I very strongly disagree. ws2_32.dll ALWAYS loads msvcrt.dll into perl.exe's virtual address space, but the perl/perl XS ecosystem uses ucrtbase.dll nowadays. ws2_32.dll's runtime overhead is also totally unnecessary for most perl processes: its minimum malloc memory, plus a couple hundred (maybe 1000 at the upper end) Ring 0 kernel calls, to make DeviceIoControl() calls to its parent kernel driver afd.sys and to enumerate a ton of data out of the Windows registry.

Also, rpcrt4.dll and nsi.dll are statically linked, along with something "ip helper .dll" (I forgot its real name, but ip helper is required to enumerate NICs from user space and find a wired or Wi-Fi NDIS NIC to actually open as an object for sockets to work). All 3 absolutely don't need to sit inside perl.exe's address space 24/7, be loaded and relocated, run their DllMains, and suck in a bunch of external state they need to operate from their DllMains.

These extra DLLs also increase the VS IDE debugger's process start/attach/SEGV attach time by many seconds in the UI for me, because what was 3-4 DLLs 5 years ago is now 18 (EIGHTEEN) DLLs inside perl -e"sleep 200;". All these extra DLLs also make various C developer debug and diagnostic tools more difficult to use, because there is more noise in the final output of every .pdb-level/asm-level automatic report, hook trace log, or snapshot tool that someone wants to use to accomplish a task.

Something I used to be able to do, which I can't do anymore in blead perl, is set a BP on NtAllocateVirtualMemory. The peak momentary malloc/HeapAlloc usage of perl.exe on startup is so high now that NtAllocateVirtualMemory rarely if ever executes again for the rest of the lifetime of the perl process from inside the main runloop (Perl_runops_standard()). HeapAlloc() already has enough free user-mode R/W 4 KB pages to last for the rest of the process, so it doesn't need to go back to the kernel for another 4 KB page or another 64 KB unit.

Winsock loading msvcrt.dll into the address space and creating more HeapCreate() objects also makes the C dev user experience more noisy and difficult.

Delay loading of Winsock was the best speed improvement for core's own make test, and for perl Makefile.PL and the EUMM CPAN toolchain's gmake test, that I ever implemented or reimplemented for WinPerl. perl.exe's process startup time is super important for all devs who work with perl, because short-lived perl processes are used everywhere in the perl ecosystem, and these short-lived perl processes always have a human developer watching their UI/STDOUT in real time.

The startup time of nginx/LiteSpeed doesn't affect human developer time. Running .t files and running EUMM does affect human developer time. Going from 5 seconds to 4 seconds, 20-100 times a day, adds up to minutes; then multiply those couple of minutes a day by all perl devs/users on earth.

I plan to reimplement the delay Winsock loading feature in a totally different way than the previous way above, without 20-45 StartSocket() tokens all over the win32-only code base. I do recognize WSAStartup() needs to execute before the first socket FD/object is created. It's irrelevant what Winsock actually does on startup and when exactly it runs its 1x startup logic, but that 1x startup logic has huge overhead if a process will never create a socket or touch the Winsock DLL again for the rest of the process lifespan.

Technically MS has a choice between DllMain and WSAStartup for the 1x startup logic, but it's irrelevant what is done where, since nobody can recompile or publish an MS-made system DLL. But the theoretical things Winsock must do 1x on startup are, at minimum: enumerate the PCIe NICs and ethernet frame protocol handlers from the registry, register the current PID with afd.sys, set up its own private TlsGetValue/TlsSetValue slot, go digging through the address space to see if it can find a user32.dll, and create its invisible GUI window object/mess around with the Win32 GUI message event loop system.

I don't think Winsock does this in real life, but perhaps it also needs to talk to csrss.exe as part of its 1x startup code, using that rpcrt4.dll it loads. Again, the details don't matter since it's an MS-compiled DLL.

Not loading the Winsock DLL unless the process is going to communicate on the WWW fixes all problems instantly.
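
As an illustration of the requirement stated above (WSAStartup() must run before the first socket is created, but not earlier), here is a minimal sketch of a single choke point that starts Winsock on first use. This is not the planned reimplementation; the function and variable names are hypothetical. On non-Windows builds the WSAStartup() call is compiled out so the control flow can still be exercised. The sketch is deliberately not thread-safe, and it only defers WSAStartup(): keeping ws2_32.dll out of the process until first use would additionally need delay-load linking or LoadLibrary(), which is not attempted here.

```c
#ifdef _WIN32
#  include <winsock2.h>
#endif

static int sockets_started;  /* 0 until the first socket is requested */
static int startup_calls;    /* how many times the expensive path ran */

/* Run the one-time socket startup only if it has not run yet. */
static int start_sockets_once(void)
{
    if (sockets_started)
        return 0;
#ifdef _WIN32
    {
        WSADATA wsadata;
        int rc = WSAStartup(MAKEWORD(2, 2), &wsadata);
        if (rc != 0)
            return rc;  /* startup failed; leave the flag unset */
    }
#endif
    startup_calls++;
    sockets_started = 1;
    return 0;
}

/* Hypothetical choke point that every socket-creating path goes through. */
int demo_open_socket(void)
{
    if (start_sockets_once() != 0)
        return -1;
    /* real code would call socket() here */
    return 0;
}
```

A process that never calls demo_open_socket() never pays the startup cost; a process that calls it many times pays it once.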

@bulk88
Contributor Author

bulk88 commented Jun 6, 2025

Why it's a bad decision for WinPerl to unconditionally have the Winsock library always loaded and running in a WinPerl process: Winsock statically links to the "secret" msvcrt.dll CRT. One side effect is that msvcrt.dll registers a bunch of its own C89/C99/C++ global object destructor methods with ntdll.dll. Regardless of how the user-mode process tries to exit, either by ExitProcess() or UCRT exit(), those msvcrt.dll global object destructor methods WILL be fired by the WinNT kernel in Ring 3 - 0.01 or Ring 3 - 1.99, depending on your personal opinion of what MS's ntdll.dll is and does in the SW stack.

I have no easy non-ASM way to benchmark the wall clock time cost of these destructors. It's probably 500 us to 1.5 ms max, maybe 5-10 ms on ancient HW. But there has to be a cost to that 1 millisecond, such as cpan.pl or make test sleeping on a pipe 2-3 milliseconds longer during waitpid() per child Perl process.

>	ntdll.dll!RtlEnterCriticalSection()	Unknown
 	msvcrt.dll!_freefls()	Unknown
 	ntdll.dll!RtlProcessFlsData()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	ucrtbase.dll!exit_or_terminate_process()	Unknown
 	ucrtbase.dll!common_exit()	Unknown
 	miniperl.exe!sig_terminate(int sig) Line 2792	C
 	miniperl.exe!win32_ctrlhandler(unsigned long dwCtrlType) Line 5207	C
 	kernel32.dll!CtrlRoutine()	Unknown
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

update: how nice, UCRT and msvcrt.dll are both aware that WinPerl constantly does _wsetlocale() calls to UCRT.

@bulk88
Contributor Author

bulk88 commented Jun 6, 2025

> update: how nice, UCRT and msvcrt.dll are both aware that WinPerl constantly does _wsetlocale() calls to UCRT.

More destructors get fired from DLLs that are not very useful to a TUI WinPerl process. imm32.dll is Win32's "Input Method Editor" library: on-screen keyboards, ADA, etc. The reason it's loaded is the fault of user32.dll, Winsock, or both.

>	imm32.dll!ImmDllInitialize()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	ucrtbase.dll!exit_or_terminate_process()	Unknown
 	ucrtbase.dll!common_exit()	Unknown
 	miniperl.exe!sig_terminate(int sig) Line 2792	C
 	miniperl.exe!win32_ctrlhandler(unsigned long dwCtrlType) Line 5207	C
 	kernel32.dll!CtrlRoutine()	Unknown
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

Another stack. I don't remember what this DLL does off the top of my head; I think it's an out-of-process RPC/message passing API between the consumer process and the on-screen keyboard producer process.

>	ntdll.dll!RtlFreeHeap()	Unknown
 	KernelBase.dll!LocalFree()	Unknown
 	msctf.dll!ProcessDetach(struct HINSTANCE__ *)	Unknown
 	msctf.dll!DllMain()	Unknown
 	msctf.dll!_CRT_INIT()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	ucrtbase.dll!exit_or_terminate_process()	Unknown
 	ucrtbase.dll!common_exit()	Unknown
 	miniperl.exe!sig_terminate(int sig) Line 2792	C
 	miniperl.exe!win32_ctrlhandler(unsigned long dwCtrlType) Line 5207	C
 	kernel32.dll!CtrlRoutine()	Unknown
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

@bulk88
Contributor Author

bulk88 commented Jun 6, 2025

>	ntdll.dll!NtClose()	Unknown
 	ntdll.dll!EtwpUnregisterProvider()	Unknown
 	ntdll.dll!EtwNotificationUnregister()	Unknown
 	ntdll.dll!EtwUnregisterTraceGuids()	Unknown
 	msctf.dll!McGenEventUnregister()	Unknown
 	msctf.dll!ProcessDetach(struct HINSTANCE__ *)	Unknown
 	msctf.dll!DllMain()	Unknown
 	msctf.dll!_CRT_INIT()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	ucrtbase.dll!exit_or_terminate_process()	Unknown
 	ucrtbase.dll!common_exit()	Unknown

Jeez. Why bother manually nuking Ring 0 opaque handles on the process exit event? Perl has PERL_DESTRUCT_LEVEL, but that concept isn't an MS API design pattern.

>	ucrtbase.dll!__crt_seh_guarded_call<void>::operator()<class <lambda_886d6c58226a84441f68b9f2b8217b83>,class <lambda_ab61a845afdef5b7c387490eaf3616ee> &,class <lambda_f7f22ab5edc0698d5f6905b0d3f44752> >(class <lambda_886d6c58226a84441f68b9f2b8217b83> &&,class <lambda_ab61a845afdef5b7c387490eaf3616ee> &,class <lambda_f7f22ab5edc0698d5f6905b0d3f44752> &&)	Unknown
 	ucrtbase.dll!common_flush_all()	Unknown
 	ucrtbase.dll!DllMainProcessDetach()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	ucrtbase.dll!exit_or_terminate_process()	Unknown
 	ucrtbase.dll!common_exit()	Unknown
 	miniperl.exe!sig_terminate(int sig) Line 2792	C
 	miniperl.exe!win32_ctrlhandler(unsigned long dwCtrlType) Line 5207	C
 	kernel32.dll!CtrlRoutine()	Unknown
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

>	comctl32.dll!_lock()	Unknown
 	comctl32.dll!_fflush_nolock()	Unknown
 	comctl32.dll!__endstdio()	Unknown
 	comctl32.dll!__crtExitProcess()	Unknown
 	comctl32.dll!_cinit()	Unknown
 	comctl32.dll!__CRT_INIT()	Unknown
 	comctl32.dll!_CRT_INIT()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	ucrtbase.dll!exit_or_terminate_process()	Unknown
 	ucrtbase.dll!common_exit()	Unknown
 	miniperl.exe!sig_terminate(int sig) Line 2792	C
 	miniperl.exe!win32_ctrlhandler(unsigned long dwCtrlType) Line 5207	C
 	kernel32.dll!CtrlRoutine()	Unknown
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

Wow, Comctl32.dll has a private, statically linked copy of the MS CRT/libc inside of it. I didn't know that until now. What is Comctl32.dll? Just search the RT/GH/ML archives for my dislike of it. miniperl.exe loading it is a legit bug to fix on my todo list, since that binary is not capable of drawing a Win32 GUI widget whatsoever.
