-
Notifications
You must be signed in to change notification settings - Fork 21
Add Win32::HttpGetFile #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
As far as I can see, the appveyor CI failure is due to the fact that Win32::GetLastError() always returns zero on cygwin, which sounds like a bug to me. That function has no tests of its own. I'm just using it to check for certain expected error conditions. I don't have anything cygwin available currently, so I'm not sure how to pursue fixing this (unrelated) problem. |
Doesn't look like that to me: I have no time right now to look further, but will take a peek in the next few days if you don't have a chance to look at it first. I see CI is using 5.32.1 instead of 5.30.0 on my VM, but I don't really see how they could break this function; it is a minimal wrapper around the Win32 api. What could break it would be if there is any other code being executed between the failure and the reporting; that would reset the error because there was a successful system call. |
This seems indeed to be the case when running under $ PERL_DL_NONLAZY=1 /usr/bin/perl.exe -Mblib t/HttpGetFile.t
1..4
# Running under perl version 5.032001 for cygwin
# Current time local: Sat Dec 18 21:29:34 2021
# Current time GMT: Sun Dec 19 05:29:34 2021
# Using Test.pm version 1.31
Error 12006 in Win32::HttpGetFile: The URL does not use a recognized protocol
ok 1
ok 2
Error 12005 in Win32::HttpGetFile: The URL is invalid
ok 3
ok 4
$ PERL_DL_NONLAZY=1 "/usr/bin/perl.exe" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/HttpGetFile.t
t/HttpGetFile.t .. 1/4 Error 12006 in Win32::HttpGetFile: The URL does not use a recognized protocol
# Test 2 got: "0" (t/HttpGetFile.t at line 16)
# Expected: "12006" (correct error code for unrecognized protocol)
# t/HttpGetFile.t line 16 is: ok(Win32::GetLastError(), '12006', "correct error code for unrecognized protocol");
Error 12005 in Win32::HttpGetFile: The URL is invalid
# Test 4 got: "0" (t/HttpGetFile.t at line 18)
# Expected: "12005" (correct error code for invalid URL)
# t/HttpGetFile.t line 18 is: ok(Win32::GetLastError(), '12005', "correct error code for invalid URL");
t/HttpGetFile.t .. Failed 2/4 subtests
Test Summary Report
-------------------
t/HttpGetFile.t (Wstat: 0 Tests: 4 Failed: 2)
Failed tests: 2, 4
Files=1, Tests=4, 0 wallclock secs ( 0.03 usr 0.08 sys + 0.03 cusr 0.09 csys = 0.23 CPU)
Result: FAIL
Failed 1/1 test programs. 2/4 subtests failed.I suggest to fix it like this: --- t/HttpGetFile.t
+++ t/HttpGetFile.t
@@ -12,10 +12,19 @@ END { 1 while unlink $tmpfile; }
# set PERL_WIN32_INTERNET_OK=1
plan tests => $ENV{PERL_WIN32_INTERNET_OK} ? 6 : 4;
-ok(Win32::HttpGetFile('nonesuch://example.com', 'NUL:'), "", "'nonesuch://' is not a real protocol");
-ok(Win32::GetLastError(), '12006', "correct error code for unrecognized protocol");
-ok(Win32::HttpGetFile('http://!#@!&@$', 'NUL:'), "", "invalid URL");
-ok(Win32::GetLastError(), '12005', "correct error code for invalid URL");
+# On Cygwin the test_harness will invoke additional Win32 APIs that
+# will reset the Win32::GetLastError() value, so capture it immediately.
+my $LastError;
+sub HttpGetFile {
+ my $ok = Win32::HttpGetFile(@_);
+ $LastError = Win32::GetLastError();
+ return $ok;
+}
+
+ok(HttpGetFile('nonesuch://example.com', 'NUL:'), "", "'nonesuch://' is not a real protocol");
+ok($LastError, '12006', "correct error code for unrecognized protocol");
+ok(HttpGetFile('http://!#@!&@$', 'NUL:'), "", "invalid URL");
+ok($LastError, '12005', "correct error code for invalid URL");With that change it passes even under $ PERL_DL_NONLAZY=1 "/usr/bin/perl.exe" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/HttpGetFile.t
t/HttpGetFile.t .. 1/4 Error 12006 in Win32::HttpGetFile: The URL does not use a recognized protocol
Error 12005 in Win32::HttpGetFile: The URL is invalid
t/HttpGetFile.t .. ok
All tests successful.
Files=1, Tests=4, 1 wallclock secs ( 0.03 usr 0.11 sys + 0.01 cusr 0.12 csys = 0.28 CPU)
Result: PASS |
|
#31 (already merged) fixes the |
|
Thanks for the patch. That did the trick for GetLastError(). Now one of the Strawberry Perl testers is failing because it can't find the winhttp library, which I thought had been available since Windows XP. The failed tester reports osvers=6.3, which I believe is Windows 8.1. I don't have any older Windows systems to test with, but I will look at the docs some more and see if there is some extra configuration or #ifdefing that can at least rescue this for current Windows. |
I think this is just a compiler error. Since you are using the Windows header files bundled with StrawberryPerl, which just seem to be missing some constants in their older versions. This has happened before because GCC cannot just copy the files from the Windows platform SDK; the files need to be updated manually. I think just adding the missing constants to --- Win32.xs
+++ Win32.xs
@@ -99,6 +99,16 @@ typedef LONG (WINAPI *PFNRegGetValueA)(HKEY, LPCSTR, LPCSTR, DWORD, LPDWORD, PVO
# define CSIDL_FLAG_CREATE 0x8000
#endif
+#ifndef WINHTTP_AUTO_DETECT_TYPE_DHCP
+# define WINHTTP_AUTO_DETECT_TYPE_DHCP 0x00000001
+#endif
+#ifndef WINHTTP_AUTO_DETECT_TYPE_DNS_A
+# define WINHTTP_AUTO_DETECT_TYPE_DNS_A 0x00000002
+#endif
+#ifndef WINHTTP_AUTOPROXY_AUTO_DETECT
+# define WINHTTP_AUTOPROXY_AUTO_DETECT 0x00000001
+#endif
+
/* Use explicit struct definition because wSuiteMask and
* wProductType are not defined in the VC++ 6.0 headers.
* WORD type has been replaced by unsigned short because |
|
Thanks. It looks like Strawberry Perl 5.20.x with gcc 4.8.3 is ok, but 5.18.x with gcc 4.7.3 is not. I will try your patch and see if that makes appveyor happy. If not, I may just have to make the function unavailable when building with GNUC less that 4.8.x. |
I guess it didn't. I wonder if the winhttp import library is missing completely from that compiler version. An alternative could be to use wininet instead of winhttp (like Win32::Internet), but maybe that isn't worth the effort? I assume if you check for compiler versions, you will just make the function |
|
The missing library was noted at Makefile.PL run time, so I'm not surprised we got linker errors even after trying to fake out the missing macros. I guess I will take this a bit further and try to check for gcc version and not include the new function for the very old and broken gcc versions that apparently have winhttp.h but do not have the library that goes with it, or all of the macros that were in it since Windows XP / Windows Server 2003. Since this is a new feature, anyone wanting to use it will have to do an existence check anyway, so to me it seems preferable to have them do a simple existence check for the function rather then check for existence, then check for croak, and only then use the function. But I could be missing something, so please set me straight if I have. Win32::Internet looks good and a reasonable alternative might have been to just include that in core, but the discussion on p5p to make that happen might have killed it. So just adding one 'simple' function to an existing core module seemed preferable. Little did I know :-). |
No, I think you are right; not defining the function when you cannot provide an implementation seems like a better choice. |
I was not suggesting that (and it is kind of unmaintained). I was suggesting that you could have based your single function on wininet instead of winhttp. But it may not be worth the effort now that you already have a working version using winhttp. |
|
It's true wininet has more features, but it was designed for interactive applications and it shows. By default it handles some errors by showing message boxes (!) and it shares configuration with Internet Explorer. IMO winhttp is a better fit for something used from Perl. |
Discussions around mitigating the recent CVEs against PAUSE/CPAN (CVE-2020-16154, CVE-2020-16155, and CVE-2020-16156) included the desire to make Perl core able to do secure downloads out of the box, and further discussion that mechanisms based on OpenSSL, curl, or wget may not be available on Windows. See, for example: https://www.nntp.perl.org/group/perl.perl5.porters/2021/12/msg262180.html This commit adds a native download function to the Win32 module based on the WinHttp library, a function that could in turn be used by CPAN.pm, etc. N.B. The autoproxy detection code has not been tested since I don't have a set-up where I can do that.
So preserve it immediately after call.
|
Finally! All the CI checks are passing. I applied Tomasz's patch to simplify the read loop and only allocate a buffer once, and I got the gcc version detection working so when building with older gcc it will skip building (and testing) the function since the winhttp library isn't available. I think the main outstanding item is whether there is a better way to provide an error return message than the warning I have now. I don't have a strong opinion about that. And I guess I have a bit more code clean-up to move declarations to a narrower scope. Thanks for all the feedback. I feel like this is nearly there. |
|
I assume the PR is now done, except for the warning/error reporting. Personally I think using # Just die if the file is not found (or anything else goes wrong):
Win32::HttpGetFile("https://example.com", "c:/example");
# Or handle failure:
unless (eval { Win32::HttpGetFile("https://example.com", "c:/example") }) {
# error handling based on Win32::GetLastError(), or `die $@` for unexpected error codes
}Anyways, let me know when you think the PR is ready for final review and merging! |
|
@xenu's sample code reminds me of one request I still have: If the function fails, it should delete the file, if it has already created it. The file should only exist when the function was successful. |
|
Thanks to @xenu for all the feedback. It gets a bit confusing with the comments in various places in the discussion above, so I'm going to summarize the changes I would like to see:
|
OK. Obviously that would no longer protect against man-in-the-middle attacks, which was one of the CVEs against PAUSE, but I guess we can trust people to only enable this if they know what they are doing.
This is a reasonable interface. I'll have to do some research and/or find examples on how to do the XS part.
What happens now is that the file downloads successfully and saves the contents of the target server's 404 page in the file specified. So if we want HTTP status codes outside of the 2xx range to be considered errors, we'll need to call WinHttpQueryHeaders (possibly twice) to get the HTTP status code and text and have a parallel error return section that uses this information instead of using GetLastError() and FormatMessage().
I don't know how the winhttp APIs handle this. It may give you a 301 or 302 and you have to try again at the URL it gives you. I can see testing this manually but I don't see how it can be added to the automated tests because it seems unlikely to find a public server that can always be guaranteed to redirect. I'm about to disappear into holiday and other activities so it will be a while before I can pursue this further. |
Yes, it is useful if you use https just to encrypt the connection, but don't care about validating the endpoint, e.g. when you are using self-signed certs for internal servers, but don't want to add your own CA cert to the systems trust store.
If you are talking about the scalar vs. list context returns, there are plenty samples in
From the user's point of view it doesn't matter if the file hasn't been downloaded because the server is offline, or because the URL has changed: if the local file doesn't contain the expected content, then the call was a failure.
All Github downloads go through redirects, so just download something from here: $ curl -I https://github.com/perl-libwin32/win32/archive/refs/tags/v0.57.zip
HTTP/2 302
server: GitHub.com
date: Wed, 22 Dec 2021 17:27:31 GMT
content-type: text/html; charset=utf-8
vary: X-PJAX, X-PJAX-Container, Accept-Encoding, Accept, X-Requested-With
permissions-policy: interest-cohort=()
location: https://codeload.github.com/perl-libwin32/win32/zip/refs/tags/v0.57
[...]
Yeah, no worries; I don't think this is super-urgent... Enjoy your time off! |
It has a partial winhttp.h header, but not the associated library, so we won't try to build anything based on winhttp there.
Don't croak or warn, but make the status message available along with the return status when called in list context. In scalar context, we'll still just get thumbs up or thumbs down, but can get some idea of what went wrong on failure from Win32::GetLastError(), although we won't know whether that's a system error or WinHttp error. Win32::HttpGetFile optionally ignores certificate errors Make Win32::HttpGetFile fail on HTTP failure There's not much point in successfully downloading an error page from the server and saving it to a file, and in fact it's a bit unfriendly because the user would be unlikely to know there is a problem until inspecting file contents some time later.
Setting the POSIX locale does not affect what FormatMessage does, so in order to test specific error messages, we need to make sure we are in a Windows locale with known message values.
We'll use a couple of badssl.com sites currently supported by Google. The docs say they are for manual testing only, but that's essentially what we are doing since we don't normally run internet- connecting tests without manually enabling them. So we'll only hit those sites when a developer is doing something extra to test these features, not when anyone is just installing the Win32 package. The docs also say these sites are subject to change without notice, so if that happens, we may have to look elsewhere.
|
I've pushed new tests that validate the sha1 of the GitHub archive download, check that bad certificates cause the call to fail without the optional third parameter and succeed with it, and follow the suggestion for figuring out whether the current locale gives us English error messages. |
|
Thanks Craig, this is looking pretty good now. Unfortunately I found one more issue: When the function fails because of an http status error, there is no call to use Win32;
my($ok, $err) = Win32::HttpGetFile("https://httpstat.us/500", "nul:");
die "This was supposed to fail" if $ok;
printf "LastError=%d Message='%s'\n", Win32::GetLastError(), $err;This doesn't match the documentation, which says that the error will be set when the function fails: Z:\jan\git\libwin32\win32>perl -Mblib 500.pl
LastError=0 Message='Internal Server Error'I've been looking through Windows error codes, to see if I can find any generic error that would cover this, but have not been successful. I wonder if we should set it to |
|
On Dec 30, 2021, at 3:53 PM, Jan Dubois ***@***.***> wrote:
@jandubois commented on this pull request.
In t/HttpGetFile.t:
+ "successfully downloaded a zipball via redirect");
+
+ $sha = undef;
Nitpicking, but what is the purpose of this statement? The variable is assigned to again in the next line.
Just robust coding practice. In most languages the garbage collector cannot free the memory attached to an object unless you set it to null or undef or None or whatever language-specific variant. Of course it doesn't matter here the way it would if the object contained a gigantic XML document in memory inside a loop over thousands of documents. But, it's a habit I've gotten into. Feel free to remove it if it bothers you.
|
Objects are reference counted, so what is required is to drop the reference, which the next assignment does anyways. Particular to Perl is that the PV slot in an SV will get re-allocated when the string grows, but it never shrinks. And since local variables inside functions are stored in a pad, that allocation persists even when you return from the function. And functions keep a stack of pads (for each recursion level), so this could grow quite a bit if you use large strings, but only once in a while. The But that is not the case here, so it looked to me like something left-over from development, that has been forgotten to be deleted.
It bothers me a little bit (since it is cargo-culted code with no obvious function), but not enough that I would add an extra commit to change it... 😸 |
Hi Craig, Happy New Year! Just checking in if you were planning to still make a change for this (to make sure LastError isn't 0 when the function failed), or if you are just waiting for me to merge the PR? It would be as simple as adding something like this at the end of the function: Cheers, |
Happy new year to you as well. I looked long and hard for the correct Windows system error codes that correspond to HTTP status, but they don't seem to exist. Obviously you can't just use the HTTP status. For example, decimal 400 is some kind of threading error. I saw some reference to user-defined error messages, but that's not the problem because we already have a message. I don't know how to pick a range that is guaranteed not to collide with anything, but if you are comfortable with just adding a million to the HTTP status, I'll go ahead and do that. |
Hey, that was just a hundred thousand, not a million. 😄 I didn't find anything either the last time I looked, but now I spotted this:
So I would say we should return The docs should explain this, in case anyone ever wants to get the HTTP status: my $err = $Win32::GetLastError();
if ($err & (1<<29)) {
printf "HTTP STATUS was %d\n", $err & ~(1<<29);
}I still kind of liked adding $ perl -E 'printf "%032b %10d\n", $_, $_ for 1<<29, 1e9+404'
00100000000000000000000000000000 536870912
00111011100110101100101110010100 1000000404Then you could write: my $err = $Win32::GetLastError();
if ($err > 1e9) {
printf "HTTP STATUS was %d\n", ($err - 1e9);
} |
|
I think I prefer adding use Win32;
my($ok, $err) = Win32::HttpGetFile("https://httpstat.us/500", "nul:");
die "This was supposed to fail" if $ok;
printf "LastError=%d Message='%s'\n", Win32::GetLastError(), $err;will print Z:\jan\git\libwin32\win32>perl -Mblib 500.pl
LastError=1000000500 Message='Internal Server Error'instead of the inscrutable error |
There are no corresponding system error codes, but setting bit 29 denotes a user-defined error code. Conveniently, adding 1e9 to the HTTP status sets bit 29, and calling SetLastError() with the result gives us a value that is easy to coerce into the actual HTTP status.
This is now done. The CI tests hung up because it was unable to install Strawberry Perl 5.26.1.1. If there is a way to kick them off again in case it was a transient error, that might be the best next step. Otherwise I don't know what to suggest. It does pass all tests for me with Visual Studio 2019 and it passed the first integration test with Strawberry Perl 5.28. |
I did rerun CI and it failed again with the same error: @xenu Any idea what is happening? The log does not include much information; it looks like the download succeeds, but the install fails. The installation for 5.28.0.1 succeeds, so maybe somebody removed older versions of Perl from the repo? |
It works locally, I have no idea why it doesn't work here. |
I assume by "works locally" you have successfully installed Strawberry Perl 5.26.1.1 via chocolatey exactly as the GitHub CI would? The only things I've found on this are that it usually indicates some system or resource issue, e.g., disk full, corrupted chocolatey cache, etc. I don't know enough about how GitHub CI works to suggest how the environment could be reinitialized to clear such issues. |
jandubois
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this looks great! @xenu, if you have time, could you do one final review? Otherwise I'm ready to merge it now.
It isn't using Github CI; it is using AppVeyor, an external service. Using my usual remedy for cloud service errors (waiting another day and trying again) has worked out, so all tests pass. I guess we should update the test matrix to include some newer Perl versions, but that is a separate issue. |
jandubois
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I found one more error condition that should be handled by deleting the partial output file.
But only if it was successfully opened.
jandubois
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your latest change reminded me that hOut should be initialized to INVALID_HANDLE_VALUE and not NULL.
jandubois
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your patience with the endless review requests! It looks great now!
Will try to do a release tonight...
file: $CPAN/authors/id/J/JD/JDB/Win32-0.58.tar.gz |
Discussions around mitigating the recent CVEs against PAUSE/CPAN
(CVE-2020-16154, CVE-2020-16155, and CVE-2020-16156) included the desire
to make Perl core able to do secure downloads out of the box, and
further discussion that mechanisms based on OpenSSL, curl, or wget may
not be available on Windows. See, for example:
https://www.nntp.perl.org/group/perl.perl5.porters/2021/12/msg262180.html
This commit adds a native download function to the Win32 module based on
the WinHttp library, a function that could in turn be used by CPAN.pm,
etc.
N.B. The autoproxy detection code has not been tested since I don't
have a set-up where I can do that.