Skip to content

[Tests] Diagnose Windows cert creation failures #3237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 8, 2025

Conversation

paulmedynski
Copy link
Contributor

@paulmedynski paulmedynski commented Mar 24, 2025

Diagnostic changes for #3223. We're now emitting the PowerShell script content used to generate a self-signed cert and add it to the Windows system cert store. This may help figure out why the cert operations fail intermittently.

Created this draft PR to run the CI pipelines where the failures sometimes occur and acquire the PowerShell content for further investigation.

@edwardneal
Copy link
Contributor

I think I introduced this in #3034, which used PowerShell to generate certificates more frequently. Apologies if so!

The only difference in the PS invocation which I can see between my version and the preceding one is that we no longer load the user profile. Does setting ProcessStartInfo.LoadUserProfile to true fix this?

@paulmedynski
Copy link
Contributor Author

I think I introduced this in #3034, which used PowerShell to generate certificates more frequently. Apologies if so!

The only difference in the PS invocation which I can see between my version and the preceding one is that we no longer load the user profile. Does setting ProcessStartInfo.LoadUserProfile to true fix this?

LoadUserProfile is already being set to true, so likely not the issue. I've run some manual tests in a Windows Server container, and the PS script works fine. Since the failures are intermittent, I suspect a flaky test environment.

We're having some ADO CI pipeline issues that are preventing me from running CI reliably right now, so I will circle back once the overall CI stuff settles down.

Copy link

codecov bot commented Mar 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.58%. Comparing base (1247ca4) to head (4dabe63).
Report is 15 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3237       +/-   ##
===========================================
+ Coverage   72.69%   92.58%   +19.88%     
===========================================
  Files         303        6      -297     
  Lines       59718      310    -59408     
===========================================
- Hits        43414      287    -43127     
+ Misses      16304       23    -16281     
Flag Coverage Δ
addons 92.58% <ø> (ø)
netcore ?
netfx ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@paulmedynski paulmedynski force-pushed the dev/paul/issue-3223-pwsh-test-failure branch from 11d4009 to b1bb29f Compare March 25, 2025 10:57
- Added powershell script content to test error output for disgnostics.
- Replaced #if NET9_0 with #if NET9_0_OR_GREATER to future-proof our directives.
- Removed unnecessary NETFRAMEWORK symbol.
- Updated download URL for .NET x86 installer for tests pipelines.
@paulmedynski paulmedynski force-pushed the dev/paul/issue-3223-pwsh-test-failure branch from b65989c to 5a29f5b Compare March 31, 2025 16:34
- Added some retries to the PowerShell script execution to see if that helps avoid overall test failures.
@paulmedynski
Copy link
Contributor Author

The latest commit adds a retry within the CertificateFixtureBase.CreateCertificate that seems to work around the ACCESS DENIED issue. We now see this in the test logs:

2025-03-31T17:22:11.0473456Z   PowerShell command failed with exit code 1 on attempt 1 of 3; retrying in 5 seconds...
...
2025-03-31T17:22:19.7322237Z     Passed Microsoft.Data.SqlClient.Tests.AlwaysEncryptedTests.ExceptionsAlgorithmErrors.TestInvalidCipherText [44 ms]

There are no warnings or errors for the CI runs.

@paulmedynski paulmedynski marked this pull request as ready for review April 1, 2025 11:05
@paulmedynski paulmedynski requested review from Copilot and a team April 1, 2025 11:06
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates various preprocessor directives from NET9_0 to NET9_0_OR_GREATER and introduces enhanced diagnostics for Windows certificate creation failures via retry logic and additional output details. Key changes include updating build conditions in multiple certificate and SQL components, adding retry logic with delay and logging for PowerShell command execution in certificate creation, and updating a pipeline URL to a newer dotnet-install.ps1 script location.

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
ColumnEncryptionCertificateFixture.cs Updated preprocessor condition to reflect newer target frameworks
CertificateFixtureBase.cs Replaced NET9_0 with NET9_0_OR_GREATER and added retry logic with delay and diagnostic output for certificate generation
CertificateTestWithTdsServer.cs Updated preprocessor condition for certificate loading
CertificateUtility.cs Updated preprocessor condition for certificate loading and RSA decryption/signature verification
SqlDbTypeExtensions.cs Updated preprocessor condition for exposing the Json SqlDbType constant
VirtualSecureModeEnclaveProvider.cs Updated preprocessor condition for loading a certificate from binary payload
SNICommon.cs Updated preprocessor condition for validating SSL server certificates
ci-run-tests-job.yml Updated the URL to the latest dotnet-install.ps1 script
Files not reviewed (1)
  • src/Microsoft.Data.SqlClient/netfx/src/Microsoft.Data.SqlClient.csproj: Language not supported
Comments suppressed due to low confidence (1)

src/Microsoft.Data.SqlClient/tests/tools/Microsoft.Data.SqlClient.TestUtilities/Fixtures/CertificateFixtureBase.cs:173

  • Consider adding tests to verify the retry logic and ensure correct behavior on repeated PowerShell failures.
for (int attempt = 1; attempt <= retries; ++attempt)

Copy link
Contributor Author

@paulmedynski paulmedynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for review, with my comments added for context.

@@ -236,7 +236,7 @@ jobs:
- powershell: |
dotnet --info

Invoke-WebRequest https://dot.net/v1/dotnet-install.ps1 -OutFile dotnet-install.ps1
Invoke-WebRequest https://builds.dotnet.microsoft.com/dotnet/scripts/v1/dotnet-install.ps1 -OutFile dotnet-install.ps1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This avoids Invoke-WebRequest failures due to some dead IPs in the DNS results for dot.net, and is the proper canonical DNS name for the dotnet scripts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit fishy since we have the UseDotNet task we could run before this...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'm getting conflicting information:

https://devblogs.microsoft.com/dotnet/critical-dotnet-install-links-are-changing/

Says that the canonical CDN is builds.dotnet.microsoft.com.

https://learn.microsoft.com/en-us/dotnet/core/install/windows#install-with-powershell

Says the URL is https://dot.net/v1/dotnet-install.ps1

Those DNS names resolve to different IPs:

;; ANSWER SECTION:
dot.net.                143     IN      A       20.112.250.133
dot.net.                143     IN      A       20.236.44.162
dot.net.                143     IN      A       20.70.246.20
dot.net.                143     IN      A       20.76.201.171
dot.net.                143     IN      A       20.231.239.246
;; ANSWER SECTION:
builds.dotnet.microsoft.com. 2922 IN    CNAME   dotnetcli.trafficmanager.net.
dotnetcli.trafficmanager.net. 60 IN     CNAME   builds.dotnet.microsoft.com.edgesuite.net.
builds.dotnet.microsoft.com.edgesuite.net. 9469 IN CNAME a441.dscd.akamai.net.
a441.dscd.akamai.net.   20      IN      A       24.222.184.168
a441.dscd.akamai.net.   20      IN      A       24.222.184.138

I can revert it back, but dead IPs in the dot.net records were definitely causing job failures earlier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use builds.dotnet.microsoft.com

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, the dot.net hosts just HTTP redirect to the other URL:

$ curl -I https://dot.net/v1/dotnet-install.ps1
HTTP/2 301
date: Fri, 04 Apr 2025 19:15:47 GMT
server: Kestrel
location: https://builds.dotnet.microsoft.com/dotnet/scripts/v1/dotnet-install.ps1
strict-transport-security: max-age=31536000

I guess that one of them is/was dead. I think I'll keep it as-is.

@paulmedynski paulmedynski added this to the 6.1-preview1 milestone Apr 2, 2025
@mdaigle mdaigle self-assigned this Apr 2, 2025
Copy link
Contributor

@benrr101 benrr101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall pretty good - if it works, I won't complain too hard.

@@ -236,7 +236,7 @@ jobs:
- powershell: |
dotnet --info

Invoke-WebRequest https://dot.net/v1/dotnet-install.ps1 -OutFile dotnet-install.ps1
Invoke-WebRequest https://builds.dotnet.microsoft.com/dotnet/scripts/v1/dotnet-install.ps1 -OutFile dotnet-install.ps1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit fishy since we have the UseDotNet task we could run before this...

@paulmedynski paulmedynski requested review from benrr101 and mdaigle April 4, 2025 19:22
@cheenamalhotra cheenamalhotra added the Area\Tests Issues that are targeted to tests or test projects label Apr 7, 2025
@paulmedynski paulmedynski merged commit bf65632 into main Apr 8, 2025
251 checks passed
@paulmedynski paulmedynski deleted the dev/paul/issue-3223-pwsh-test-failure branch May 20, 2025 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area\Tests Issues that are targeted to tests or test projects
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants