-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gather more diagnostics, and prepare skipping flaky DNS tests #36072
Conversation
Tagging subscribers to this area: @dotnet/ncl |
/azp run runtime-libraries outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
private static extern unsafe int getaddrinfo(string node, string service, addrinfo* hints, addrinfo** res); | ||
|
||
[DllImport("libc")] | ||
private static extern unsafe void freeaddrinfo(addrinfo* res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this P/Invoking is unlikely to be portable across various Unix implementations. It's why we use C shims as an intermediary for the production source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting to use gethostbyname()
as it takes string returns NULL or result so it should be easy to check and be portable IMHO. We chat about it with @antonfirsov and the goal is to check if system can resolve it self without depending on production code we are trying to test. We do not need to process the answer in any detailed way. We could still move it to Interop to make it more portable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting to use
gethostbyname()
@wfurt the main reason for choosing freeaddrinfo
was because of gethostbyname
being marked obsolete. Also: freeaddrinfo
provides a trivial way to also get the error code, which I find useful.
All of this P/Invoking is unlikely to be portable across various Unix implementations.
@stephentoub while this is true in general, I don't see any reason to worry about it in this particular case. The addrinfo
structure seems binary compatible across unixes, and the lines responsible for the query in our own PAL code are also portable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addrinfo structure seems binary compatible across unixes
We've seen numerous cases where the actual layout differed in header files from what was shown in the man pages, including but not limited to when targeting 32-bit vs 64-bit.
and the lines responsible for the query in our own PAL code are also portable.
Those are compiled by the C compiler, which sees the actual header files. That's the primary reason the native shims exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree that we should not p/invoke directly but only use native C shims.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand that @antonfirsov. But it gets you out of the structure business. Since it has pointers, the size will differ. socklen_t seems to be fixed to 32bit but I'm not sure if that is mandatory for al OSes.
Shim as @stephentoub suggested would be portable path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I understand the issue now.
From the various options I see, the simplest and safest one seems to be the use of gethostbyname
as @wfurt suggested. Looks like there are several existing cases where we P/Invoke into similar simple methods in test code. Is there any strong counter-argument against using gethostbyname
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean the:
struct hostent *gethostbyname(const char *name);
function, and we treat the return as an opaque pointer, e.g.
public static extern IntPtr gethostbyname(string name);
I'm ok with that.
The configuration
@wfurt do you think there's anything useful here to address the issue at infrastructure level? |
This is not too surprising and I cannot resolve 'dci-mac-build-167' from corp net either. cc: @MattGal |
This tracks with my understanding of the way these machines are meant to be configured, and I thought was why we use localhost / 127.0.01 for such testing. To add one thing, our MacOS machines are not meant to even be accessible from corpnet (they're on a special walled-off version of it) and instead are generally accessed via KVM. |
/azp run runtime-libraries outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
After having some discussions offline, it turned out that there is space for improvement in the behavior of our product code. ProblemThere's no guarantee that Currently we have a strict validation logic in the related PAL code: runtime/src/libraries/Native/Unix/System.Native/pal_networking.c Lines 285 to 289 in 3fda6ef
This may lead to failing
SuggestionWe already have a logic handling runtime/src/libraries/Native/Unix/System.Native/pal_networking.c Lines 318 to 323 in 3fda6ef
If @stephentoub @davidsh @ManickaP thoughts? |
Note GitHub actions team managed to find and solve the macOS issue in their environment: |
@antonfirsov, are you still working on this? |
Test failures in #1488 and #27622 are suspected to originate from infrastructure/OS/configuration issues, we assume that the underlying OS calls should also fail.
This PR adds a utility to detect these failures and depending on a parameter:
The plan is to first go with option 2. everywhere, then raise individual issues on possible new test failures with the current infrastructure. After collecting sufficient information, we can switch individual tests to go with option 1.
Fixes #1488, closes #27622 (already fixed)
Related past discussions: #28324, #16549