Skip to content

Conversation

@bneradt
Copy link
Contributor

@bneradt bneradt commented Feb 10, 2022

…. (#7843)"

This reverts commit 2fbec37.

I'm reverting this to test whether this commit introduced CI regression
test instability.

apache#7843)"

This reverts commit 2fbec37.

I'm reverting this to test whether this commit introduced CI regression
test instability.
@bneradt bneradt added this to the 10.0.0 milestone Feb 10, 2022
@bneradt bneradt requested a review from bryancall as a code owner February 10, 2022 17:15
@bneradt bneradt self-assigned this Feb 10, 2022
@ywkaras ywkaras requested review from SolidWallOfCode and removed request for bryancall February 10, 2022 19:09
@SolidWallOfCode
Copy link
Member

That patch fixed a Yahoo! production bug (where an ATS is sitting directly in front of a Jetty application on the same box, and only ever resolves for the FQDN of that box). Flaky DNS would eventually disable all of the nameservers, requiring a restart. The interesting question is, why would changes in HostDB have this effect?

@ywkaras
Copy link
Contributor

ywkaras commented Feb 10, 2022

Can we make a fake PR that turns on certain debug tags in the regression test runs in CI? Would that perhaps provide better info?

@ywkaras
Copy link
Contributor

ywkaras commented Feb 10, 2022

In failed regression tests in #8662 like https://ci.trafficserver.apache.org/job/Github_Builds/job/centos/508/console we see:

REGRESSION TEST SDK_API_TSHttpConnectServerIntercept started
[Feb 10 15:53:20.173] [ET_NET 0] WARNING: connection to DNS server 10.255.250.30 lost, marking as down
[Feb 10 15:53:20.173] [ET_NET 0] WARNING: connection to DNS server 10.255.251.30 lost, marking as down
[Feb 10 15:53:20.173] [ET_NET 0] WARNING: connection to all DNS servers lost, retrying
TEST 158 *** STARTING ***
*** TEST 158 *** PASSED ***
...
Regression test(SDK_API_TSHttpConnectServerIntercept) still in progress
Regression test(SDK_API_TSHttpConnectServerIntercept) still in progress
[Feb 10 15:53:50.174] [ET_NET 2] WARNING: Synserver failed to bind to port 65535.
Fatal: traffic_server/InkAPITest.cc:910: failed assertion `!"Synserver must be able to bind to a port, check system netstat"`

In successful tests, we'll eventually see:

[Feb 10 15:52:11.589] [ET_NET 0] WARNING: connection to DNS server 10.255.250.30 restored

What's weird is that I don't think the ConnectServerIntercept test should really need any DNS lookup, it's testing the APIs for intercept plugins.

@bneradt
Copy link
Contributor Author

bneradt commented Feb 10, 2022

This is interesting debug output from your https://ci.trafficserver.apache.org/job/Github_Builds/job/centos/508/console run @ywkaras :

[Feb 10 15:53:50.174] [ET_NET 2] WARNING: Synserver failed to bind to port 65535.
Fatal: traffic_server/InkAPITest.cc:910: failed assertion !"Synserver must be able to bind to a port, check system netstat"

The port it tried to bind to, for some reason, is the 16 bit max value (SYNSERVER_DUMMY_PORT). I don't get the impression that it should actually try to listen on that port? Not sure. Still trying to figure out the test.

@bneradt bneradt merged commit 92d238a into apache:master Feb 11, 2022
@bneradt
Copy link
Contributor Author

bneradt commented Feb 11, 2022

What confuses me is that, given this message:

[Feb 10 15:53:50.174] [ET_NET 2] WARNING: Synserver failed to bind to port 65535.

That failure is associated with the #define SYNSERVER_DUMMY_PORT -1 value. However, that occurs in the synserver_vc_refuse which is only associated with a continuation in a single place:

ptest->os = synserver_create(SYNSERVER_LISTEN_PORT, TSContCreate(synserver_vc_refuse, TSMutexCreate()));

How did that function get called back with the 65535 (-1 for 16 bits) rather than the #define SYNSERVER_LISTEN_PORT 3300 value?

@bneradt
Copy link
Contributor Author

bneradt commented Feb 11, 2022

What confuses me is that, given this message:

[Feb 10 15:53:50.174] [ET_NET 2] WARNING: Synserver failed to bind to port 65535.

That failure is associated with the #define SYNSERVER_DUMMY_PORT -1 value. However, that occurs in the synserver_vc_refuse which is only associated with a continuation in a single place:

ptest->os = synserver_create(SYNSERVER_LISTEN_PORT, TSContCreate(synserver_vc_refuse, TSMutexCreate()));

How did that function get called back with the 65535 (-1 for 16 bits) rather than the #define SYNSERVER_LISTEN_PORT 3300 value?

Oh, I see. There are two messages with that exact same Synserver failed to bind to port warning, one in synserver_vc_refuse and another in synserver_vc_accept. I was looking at the wrong message location.

@ywkaras
Copy link
Contributor

ywkaras commented Feb 11, 2022

With this reverted, there are no longer any warnings during regression tests about the DNS server being down: https://ci.trafficserver.apache.org/job/Github_Builds/job/centos/523/console .

zwoop pushed a commit that referenced this pull request Feb 17, 2022
@zwoop
Copy link
Contributor

zwoop commented Feb 17, 2022

Cherry-picked to v9.2.x

@zwoop zwoop modified the milestones: 10.0.0, 9.2.0 Feb 17, 2022
bneradt pushed a commit to bneradt/trafficserver that referenced this pull request Feb 19, 2022
moonchen pushed a commit to moonchen/trafficserver that referenced this pull request Mar 17, 2022
* asf/9.2.x:
  Revert "DNS: Fix lack of nameserver failover in low use circumstances. (apache#7843)" (apache#8663)
  Fix strategies to initialize scheme (apache#8650)
  DNS: Fix lack of nameserver failover in low use circumstances. (apache#7843)
  Cleanup strategy debug logs (apache#8656)
  Making 9.2.x backwards compatible with 9.1.x (apache#8661)
  Adds two overridable config variables to control parent mark downs. (apache#8595)
  Fix plugin parent_select missing hostname len (apache#8649)
  Ports apache#7925 apache#8365 core to parent_select plugin (apache#8590)
  Ports apache#7897 from core strategies to parent_select plugin. (apache#8580)
  Adding clangd language server files to .gitignore (apache#8640)
  Make TsSharedMutex.h compile on MacOS. (apache#8645)
  In TsSharedMutex.h, make error reporting thread-safe. (apache#8636)
  Revert "body factory does not respect runroot (apache#8388)" (apache#8654)
  doc: Convert miscased Traffic Server references to |TS| macro (apache#8543)
  Add a new --enable-event-tracker configure option (apache#8179)
  Add parent_select plugin strategy caching (apache#8651)
  TLS Session Resumption: fix timed out session (apache#8667)
  Fix to allow running  from outside top_srcdir (apache#8673)
  Send diags output to stderr when running regression tests. (apache#8662)
  Default proxy.config.http.strict_uri_parsing to "2" (apache#8632)
@bneradt bneradt deleted the address_synserver_assertion_ci_failures branch March 29, 2023 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants