Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't try to use a fixed port in test_server_config_default #8272

Merged
merged 2 commits into from
Feb 1, 2025

Conversation

msullivan
Copy link
Member

We had a flake regarding this.

(At least on my machine?), linux never randomly assigns an even
numbered port, so probably the flake collided with something external
specifically looking for it?

The find_available_port mechanism is fundamentally racy, of course;
the port could be allocated by someone else immediately, and in
testing I did manage to observe cycle lengths as short as 2 before a
reuse.

Possibly these tests should just have a big hammer
@retry_failed_test(3) decorator around them or something, but I am
reluctant to introduce this because I fear we will use it instead of
writing proper tests.

(Most tests don't have as good an excuse for flakiness as needing to
allocate a shared resource in a 16-bit namespace).

We had a flake regarding this.

(At least on my machine?), linux never randomly assigns an even
numbered port, so probably the flake collided with something external
specifically looking for it?

The find_available_port mechanism is fundamentally racy, of course;
the port could be allocated by someone else immediately, and in
testing I did manage to observe cycle lengths as short as 2 before a
reuse.

Possibly these tests should just have a big hammer
`@retry_failed_test(3)` decorator around them or something, but I am
reluctant to introduce this because I fear we will use it instead of
writing proper tests.

(Most tests don't have as good an excuse for flakiness as needing to
allocate a shared resource in a 16-bit namespace).
@1st1
Copy link
Member

1st1 commented Jan 29, 2025

@fantix why are we jumping through hoops here with 1024 and a while loop? Why not always just bind to 0?

CleanShot 2025-01-29 at 15 05 23@2x

@1st1
Copy link
Member

1st1 commented Jan 29, 2025

I see the new logic introduced in dea17d4

CleanShot 2025-01-29 at 15 06 55@2x

@msullivan
Copy link
Member Author

Oh, right, it was because there used to be a bug where ports greater than 32767 broke: #6194

I think it was there to work around a bug fixed in #6194
@msullivan
Copy link
Member Author

Also, you can link to code snippets and diff snippets in github; no need to resort to screenshots.

@1st1
Copy link
Member

1st1 commented Jan 29, 2025

Oh, right, it was because there used to be a bug where ports greater than 32767 broke: #6194

The way how I'd do this is to try binding at least one port through bind(0), and if that returns high enough ports then resort to do a while loop.

In any case, I don't quite understand how your PR fixes the problem. Are we running out of ports < 50000?

@msullivan
Copy link
Member Author

For the flake I saw, I kind of suspect that it collided with something else on the system trying to use port 50000, though I'm not really sure.

@msullivan
Copy link
Member Author

I'm not actually sure how much this will actually mitigate the flake, but it's a general cleanup at least.

@msullivan
Copy link
Member Author

Could someone approve this? It's a strict cleanup, even if it is only marginally likely to help the flakiness here

@msullivan msullivan merged commit a8c5ae0 into master Feb 1, 2025
23 checks passed
@msullivan msullivan deleted the no-fixed-port branch February 1, 2025 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants