Skip to content

Fix port range allocation with large worker IDs #44213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 12, 2019

Conversation

alpar-t
Copy link
Contributor

@alpar-t alpar-t commented Jul 11, 2019

Relates to #43983

The IDs gradle uses are incremented for the lifetime of the daemon which
can result in port ranges that are outside the valid range.
This change implements a modulo based formula to wrap the port ranges
when the IDs get too large.

Adresses #44134 but #44157 is also required to be able to close it.

Relates to elastic#43983

The IDs gradle uses are incremented for the lifetime of the daemon which
can result in port ranges that are outside the valid range.
This change implements a modulo based formula to wrap the port ranges
when the IDs get too large.

Adresses elastic#44134 but elastic#44157 is also required to be able to close it.
@alpar-t alpar-t added :Delivery/Build Build or test infrastructure :Distributed Coordination/Network Http and internode communication implementations v8.0.0 v7.4.0 v7.3.1 labels Jul 11, 2019
@alpar-t alpar-t requested a review from ywelsch July 11, 2019 09:44
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

// See also: https://github.com/elastic/elasticsearch/issues/44134
final int startAt = Math.round(
RandomizedTest.systemPropertyAsLong(ESTestCase.TEST_WORKER_VM_ID, 0)
% 327L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 223 to cater for adding 10300 below.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left two minor comments, looking good o.w.

// This is safe as long as we have fewer than 224 Gradle workers running in parallel
// See also: https://github.com/elastic/elasticsearch/issues/44134
final int startAt = Math.round(
RandomizedTest.systemPropertyAsLong(ESTestCase.TEST_WORKER_VM_ID, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use Math.floorMod(long, int) here instead that returns an int. This means that there's no need for Math.round.
Also we should check that the long returned by RandomizedTest.systemPropertyAsLong(ESTestCase.TEST_WORKER_VM_ID, 0) is non-negative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually needed because systemPropertyAsLong actually returns a float, seems like a bug in randomized testing, so Math.floorMod won't work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're using the value as system property, not the actual system property. This means that this will always return 0 at the moment. Sounds like we need a test here ;)

I think it would be simpler to write something like the following:

final int startAt = ESTestCase.TEST_WORKER_VM_ID.equals(ESTestCase.DEFAULT_TEST_WORKER_ID) ?
            0 :
            Math.floorMod(Long.parseLong(ESTestCase.TEST_WORKER_VM_ID), 223);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️
I fixed it and added a test

return basePort + "-" + (basePort + 99); // upper bound is inclusive
}

public static MockNioTransport newMockTransport(Settings settings, Version version, ThreadPool threadPool) {
settings = Settings.builder().put(TransportSettings.PORT.getKey(), getPortRange()).put(settings).build();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo this

@alpar-t
Copy link
Contributor Author

alpar-t commented Jul 11, 2019

@elasticmachine run elasticsearch-ci/2 and run elasticsearch-ci/1

@alpar-t
Copy link
Contributor Author

alpar-t commented Jul 11, 2019

@elasticmachine run elasticsearch-ci/fwc

@alpar-t
Copy link
Contributor Author

alpar-t commented Jul 11, 2019

@elasticmachine run elasticsearch-ci/fwc

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alpar-t alpar-t merged commit 433b345 into elastic:master Jul 12, 2019
alpar-t added a commit that referenced this pull request Jul 12, 2019
* Fix port range allocation with large worker IDs

Relates to #43983

The IDs gradle uses are incremented for the lifetime of the daemon which
can result in port ranges that are outside the valid range.
This change implements a modulo based formula to wrap the port ranges
when the IDs get too large.

Adresses #44134 but #44157 is also required to be able to close it.
alpar-t added a commit that referenced this pull request Jul 12, 2019
* Fix port range allocation with large worker IDs

Relates to #43983

The IDs gradle uses are incremented for the lifetime of the daemon which
can result in port ranges that are outside the valid range.
This change implements a modulo based formula to wrap the port ranges
when the IDs get too large.

Adresses #44134 but #44157 is also required to be able to close it.
@jpountz jpountz added v7.3.0 and removed v7.3.1 labels Jul 15, 2019
alpar-t added a commit to alpar-t/elasticsearch that referenced this pull request Aug 15, 2019
Moves methods added in elastic#44213 and uses them to configure the port range
for `ExternalTestCluster` too.
These were still using `9300-9400` ( teh default ) and running into
races.
alpar-t added a commit that referenced this pull request Aug 15, 2019
Moves methods added in #44213 and uses them to configure the port range
for `ExternalTestCluster` too.
These were still using `9300-9400` ( teh default ) and running into
races.
alpar-t added a commit that referenced this pull request Aug 15, 2019
Moves methods added in #44213 and uses them to configure the port range
for `ExternalTestCluster` too.
These were still using `9300-9400` ( teh default ) and running into
races.
alpar-t added a commit that referenced this pull request Aug 15, 2019
Moves methods added in #44213 and uses them to configure the port range
for `ExternalTestCluster` too.
These were still using `9300-9400` ( teh default ) and running into
races.
@mark-vieira mark-vieira added the Team:Delivery Meta label for Delivery team label Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Delivery/Build Build or test infrastructure :Distributed Coordination/Network Http and internode communication implementations Team:Delivery Meta label for Delivery team v7.3.0 v7.4.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants