Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to fix port clash by using different starting port for each test #545

Merged
merged 2 commits into from
Feb 26, 2022

Conversation

patrickkuo
Copy link
Contributor

@patrickkuo patrickkuo commented Feb 24, 2022

this should fix #494

@patrickkuo patrickkuo marked this pull request as ready for review February 24, 2022 14:26
@lxfind
Copy link
Contributor

lxfind commented Feb 24, 2022

Is it possible to avoid manually setting a starting port for each test? That makes it difficult to track and easy to make mistakes. I imagine at least we could use a shared static variable?

Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting back to this!!!

The PortAllocator we have now makes a lot of sense to me in production: it makes sure we start looking for a port at a specific place, and if the server doesn't have an open port there, it makes things easy to debug : try the following ports in order until you find where your server started.

However, for tests, that's completely overkill: I don't care where I'm gonna end up, and I can just bind to 127.0.0.1:0 and let the underlying OS sort it out.

The PortAllocator takes a u16 argument to start the port search. Is there a way to refactor it to take an Option<u16> where the default semantics of None would be to do TcpListener::bind(("127.0.0.1:0")).is_ok()? That should provide an easy-to-use default for tests.

Besides that, you start_network is 🤩

@patrickkuo
Copy link
Contributor Author

Thanks for getting back to this!!!

The PortAllocator we have now makes a lot of sense to me in production: it makes sure we start looking for a port at a specific place, and if the server doesn't have an open port there, it makes things easy to debug : try the following ports in order until you find where your server started.

However, for tests, that's completely overkill: I don't care where I'm gonna end up, and I can just bind to 127.0.0.1:0 and let the underlying OS sort it out.

The PortAllocator takes a u16 argument to start the port search. Is there a way to refactor it to take an Option<u16> where the default semantics of None would be to do TcpListener::bind(("127.0.0.1:0")).is_ok()? That should provide an easy-to-use default for tests.

Besides that, you start_network is

Hmm this is harder then expected...

Currently the ports are pre allocated at setup phase (genesis) ahead of the server start up, because we need the port ahead of time to create the wallet.conf.
TcpListener::bind(("127.0.0.1:0")) will only bind momentarily because we are not starting up the real server.

I explored the possibility of starting the server first using '127.0.0.1:0' then create wallet.conf, this won't work without refactoring the existing network code, as there are no way to retrieve actual binding port from AuthorityServer

PORT_ALLOCATOR is a singleton object so it should generate unique available port if the tests are running in the same process, which works fine for me when running the test using cargo test, however it failed when I use cargo nextest run, I suspect it is running tests in parallel processes?

So for now I think using different starting port for each test is an easier fix for now until we refactor AuthorityServer.

@huitseeker
Copy link
Contributor

@patrickkuo So what I'm hearing is that the constructor of the PortAllocator is not the issue, the constructor of AuthorityServer is. Thankfully, I think we can simply change the returned AuthorityPrivateinfo so that the value that is provided there for the base_port is the integer 0 in tests (bypassing the port allocator). The constructor of the AuthorityServer will take that up, form an address finishing in the port descriptor 0, and should start on an empty port.

@patrickkuo
Copy link
Contributor Author

@patrickkuo So what I'm hearing is that the constructor of the PortAllocator is not the issue, the constructor of AuthorityServer is. Thankfully, I think we can simply change the returned AuthorityPrivateinfo so that the value that is provided there for the base_port is the integer 0 in tests (bypassing the port allocator). The constructor of the AuthorityServer will take that up, form an address finishing in the port descriptor 0, and should start on an empty port.

Yes the authority server should start using port 0 without problem.... but the wallet won't know which port to connect to...

Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've investigated the start of the genesis network enough to be convinced: at the moment we deduce network, genesis and wallet config from each other, and then start the network.

From that moment on, it's hard to get any information back from the network, which makes my fix based on binding to zero pointless (but I note it also makes the info of the PortAllocator very speculative).

Let's land this to stop the bleeding, I'll try a few tricks to refactor on the downstream.

@patrickkuo patrickkuo force-pushed the pat/attempt_to_fix_port_clash branch from 2a77fac to 2b09e86 Compare February 26, 2022 00:16
@patrickkuo patrickkuo merged commit 0f22b0c into main Feb 26, 2022
@patrickkuo patrickkuo deleted the pat/attempt_to_fix_port_clash branch February 26, 2022 00:32
mwtian pushed a commit that referenced this pull request Sep 12, 2022
* test of consensus restore

* hack

* Fix consensus_restore test (#519)

* Revert "hack"

This reverts commit f06f9ec782141f8db614ab34fd3a0b95f089f2c1.

* fix: adapt the worker address, remove a few mut

* test node restore with cluster and switch metric type

* update fail message

Co-authored-by: François Garillot <4142+huitseeker@users.noreply.github.com>
mwtian pushed a commit to mwtian/sui that referenced this pull request Sep 29, 2022
* test of consensus restore

* hack

* Fix consensus_restore test (MystenLabs#519)

* Revert "hack"

This reverts commit f06f9ec782141f8db614ab34fd3a0b95f089f2c1.

* fix: adapt the worker address, remove a few mut

* test node restore with cluster and switch metric type

* update fail message

Co-authored-by: François Garillot <4142+huitseeker@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test 'cli_tests::test_objects_command' fails sometimes
3 participants