-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test homeserver restarts during a partial state join #378
Test homeserver restarts during a partial state join #378
Conversation
…serve requests Signed-off-by: Sean Quah <seanq@matrix.org>
…hey will change on restart Signed-off-by: Sean Quah <seanq@matrix.org>
Signed-off-by: Sean Quah <seanq@matrix.org>
Signed-off-by: Sean Quah <seanq@matrix.org>
HostIP: "127.0.0.1", | ||
HostPort: strconv.Itoa(port1), | ||
}, | ||
}, | ||
nat.Port("8448/tcp"): []nat.PortBinding{ | ||
{ | ||
HostIP: "127.0.0.1", | ||
HostIP: "127.0.0.1", | ||
HostPort: strconv.Itoa(port2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We must provide an explicit port, otherwise we'll get a different random port when the container is restarted. Which breaks everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it break everything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BaseURL
and FedBaseURL
of HomeserverDeployment
become incorrect because they include the port number. We can fix up those URLs. But any CSAPI
clients will continue to use the previous port number after the restart and that's not so easy to fix up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a reasonably easy thing to fix? There's several possible solutions:
- Add documentation to
Restart()
which explicitly states that any clients created with this deployment need to be recreated. Or... - Add a callback hook to
client.CSAPI
to update the base URL and get test authors to call that callback with the updated URL. Or... - Remember all
client.CSAPI
instances created viafunc (d *Deployment) Client(t *testing.T, hsName, userID string) *client.CSAPI
and automatically call the callback hook when the deployment is restarted to re-point port numbers.
The last option is the most preferable because:
- All the complexity is in one place and does not need to be repeated in tests.
- Test authors don't need to know or care that the port number changes from under them when they call
deployment.Restart()
.
In addition, I would change the function signature of restart from:
func (dep *Deployment) Restart() error
to
func (dep *Deployment) Restart(t *testing.T)
which will fail the test if the restart fails (returns an error).
At present, this PR cannot be accepted because it will introduce flakiness because these port numbers will race: we have to let the underlying OS decide the ports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split the PR and made the changes in #396. Let me know if I've missed anything.
}() | ||
|
||
// wait for the state_ids request to arrive | ||
psjResult.AwaitStateIdsRequest(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AwaitStateIdsRequest
cannot be used twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tempted to drop the second usage rather than try to rejig things.
…ns_resume_state_resyncing_on_restart
@matrix-org/dendrite-core The first three commits touch some core complement machinery and might be worth taking a look at? |
@erikjohnston I made one small change: c0caa5e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main problem with this lies with the usage of an explicit port. We cannot guarantee this as:
- Tests in the same file can run in parallel via
.Parallel()
- Tests in different directories can run in parallel (modifiable via
go test -p N
)
We need to preserve that ability to ensure we can grow the test case library without suffering increasingly long test times, so we cannot add features which would compromise that. I've asked for a bit more information than just "which breaks everything".
HostIP: "127.0.0.1", | ||
HostPort: strconv.Itoa(port1), | ||
}, | ||
}, | ||
nat.Port("8448/tcp"): []nat.PortBinding{ | ||
{ | ||
HostIP: "127.0.0.1", | ||
HostIP: "127.0.0.1", | ||
HostPort: strconv.Itoa(port2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it break everything?
@@ -391,6 +348,39 @@ func deployImage( | |||
return d, nil | |||
} | |||
|
|||
// Picks two free ports on localhost. Does not reserve them in any way. | |||
// The returned ports must be used before the next call to `allocateHostPorts`, | |||
// otherwise the same pair of ports may be returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot guarantee that as homeservers are deployed in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I didn't realise that we ran tests in parallel at the time.
internal/docker/deployer.go
Outdated
// The returned ports must be used before the next call to `allocateHostPorts`, | ||
// otherwise the same pair of ports may be returned. | ||
func allocateHostPorts() (int, int, error) { | ||
localhost_any_port := net.TCPAddr{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no snake_case please. Be consistent with existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Tests matrix-org/synapse#12642