Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sui/shared] Make shared_object_sync less flaky #6467

Merged
merged 1 commit into from
Nov 29, 2022

Conversation

amnn
Copy link
Member

@amnn amnn commented Nov 29, 2022

Because not all validators pass transactions on to consensus, the previous implementation of this test can sometimes fail because the one validator that was fed the initial transaction does not pass it along.

The test was carefully set-up so that this doesn't happen, but any change that affects the Rng seed that decides whether this happens or not can cause it spuriously fail or hang.

The behaviour under test is that out-of-date authorities can catch up with the rest because the appropriate certificates get shared through the consensus layer.

The change is to test this by excluding one validator to begin with, rather than including only one validator, to guarantee that at least one will propagate the certificate.

Test Plan:

sui$ cargo simtest shared_object_sync

Note that running simtest multiple times (see below) didn't seem to trigger the issue, I think because the logic that decides whether or not a validator propagates a transaction is driven by ObjectIDs changing:

sui$ env MSIM_TEST_NUM=20 cargo simtest shared_object_sync

So instead, this change needs to be tested before and after a change that required tweaking the validators to make this test work again, e.g. 6102b24 -- it should work on either side without any material changes (apart from merge conflict resolution).

Because not all validators pass transactions on to consensus, the
previous implementation of this test can sometimes fail because the
one validator that was fed the initial transaction does not pass it
along.

The test was carefully set-up so that this doesn't happen, but any
change that affects the Rng seed that decides whether this happens or
not can cause it spuriously fail or hang.

The behaviour under test is that out-of-date authorities can catch up
with the rest because the appropriate certificates get shared through
the consensus layer.

The change is to test this by excluding one validator to begin with,
rather than including only one validator, to guarantee that at least
one will propagate the certificate.

Test Plan:

```
sui$ cargo simtest shared_object_sync
```

Note that running simtest multiple times (see below) didn't seem to
trigger the issue, I think because the logic that decides whether or
not a validator propagates a transaction is driven by ObjectIDs
changing:

```
sui$ env MSIM_TEST_NUM=20 cargo simtest shared_object_sync
```

So instead, this change needs to be tested before and after a change
that required tweaking the validators to make this test work again,
e.g. 6102b24 -- it should work on either side without any material
changes (apart from merge conflict resolution).
@andll
Copy link
Contributor

andll commented Nov 29, 2022

Thank you!

@amnn amnn merged commit e8f9e6b into MystenLabs:main Nov 29, 2022
@amnn amnn deleted the stable-shared-sync branch November 29, 2022 20:44
@mystenmark
Copy link
Contributor

FYI added #6469, which would have allowed repeated simtest runs to identify this bug. (previously the tx digest had no random component at all)

Jibz1 pushed a commit that referenced this pull request Dec 7, 2022
Because not all validators pass transactions on to consensus, the
previous implementation of this test can sometimes fail because the
one validator that was fed the initial transaction does not pass it
along.

The test was carefully set-up so that this doesn't happen, but any
change that affects the Rng seed that decides whether this happens or
not can cause it spuriously fail or hang.

The behaviour under test is that out-of-date authorities can catch up
with the rest because the appropriate certificates get shared through
the consensus layer.

The change is to test this by excluding one validator to begin with,
rather than including only one validator, to guarantee that at least
one will propagate the certificate.

Test Plan:

```
sui$ cargo simtest shared_object_sync
```

Note that running simtest multiple times (see below) didn't seem to
trigger the issue, I think because the logic that decides whether or
not a validator propagates a transaction is driven by ObjectIDs
changing:

```
sui$ env MSIM_TEST_NUM=20 cargo simtest shared_object_sync
```

So instead, this change needs to be tested before and after a change
that required tweaking the validators to make this test work again,
e.g. 6102b24 -- it should work on either side without any material
changes (apart from merge conflict resolution).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants