Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/robustness: Split model code into deterministic and non-deterministic #15819

Merged
merged 2 commits into from
May 5, 2023

Conversation

serathius
Copy link
Member

Big refactor to separate deterministic etcd model (assumes all request succeed) and non-deterministic (considers failed request both persisted and not). This way a non-deterministic model just layer above deterministic one.

cc @ahrtr @ptabor

@serathius serathius force-pushed the robustness-non-deterministic branch 2 times, most recently from 37b2d89 to 49f84cb Compare May 3, 2023 18:45
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
@serathius serathius force-pushed the robustness-non-deterministic branch from 49f84cb to 9965bc1 Compare May 4, 2023 12:30
@serathius
Copy link
Member Author

PTAL @ahrtr
This refactor makes code much clearer, however it's nightmare to rebase against.

tests/robustness/model/base.go Outdated Show resolved Hide resolved
tests/robustness/model/model.go Outdated Show resolved Hide resolved
tests/robustness/model/base.go Outdated Show resolved Hide resolved
// applyFailedRequest handles a failed requests, one that it's not known if it was persisted or not.
func applyFailedRequest(states PossibleStates, request EtcdRequest) PossibleStates {
// stepFailedRequest duplicates number of states by considering request persisted and lost.
func (states nonDeterministicState) stepFailedRequest(request EtcdRequest) nonDeterministicState {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want change the receiver's state, but it isn't a pointer receiver. So a function makes more sense than a method here. This comment applies the stepSuccessfulRequest as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's useful for clarity to have a both models implement Step method. It makes it more natural to understand when compared to have stepDeterministicModel and stepNonDeterministicModel.

Fact that I don't use pointer receiver and return new state is just a design decision that state is non mutable. I think it's ok that methods to be non mutable.

@ahrtr
Copy link
Member

ahrtr commented May 5, 2023

This's a huge PR, and it's hard to carefully review each line of code change. Please try to breakdown such huge PR as possible as you can next time.

I will approve this PR for now given:

  1. It's just test code;
  2. You have deep understanding on the robustness test;
  3. This PR will definitely conflict with any other PRs which update robustness test, and you or others will have to do the boring rebasing.

@ahrtr
Copy link
Member

ahrtr commented May 5, 2023

Before I approve this PR, please other members also take a look. cc @chaochn47 @fuweid @jmhbnz @ptabor if possible.

func applyFailedRequest(states PossibleStates, request EtcdRequest) PossibleStates {
// stepFailedRequest duplicates number of states by considering request persisted and lost.
func (states nonDeterministicState) stepFailedRequest(request EtcdRequest) nonDeterministicState {
newStates := make(nonDeterministicState, 0, len(states)*2)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are doubling number of states, is there a scenario that the number will grow very quickly and checker won't be able to handle it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess stepSuccessfulRequest would remove possible states. One successful response should clean up all previous possible states generated back to back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are both right. NonDeterministic model can grow exponentially with number of operations. This however mostly happens on failed/error (wanna guess when it can happen on request success?). As @chaochn47 mentioned successful requests will cleanup previous possible states.

So the problem is that we should avoid long sequences of failed requests as they cause complexity to explode. I have mitigated the problem by introducing history patching #15078 #15598

if !reflect.DeepEqual(newState, s) {
states = append(states, newState)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I see that the check was added in different PR, but can you add a comment why we need !reflect.DeepEqual(newState, s)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is just used to avoid same states. Please confirm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the request doesn't impact the state then there is no reason to fork it. Example is failed defrag request.

@@ -295,11 +295,15 @@ func (h *AppendableHistory) AppendDefragment(start, end time.Duration, resp *cli
h.appendFailed(request, start, err)
return
}
var revision int64
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding revision to defragmentResponse is nice to have, not critical for this PR. I that correct? Just trying to understand

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is based on other code changes due to some shortcuts I took. By separating models I separate EtcdResponse and EtcdNonDeterministicResponse which means that now EtcdResponse needs to have accurate Revision, which means that I need to make sure that defragmentResponse returns response with accurate revision.

@chaochn47
Copy link
Member

LGTM, It's a clean refactor from my point of view. The changed lines look scary though. No regression spotted from my eyes.

tests/robustness/model/base.go Outdated Show resolved Hide resolved
Comment on lines 81 to 83
if response.Txn.TxnResult {
return state
}
Copy link
Member

@chaochn47 chaochn47 May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, could you please explain why the initial state is empty if TxnResult is true?

I understand it is the same as before.

Copy link
Member Author

@serathius serathius May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehm, embarrassing story, apparently the old me decided that TxnResult = !success, because I wanted default Txn struct to reflect transaction success.

Comment on lines 123 to 127
newKVs := map[string]ValueRevision{}
for k, v := range s.KeyValues {
newKVs[k] = v
}
s.KeyValues = newKVs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems redundant between newKVs and s.KeyValues. Could you please clarify what's the usage of newKVs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid modifying the s.KeyValues, I need to copy whole map. Would love to get snapshotable map to clean this up.

case LeaseRevoke:
//Delete the keys attached to the lease
keyDeleted := false
for key := range s.Leases[request.LeaseRevoke.LeaseID].Keys {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, lease in robustness base model has indefinite TTL, correct?

Copy link
Member Author

@serathius serathius May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes for now we don't test lease expiration by time, just by revoke call.

}

type TxnResponse struct {
TxnResult bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if you can document the meaning of this field in future PRs. Thanks ~

Copy link
Member Author

@serathius serathius May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix it in following PR, sorry for this abomination in naming.

if !reflect.DeepEqual(newState, s) {
states = append(states, newState)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is just used to avoid same states. Please confirm.

func applyFailedRequest(states PossibleStates, request EtcdRequest) PossibleStates {
// stepFailedRequest duplicates number of states by considering request persisted and lost.
func (states nonDeterministicState) stepFailedRequest(request EtcdRequest) nonDeterministicState {
newStates := make(nonDeterministicState, 0, len(states)*2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess stepSuccessfulRequest would remove possible states. One successful response should clean up all previous possible states generated back to back.

@serathius
Copy link
Member Author

Sorry for large PR, I didn't have time to split it and was worried that it would only make review take longer. As code structure clarifies it should be easier to make small independent changes.

FYI I'm pretty sure that this change is correct as etcd model have absurd number of tests and any error would result would immediately result in robustness tests failing.

Main motivation for this was exponential growth of tests for non-deterministic model. If you look through scenarios I test every combination of requests both their success and failure. Separating deterministic model tests allows me to just implement tests for request success.

…nistic

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
Co-authored-by: Benjamin Wang <wachao@vmware.com>
Co-authored-by: chao <54131596+chaochn47@users.noreply.github.com>
@serathius serathius force-pushed the robustness-non-deterministic branch from 62c69ae to 92366a5 Compare May 5, 2023 10:25
@ahrtr
Copy link
Member

ahrtr commented May 5, 2023

@serathius Please ping me if you resolve all comments, I will take another look so that this PR can merged today.

@serathius
Copy link
Member Author

I resolved all comments, however I didn't mark all of them as "done", as some of them included questions and I want to give people chance to read answers.

@serathius serathius merged commit 79eabc1 into etcd-io:main May 5, 2023
@serathius serathius deleted the robustness-non-deterministic branch June 15, 2023 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants