Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement sleep failpoint status/counter for linearizability test cases #37

Closed
wants to merge 2 commits into from

Conversation

ramil600
Copy link
Contributor

Each time sleep is executed in actSleep counter is incremented. Added counter endpoint that gives number of sleeps executed on the current terms: GET pkg/failpoint/count When DELETE /pkg/failpoint is executed, failpoint.terms is nilled, hence counter is reset.

Requested in reference to:
tests/linearizability: added sleep fail point injection #14796

Signed-off-by: Ramil Mirhasanov ramil600@yahoo.com

@ramil600
Copy link
Contributor Author

cc @serathius

runtime/http.go Outdated
@@ -72,6 +72,13 @@ func (*httpHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
lines[i] = fps[i] + "=" + s
}
w.Write([]byte(strings.Join(lines, "\n") + "\n"))
} else if strings.HasSuffix(key, "/count") {
fp := key[:len(key)-6]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fp := key[:len(key)-6]
fp := key[:len(key)-len("/count")]

@@ -114,6 +114,25 @@ func Status(failpath string) (string, error) {
return t.desc, nil
}

// StatusCount gives the number of sleeps executed on the current terms
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// StatusCount gives the number of sleeps executed on the current terms
// StatusCount gives the how many times current term was executed

runtime/terms.go Outdated
@@ -41,6 +41,8 @@ type terms struct {

// mu protects the state of the terms chain
mu sync.Mutex
// counts amount of sleep executions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// counts amount of sleep executions
// counts amount of executions

runtime/terms.go Outdated
@@ -310,6 +312,9 @@ func actSleep(t *term) interface{} {
return nil
}
time.Sleep(dur)
t.parent.mu.Lock()
t.parent.counter++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we count number of any executions instead of just sleep?

@serathius
Copy link
Member

Overall looks good, however let's count any execution instead of only sleep ones.

@ramil600
Copy link
Contributor Author

@serathius, implemented fixes as per your suggestions.
I added a test that follows terms and executes failpoints accordingly, the only problem is that it is failing linter check, because I am using gofail syntax. Please advise how should I go about that. Do I need to build a script like for linearizability tests?

@@ -72,6 +72,13 @@ func (*httpHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
lines[i] = fps[i] + "=" + s
}
w.Write([]byte(strings.Join(lines, "\n") + "\n"))
} else if strings.HasSuffix(key, "/count") {
Copy link
Member

@serathius serathius Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API could be improved to something more RESTy, but this should work for now.

@@ -114,6 +114,25 @@ func Status(failpath string) (string, error) {
return t.desc, nil
}

// StatusCount outputs how many times current term was executed
func StatusCount(failpath string) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that StatusCount function should return an int instead of string. However that can be fixed in future PRs.

name: "runtime_test/ExampleString",
desc: `10*sleep(10)->1*return("abc")`,
run: 12,
want: "11",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you run loop 12 times, how the result is 11?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this happens because we are building terms here by saying:
desc: 10sleep(10)->1return("abc"),
sleep exactly 10 times and then return 1 time.
Even if the function is called more than 11 times, no terms will be executed, because its controlled by internal counter:
func (mc *modCount) allow() bool { if mc.c > 0 { mc.c-- return true } return false }
That way we can check if the fail point is executed exactly 11 times and not the number of function calls.

want string
}{
{
name: "runtime_test/ExampleString",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add more tests scenarios? For example: Before enable, after disable, after changing term etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

},
}

for _, tc := range testcases {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to use table tests, please use subtests.

@serathius
Copy link
Member

serathius commented Nov 21, 2022

In this case I think it's ok to disable the lint rule on those lines by adding a comment nolint:stylecheck. For example

var __fp_ExampleString *runtime.Failpoint = runtime.NewFailpoint("runtime_test", "ExampleString") //nolint:stylecheck

@@ -41,6 +41,8 @@ type terms struct {

// mu protects the state of the terms chain
mu sync.Mutex
// counts executions
counter int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this field?

@ahrtr
Copy link
Member

ahrtr commented Nov 21, 2022

Note: when you implement any new feature or do any enhancement, please clarify the requirement first. I do not get the info from the description.

Please also make sure you read design.md. Note that the existing design & implementation already supports counter. For example,

2*sleep("10s")  # It means the sleep will be executed twice
2*return("abc")  # it means the return will be executed twice

@serathius
Copy link
Member

Note: when you implement any new feature or do any enhancement, please clarify the requirement first. I do not get the info from the description.

Requirements were discussed in etcd-io/etcd#14796, unfortunatelly issue was not correctly linked.

@ahrtr
Copy link
Member

ahrtr commented Nov 21, 2022

If you really need more visibility of the terms, I would suggest to enhance the existing Status function. Currently it only returns the description. We can define a struct or format, and get more info included.

ahrtr and others added 2 commits November 22, 2022 09:23
`gofail --version` should exit with 0
Each time terms is executed the counter is incremented. Added counter endpoint that gives number of executions on the current terms: GET pkg/failpoint/count
When DELETE /pkg/failpoint is executed, failpoint.terms is nilled, hence counter is reset.

Requested in reference to:
tests/linearizability: added sleep fail point injection #14796

Signed-off-by: Ramil Mirhasanov <ramil600@yahoo.com>

runtime: added test cases for terms counter
@ramil600
Copy link
Contributor Author

implemented more testcases, please let me know if you would like me to implement any changes to the http endpoints

@lavacat
Copy link

lavacat commented Jan 10, 2023

@ramil600 are you still working on this PR? No rush, but please reply, otherwise I'll take over in couple days.

@ramil600
Copy link
Contributor Author

@lavacat, sorry for late reply, I was working on it, but there was no resolution from maintainers, on whether my PR should be merged. If you can help me, I am willing complete this one, please review my submitted PR.

@lavacat
Copy link

lavacat commented Jan 12, 2023

@ramil600 I've looked at the PR.

  1. Per @ahrtr suggestion we should use runtime.Status
  2. Maybe we should just count evals and not actual term executions. Different terms in term chain can be executed different number of times. Seams confusing. But I'm open to keeping it as is.
  3. Please update PR description, your original one is about sleep. I think it can be worded better.
  4. There are some comments from @serathius and @ahrtr that aren't resolved. Please ask to clarify or resolve.

@ahrtr
Copy link
Member

ahrtr commented Jan 12, 2023

  1. Per @ahrtr suggestion we should use runtime.Status

Yes, please feel free to follow this direction to work on this task. But please keep it backward compatible, which means it shouldn't break anything on the existing etcd's linearizablity test.

2. Different terms in term chain can be executed different number of times. Seams confusing. But I'm open to keeping it as is.

The existing logic on term isn't perfect, but it meets the requirement of the etcd test for now. We can enhance it on demand. I thought about refactoring& enhancing the overall design (e.g. adding AND, OR logic to control the execution of term chain), but I gave up because it is neither a priority for now, nor we receive such request so far. Anyway, we can revisit this when there is a real use case.

@ramil600
Copy link
Contributor Author

Thank you guys, @ahrtr, @lavacat, if this is not high priority, I will look into it and get back within few days.

@ramil600 ramil600 changed the title runtime: added counter to failpoint.terms field Implement sleep failpoint status/counter for linearizability test cases Jan 16, 2023
@ramil600
Copy link
Contributor Author

ramil600 commented Jan 16, 2023

If you really need more visibility of the terms, I would suggest to enhance the existing Status function. Currently it only returns the description. We can define a struct or format, and get more info included.

In order to keep the function signature same
Could we return remaining terms description for sleep fail point, instead of description?
Let's say if sleep failpoint was already executed five times out of ten, we could return:
5*sleep(10)->1*return("abc")
if counter is reset after all the executions we can also leave it but mark it as 0:
0*sleep(10)->1*return("abc")

@ahrtr, please let me know what do you think about it?

Currently I added StatusCount() function in the same PR, that counts executions with help of new terms.counter variable.
That is the alternative implementation, if you are ok with me adding this new function?

@lavacat
Copy link

lavacat commented May 8, 2023

Could we return remaining terms description for sleep fail point, instead of description?

I think it will break logic for checking if failpoint is Available
Also, can be confusing for new users.

IMO, we should define Status output as json and then refactor etcd fetchFailpoints

Since this is blocking sleep failpoints impl I'm ok with merging current /count. We can discuss format and do the refactor later. Etcd community owns both repos and gofail used mostly by etcd, so probably we can drop /count without backward compatibility. CC @ahrtr @serathius

@ramil600 please also address points 3 and 4 in #37 (comment)

@lavacat
Copy link

lavacat commented May 13, 2023

@ramil600 are you still working on this? If you won't reply by May 15th, I'll assume you are busy and I'll look for somebody to take over. You'll still be the author of the commit. IMO this is very close to done if we stick with /count.

@pchan
Copy link

pchan commented May 18, 2023

@ramil600 are you still working on this? If you won't reply by May 15th, I'll assume you are busy and I'll look for somebody to take over. You'll still be the author of the commit. IMO this is very close to done if we stick with /count.

I have attempted to update Ramil's code to run successfully in #47. If Ramil wants to take it up he can continue or I can start fixing issues listed in #37 (comment). @ramil600 is still the author of the original commit.

@serathius
Copy link
Member

Closing for #47

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants