-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script to quickly reproduce issue 14370 #14912
Conversation
Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
a34f61e
to
46ee82d
Compare
.PHONY: reproduce-issue14370 | ||
reproduce-issue14370: ./bin/etcd-v3.5.4-failpoints | ||
cp ./bin/etcd-v3.5.4-failpoints ./bin/etcd | ||
GO_TEST_FLAGS='-v --run=TestLinearizability/Issue14370 --count 100 --failfast' make test-linearizability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ad 1. It will work as both gofail binary and libraries are from same version taken from main branch GOFAIL_VERSION
.
Ad 2. I can agree that we shouldn't put everything into root scripts, however I'm against adding this as a script. Makefile ensures that we keep code simple, which we will loose when we migrate everything to bash scripts. See 700 line ./scripts/tests.sh
.
I think we our makefile is still pretty simple and readable. Splitting things too early is also a bad pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ad 1. It will work as both gofail binary and libraries are from same version taken from main branch GOFAIL_VERSION
Sooner or later, eventually we will bump new gofail version, and update all the existing failpoint names (e.g. beforeCommit
--> backendBeforeCommit
), and also update linearizability test (e.g. backend/beforeCommit
--> backendBeforeCommit
). Will the script in this PR be still working at that time?
FYI. I just created a tag v0.1.0 for gofail: https://github.com/etcd-io/gofail/releases/tag/v0.1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Benjamin:
- Makefile shouldn't cary project's history. It should contains up-to-date receipts to work with the current code.
We should answer what's the purpose of the code:
a) Just documentation - historical note how the problem was reproductionable in 3.5.4 ?
b) Potential regression test - that can be applied to up-to-date sources to make sure the similar scenario does not happens (or even can be used with bisect).
I prefer the latter.
I'm thinking about putting this code to /tests/repros
that would lead to:
- In
/tests/repros/Issue14370.sh
it would make sense to code only i--local
mode to cover the current code. - We would need to make a new-tag
v3.5.4-repro
with cherry picked/tests/repros/Issue14370.sh
to have a historical repro with all the toolkit compatible with v3.5.4 (assuming toolkit longterm will change and we want to keep it working)
Testing any revision since v3.5.4-repro forward is as simple as:
git checkout `rev`
/tests/repros/Issue14370.sh
And this mitigates risks at changes at HEAD to the new toolkit will break the script working for the older revisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Purpose of this code is to ensure that during development of linearizability testing, we don't loose ability to reproduce historical issues. We are testing 1to1 same scenarios on main branch, however due to nature of failure injection testing it's hard to guarantee those scenarios will maintain ability to reproduce those issues. Imagine that we have a issue that can be only reproduce under high volume of PUT requests, if during development of linearizability tests we change request time proportions we might no longer be able to reproduce this.
I use those scripts as a sanity checks during linearizability test development. Unfortunately we cannot automatically test this as issues don't have 100% reproducability as they are based on race condition. I hope that in future improvements to gofail will allow us to do that by using sleeps to always reproduce a race, but we are far from it.
In /tests/repros/Issue14370.sh it would make sense to code only i--local mode to cover the current code.
Those scenarios are already running on main branch. No need for local mode.
We would need to make a new-tag v3.5.4-repro with cherry picked /tests/repros/Issue14370.sh to have a historical repro with all the toolkit compatible with v3.5.4 (assuming toolkit longterm will change and we want to keep it working)
Really don't like idea of maintaining large number of repro only tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those scenarios are already running on main branch. No need for local mode.
Yes - I wrote local to stress it doesn't have a code to to git checkout
of another branch in tmp
dir
We would need to make a new-tag v3.5.4-repro with cherry picked /tests/repros/Issue14370.sh to have a historical repro with all the toolkit compatible with v3.5.4 (assuming toolkit longterm will change and we want to keep it working)
Really don't like idea of maintaining large number of repro only tags.
I hope this will be not a large number. It's worth doing only for big issues.
Thinking what's the reliable path to replay on 3.5.4 when the main will be 3.7.x and making sure all the scripts from 3.7.x work well both: on 3.5.4 and on 3.7.4 ?
And if we don't want the repro tag... the only reliable alternative I see is:
git checkout v3.5.5 (the first version with the repro code)
./tests/repros/repro_at.sh Issue14370.sh -ref=v3.5.4
Where ./tests/repros/repro_at.sh
have on orchestration of checkout of ref
but finally calls Issues14370.sh from current branch on the tmp
checked out code.
cd /tmp/etcd-release-v3.5.4/; \ | ||
git clone https://github.com/etcd-io/etcd.git .; \ | ||
git checkout v3.5.4; \ | ||
go get go.etcd.io/gofail@${GOFAIL_VERSION}; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that lines L145-L149,L151 are generic code that could have it's 'makefile'
it's
make build-etcd-failpoints
.PHONY: reproduce-issue14370 | ||
reproduce-issue14370: ./bin/etcd-v3.5.4-failpoints | ||
cp ./bin/etcd-v3.5.4-failpoints ./bin/etcd | ||
GO_TEST_FLAGS='-v --run=TestLinearizability/Issue14370 --count 100 --failfast' make test-linearizability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Benjamin:
- Makefile shouldn't cary project's history. It should contains up-to-date receipts to work with the current code.
We should answer what's the purpose of the code:
a) Just documentation - historical note how the problem was reproductionable in 3.5.4 ?
b) Potential regression test - that can be applied to up-to-date sources to make sure the similar scenario does not happens (or even can be used with bisect).
I prefer the latter.
I'm thinking about putting this code to /tests/repros
that would lead to:
- In
/tests/repros/Issue14370.sh
it would make sense to code only i--local
mode to cover the current code. - We would need to make a new-tag
v3.5.4-repro
with cherry picked/tests/repros/Issue14370.sh
to have a historical repro with all the toolkit compatible with v3.5.4 (assuming toolkit longterm will change and we want to keep it working)
Testing any revision since v3.5.4-repro forward is as simple as:
git checkout `rev`
/tests/repros/Issue14370.sh
And this mitigates risks at changes at HEAD to the new toolkit will break the script working for the older revisions.
To give more context on this PR. I use the As those are purely linearizability scripts, maybe putting them into |
This is super useful when iterating on linearizability tests to be able to verify that we reproduce old issues.
cc @ahrtr @ptabor