Add script to quickly reproduce issue 14370 #14912

serathius · 2022-12-07T22:28:17Z

This is super useful when iterating on linearizability tests to be able to verify that we reproduce old issues.

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>

ahrtr · 2022-12-09T12:48:09Z

Makefile

+.PHONY: reproduce-issue14370
+reproduce-issue14370: ./bin/etcd-v3.5.4-failpoints
+	cp ./bin/etcd-v3.5.4-failpoints ./bin/etcd
+	GO_TEST_FLAGS='-v --run=TestLinearizability/Issue14370 --count 100 --failfast' make test-linearizability


Two comments:

After we remove the package from the failpoint name, this will not work any more. Because you always checkout & build v3.5.4.

Makefile seems not like a good place to just demonstrate the ability of linearizability test. Can we just add a script file under ./script/ directory? @ptabor @spzala WDYT?

Ad 1. It will work as both gofail binary and libraries are from same version taken from main branch GOFAIL_VERSION.

Ad 2. I can agree that we shouldn't put everything into root scripts, however I'm against adding this as a script. Makefile ensures that we keep code simple, which we will loose when we migrate everything to bash scripts. See 700 line ./scripts/tests.sh.

I think we our makefile is still pretty simple and readable. Splitting things too early is also a bad pattern.

Ad 1. It will work as both gofail binary and libraries are from same version taken from main branch GOFAIL_VERSION

Sooner or later, eventually we will bump new gofail version, and update all the existing failpoint names (e.g. beforeCommit --> backendBeforeCommit), and also update linearizability test (e.g. backend/beforeCommit --> backendBeforeCommit). Will the script in this PR be still working at that time?

FYI. I just created a tag v0.1.0 for gofail: https://github.com/etcd-io/gofail/releases/tag/v0.1.0

I agree with Benjamin:
- Makefile shouldn't cary project's history. It should contains up-to-date receipts to work with the current code.

We should answer what's the purpose of the code:
a) Just documentation - historical note how the problem was reproductionable in 3.5.4 ?
b) Potential regression test - that can be applied to up-to-date sources to make sure the similar scenario does not happens (or even can be used with bisect).

I prefer the latter.

I'm thinking about putting this code to /tests/repros that would lead to:

In /tests/repros/Issue14370.sh it would make sense to code only i--local mode to cover the current code.

We would need to make a new-tag v3.5.4-repro with cherry picked /tests/repros/Issue14370.sh to have a historical repro with all the toolkit compatible with v3.5.4 (assuming toolkit longterm will change and we want to keep it working)

Testing any revision since v3.5.4-repro forward is as simple as:

git checkout `rev` /tests/repros/Issue14370.sh

And this mitigates risks at changes at HEAD to the new toolkit will break the script working for the older revisions.

Purpose of this code is to ensure that during development of linearizability testing, we don't loose ability to reproduce historical issues. We are testing 1to1 same scenarios on main branch, however due to nature of failure injection testing it's hard to guarantee those scenarios will maintain ability to reproduce those issues. Imagine that we have a issue that can be only reproduce under high volume of PUT requests, if during development of linearizability tests we change request time proportions we might no longer be able to reproduce this.

I use those scripts as a sanity checks during linearizability test development. Unfortunately we cannot automatically test this as issues don't have 100% reproducability as they are based on race condition. I hope that in future improvements to gofail will allow us to do that by using sleeps to always reproduce a race, but we are far from it.

In /tests/repros/Issue14370.sh it would make sense to code only i--local mode to cover the current code.

Those scenarios are already running on main branch. No need for local mode.

We would need to make a new-tag v3.5.4-repro with cherry picked /tests/repros/Issue14370.sh to have a historical repro with all the toolkit compatible with v3.5.4 (assuming toolkit longterm will change and we want to keep it working)

Really don't like idea of maintaining large number of repro only tags.

Those scenarios are already running on main branch. No need for local mode.

Yes - I wrote local to stress it doesn't have a code to to git checkout of another branch in tmp dir

We would need to make a new-tag v3.5.4-repro with cherry picked /tests/repros/Issue14370.sh to have a historical repro with all the toolkit compatible with v3.5.4 (assuming toolkit longterm will change and we want to keep it working)

Really don't like idea of maintaining large number of repro only tags.

I hope this will be not a large number. It's worth doing only for big issues.

Thinking what's the reliable path to replay on 3.5.4 when the main will be 3.7.x and making sure all the scripts from 3.7.x work well both: on 3.5.4 and on 3.7.4 ?
And if we don't want the repro tag... the only reliable alternative I see is:

git checkout v3.5.5 (the first version with the repro code) ./tests/repros/repro_at.sh Issue14370.sh -ref=v3.5.4

Where ./tests/repros/repro_at.sh have on orchestration of checkout of ref but finally calls Issues14370.sh from current branch on the tmp checked out code.

ptabor · 2022-12-12T11:38:25Z

Makefile

+	cd /tmp/etcd-release-v3.5.4/; \
+	  git clone https://github.com/etcd-io/etcd.git .; \
+	  git checkout v3.5.4; \
+	  go get go.etcd.io/gofail@${GOFAIL_VERSION}; \


I think that lines L145-L149,L151 are generic code that could have it's 'makefile'

it's

make build-etcd-failpoints

ptabor · 2022-12-12T11:55:59Z

Makefile

+.PHONY: reproduce-issue14370
+reproduce-issue14370: ./bin/etcd-v3.5.4-failpoints
+	cp ./bin/etcd-v3.5.4-failpoints ./bin/etcd
+	GO_TEST_FLAGS='-v --run=TestLinearizability/Issue14370 --count 100 --failfast' make test-linearizability


I agree with Benjamin:
- Makefile shouldn't cary project's history. It should contains up-to-date receipts to work with the current code.

We should answer what's the purpose of the code:
a) Just documentation - historical note how the problem was reproductionable in 3.5.4 ?
b) Potential regression test - that can be applied to up-to-date sources to make sure the similar scenario does not happens (or even can be used with bisect).

I prefer the latter.

I'm thinking about putting this code to /tests/repros that would lead to:

In /tests/repros/Issue14370.sh it would make sense to code only i--local mode to cover the current code.

We would need to make a new-tag v3.5.4-repro with cherry picked /tests/repros/Issue14370.sh to have a historical repro with all the toolkit compatible with v3.5.4 (assuming toolkit longterm will change and we want to keep it working)

Testing any revision since v3.5.4-repro forward is as simple as:

git checkout `rev` /tests/repros/Issue14370.sh

And this mitigates risks at changes at HEAD to the new toolkit will break the script working for the older revisions.

serathius · 2022-12-12T12:57:21Z

To give more context on this PR. I use the repro-* scripts locally to ensure that latest changes linearizability don't break our ability to reproduce old issues. Don't think those scripts are critical, but I would like to avoid situation where those scripts are only present on my machine and our knowledge how to reproduce old issues is lost when I'm not around anymore.

As those are purely linearizability scripts, maybe putting them into tests/linearizability/Makefile would prevent dirtying the root Makefile?

Add script to quickly reproduce issue 14370

46ee82d

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>

serathius force-pushed the linearizability-issue14370 branch from a34f61e to 46ee82d Compare December 7, 2022 22:35

ahrtr reviewed Dec 9, 2022

View reviewed changes

ahrtr mentioned this pull request Dec 9, 2022

tests: Reproduce issue 14685 #14909

Closed

ptabor reviewed Dec 12, 2022

View reviewed changes

serathius closed this Jan 16, 2023

serathius deleted the linearizability-issue14370 branch June 15, 2023 20:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add script to quickly reproduce issue 14370 #14912

Add script to quickly reproduce issue 14370 #14912

serathius commented Dec 7, 2022

ahrtr Dec 9, 2022

serathius Dec 9, 2022

ahrtr Dec 9, 2022

ptabor Dec 12, 2022

serathius Dec 12, 2022

ptabor Dec 12, 2022

ptabor Dec 12, 2022

ptabor Dec 12, 2022

serathius commented Dec 12, 2022

Add script to quickly reproduce issue 14370 #14912

Add script to quickly reproduce issue 14370 #14912

Conversation

serathius commented Dec 7, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

serathius commented Dec 12, 2022