move the version check from leader campaign to startup #7978
Closed
Description
Enhancement Task
This is a proposal to move the version check from leader campaign to startup.
We hit a case where the cluster lost its PD leader:
- The PD binary was built without a version tag.
- All 3 PD nodes were upgraded to this wrong PD build.
- After the upgrade completed, all 3 PD nodes went into a crash loop during leader campaign; the cluster lost its PD leader and stopped functioning.
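To make the proposal concrete, here is a minimal sketch of a startup-time version check. It is not PD's actual code: `checkReleaseVersion` is a hypothetical helper, and the hand-written dotted-tri parsing only approximates what a real semver library does. The point is that the check returns an error that the startup path can act on before the node ever campaigns for leadership.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// checkReleaseVersion is a hypothetical helper (not part of PD).
// It reports an error unless v looks like MAJOR.MINOR.PATCH,
// optionally prefixed with "v" and followed by pre-release or
// build metadata.
func checkReleaseVersion(v string) error {
	core := strings.TrimPrefix(v, "v")
	// Ignore pre-release ("-alpha") and build ("+meta") suffixes.
	if i := strings.IndexAny(core, "-+"); i >= 0 {
		core = core[:i]
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return fmt.Errorf("%s is not in dotted-tri format", v)
	}
	for _, p := range parts {
		// Each component must be a non-negative decimal integer.
		if _, err := strconv.ParseUint(p, 10, 64); err != nil {
			return fmt.Errorf("%s is not in dotted-tri format: %w", v, err)
		}
	}
	return nil
}

func main() {
	// "62227fb4c" is the release-version value from the log in this issue.
	for _, v := range []string{"v7.5.0", "62227fb4c"} {
		if err := checkReleaseVersion(v); err != nil {
			fmt.Println("would refuse to start:", err)
			continue
		}
		fmt.Println("version ok:", v)
	}
}
```

In a real server the error would presumably lead to a non-zero exit during startup, so a build without a version tag fails on the first upgraded node instead of after every node has been replaced.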
Here is the PD log showing the crash during leader campaign:
{"level":"INFO","time":"2024/03/26 00:12:16.368 +00:00","caller":"versioninfo.go:89","message":"Welcome to Placement Driver (PD)"}
{"level":"INFO","time":"2024/03/26 00:12:16.368 +00:00","caller":"versioninfo.go:90","message":"PD","release-version":"62227fb4c"}
...
{"level":"INFO","time":"2024/03/26 00:43:45.706 +00:00","caller":"server.go:1670","message":"campaign PD leader ok","campaign-leader-name":"pd-1"}
{"level":"FATAL","time":"2024/03/26 00:43:46.950 +00:00","caller":"versioninfo.go:61","message":"version string is illegal","error":"[PD:semver:ErrSemverNewVersion]62227fb4c is not in dotted-tri format: 62227fb4c is not in dotted-tri format","errorVerbose":"[PD:semver:ErrSemverNewVersion]62227fb4c is not in dotted-tri format: 62227fb4c is not in dotted-tri format\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByCause\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/normalize.go:307\ngithub.com/tikv/pd/pkg/versioninfo.ParseVersion\n\t/mnt/tidb/pd/pkg/versioninfo/versioninfo.go:52\ngithub.com/tikv/pd/pkg/versioninfo.MustParseVersion\n\t/mnt/tidb/pd/pkg/versioninfo/versioninfo.go:59\ngithub.com/tikv/pd/server.CheckPDVersion\n\t/mnt/tidb/pd/server/util.go:40\ngithub.com/tikv/pd/server.(*Server).campaignLeader\n\t/mnt/tidb/pd/server/server.go:1743\ngithub.com/tikv/pd/server.(*Server).leaderLoop\n\t/mnt/tidb/pd/server/server.go:1639\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650","stack":"github.com/tikv/pd/pkg/versioninfo.MustParseVersion\n\t/mnt/tidb/pd/pkg/versioninfo/versioninfo.go:61\ngithub.com/tikv/pd/server.CheckPDVersion\n\t/mnt/tidb/pd/server/util.go:40\ngithub.com/tikv/pd/server.(*Server).campaignLeader\n\t/mnt/tidb/pd/server/server.go:1743\ngithub.com/tikv/pd/server.(*Server).leaderLoop\n\t/mnt/tidb/pd/server/server.go:1639"}
{"level":"WARN","time":"2024/03/26 00:43:51.984 +00:00","caller":"member.go:250","message":"the pd leader has not changed, delete and campaign again","old-pd-leader":"name:\"pd-1\" member_id:1438954984562261702 peer_urls:\"https://infra-tidb-pd-shopping-catalog-prod-0a019086.ec2.pin220.com:2380\" client_urls:\"https://infra-tidb-pd-shopping-catalog-prod-0a019086.ec2.pin220.com:2379\" "}
{"level":"INFO","time":"2024/03/26 00:43:51.986 +00:00","caller":"server.go:1632","message":"skip campaigning of pd leader and check later","server-name":"pd-1","etcd-leader-id":626574301973153734,"member-id":1438954984562261702}
{"level":"INFO","time":"2024/03/26 00:43:52.187 +00:00","caller":"server.go:1632","message":"skip campaigning of pd leader and check later","server-name":"pd-1","etcd-leader-id":626574301973153734,"member-id":1438954984562261702}
{"level":"INFO","time":"2024/03/26 00:43:52.388 +00:00","caller":"server.go:1632","message":"skip campaigning of pd leader and check later","server-name":"pd-1","etcd-leader-id":626574301973153734,"member-id":1438954984562261702}
{"level":"INFO","time":"2024/03/26 00:43:58.012 +00:00","caller":"server.go:1607","message":"pd leader has changed, try to re-campaign a pd leader"}
{"level":"INFO","time":"2024/03/26 00:43:58.012 +00:00","caller":"server.go:1632","message":"skip campaigning of pd leader and check later","server-name":"pd-1","etcd-leader-id":626574301973153734,"member-id":1438954984562261702}
{"level":"INFO","time":"2024/03/26 00:44:04.477 +00:00","caller":"server.go:1607","message":"pd leader has changed, try to re-campaign a pd leader"}
{"level":"INFO","time":"2024/03/26 00:44:04.477 +00:00","caller":"server.go:1644","message":"start to campaign PD leader","campaign-leader-name":"pd-1"}
{"level":"INFO","time":"2024/03/26 00:44:04.481 +00:00","caller":"leadership.go:181","message":"check campaign resp","resp":{"header":{"cluster_id":8850434198915930927,"member_id":16443876602637797343,"revision":55937640,"raft_term":342},"succeeded":true,"responses":[{"Response":{"ResponsePut":{"header":{"revision":55937640}}}}]}}
{"level":"INFO","time":"2024/03/26 00:44:04.481 +00:00","caller":"server.go:1670","message":"campaign PD leader ok","campaign-leader-name":"pd-1"}