Lower throughput while having more and more watchers #19064
Open
Description
Bug report criteria
- This bug report is not security related, security issues should be disclosed privately via etcd maintainers.
- This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
- You have read the etcd bug reporting guidelines.
- Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.
What happened?
I am running an etcd throughput benchmark and observed that throughput drops as the number of watchers increases.
How I conduct my benchmark
- I use a fixed number of separate clients, each of which repeatedly sends a simple Txn that performs a Kubernetes-style optimistic create.
- At the same time, I run a fixed number of watchers on the prefix under which the keys are created.
- I also compact every 10 seconds.
Key length ~10 bytes, value length 1301 bytes
The full code can be found here: https://gist.github.com/jokerwyt/b29b5113d0a5f75f6d5621d05d627230
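Here is my result. Rows are the number of watchers, columns are the concurrency level (conc), and each cell is the measured throughput.
| watchers \ conc |       60 |       80 |      100 |      120 |      140 |
|-----------------|----------|----------|----------|----------|----------|
| 0               | 26765.07 | 27658.51 | 27951.77 | 27953.14 |  27954.7 |
| 1               |  18788.5 | 18431.04 | 16221.12 | 11767.03 |  15444.2 |
| 2               | 13639.76 | 14557.84 | 12761.36 | 12464.89 | 14349.55 |
| 3               | 13157.18 | 13431.09 | 11564.61 |  12073.8 | 13138.41 |
| 4               | 12520.72 | 10658.89 | 12019.56 | 10515.21 |  10127.3 |
| 5               | 11439.27 | 10491.39 | 12060.64 |  10877.8 | 10575.94 |
| 6               | 13070.41 | 10405.48 |  9658.23 | 11835.03 | 10982.19 |
| 7               | 12127.91 | 12062.77 | 10176.37 |  9965.35 | 10284.55 |
| 8               | 13128.63 | 11080.99 | 10346.09 | 10189.54 | 10012.19 |
| 9               |  9548.81 | 10232.87 |  9440.67 | 11225.33 |  9655.85 |
| 10              |  9449.93 |  9440.84 |  9908.77 |  9808.65 |  9530.57 |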
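To make the size of the drop easier to read off, here is a small Go sketch that averages each row of the table above across the five concurrency levels and expresses it as a fraction of the zero-watcher row. It is only arithmetic on the numbers already reported. The helper names (`mean`, `relativeToBaseline`) are made up for this illustration and are not part of the benchmark code, and averaging across concurrency levels is my own choice of summary.

```go
package main

import "fmt"

// throughput[w] holds the measured throughput with w watchers at
// concurrency levels 60, 80, 100, 120 and 140, copied from the table above.
var throughput = [][]float64{
	{26765.07, 27658.51, 27951.77, 27953.14, 27954.7},
	{18788.5, 18431.04, 16221.12, 11767.03, 15444.2},
	{13639.76, 14557.84, 12761.36, 12464.89, 14349.55},
	{13157.18, 13431.09, 11564.61, 12073.8, 13138.41},
	{12520.72, 10658.89, 12019.56, 10515.21, 10127.3},
	{11439.27, 10491.39, 12060.64, 10877.8, 10575.94},
	{13070.41, 10405.48, 9658.23, 11835.03, 10982.19},
	{12127.91, 12062.77, 10176.37, 9965.35, 10284.55},
	{13128.63, 11080.99, 10346.09, 10189.54, 10012.19},
	{9548.81, 10232.87, 9440.67, 11225.33, 9655.85},
	{9449.93, 9440.84, 9908.77, 9808.65, 9530.57},
}

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// relativeToBaseline returns, for each row, the row mean divided by the
// mean of the first row (the zero-watcher baseline).
func relativeToBaseline(rows [][]float64) []float64 {
	base := mean(rows[0])
	out := make([]float64, len(rows))
	for i, r := range rows {
		out[i] = mean(r) / base
	}
	return out
}

func main() {
	for w, r := range relativeToBaseline(throughput) {
		fmt.Printf("watchers=%2d  mean throughput = %5.1f%% of baseline\n", w, r*100)
	}
}
```

With these numbers, a single watcher already brings the mean throughput down to roughly 58% of the zero-watcher baseline, and ten watchers bring it down to roughly 35%.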
What did you expect to happen?
I expected etcd to deliver the same throughput with zero watchers as with one or more watchers.
How can we reproduce it (as minimally and precisely as possible)?
I have a test script; use it together with the Go benchmark code linked above.
You may need to set up an etcd instance yourself and make some small modifications to the script.
https://gist.github.com/jokerwyt/955a810bfe28b342f6ace11ba840e36c
Anything else we need to know?
No response
Etcd version (please run commands below)
3.5.10
Etcd configuration (command line flags or environment variables)
quota-backend-bytes: "8589934592" # 8Gi
auto-compaction-retention: "120m"
auto-compaction-mode: "periodic"
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
ytwu@worker1:~$ ETCDCTL_API=3 etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://worker1:2379 member list -w table
+------------------+---------+---------+------------------------+------------------------+
|        ID        | STATUS  |  NAME   |       PEER ADDRS       |      CLIENT ADDRS      |
+------------------+---------+---------+------------------------+------------------------+
| 99b1b6bcd47e918c | started | worker1 | https://10.10.1.4:2380 | https://10.10.1.4:2379 |
+------------------+---------+---------+------------------------+------------------------+
$ etcdctl --endpoints=<member list> endpoint status -w table
ytwu@worker1:~$ ETCDCTL_API=3 etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://worker1:2379 endpoint status -w table
+----------------------+------------------+---------+---------+-----------+-----------+------------+
|       ENDPOINT       |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------+------------------+---------+---------+-----------+-----------+------------+
| https://worker1:2379 | 99b1b6bcd47e918c |  3.5.10 |  1.0 GB |      true |         2 |     500011 |
+----------------------+------------------+---------+---------+-----------+-----------+------------+
Relevant log output
No response