First off, thanks for getting this framework together. I've enjoyed hacking around :-)

This might be more of a request for the python kube client, as it appears to be lacking event aggregation functionality similar to that found in the go client.

Occasionally an operator may get stuck in a retry loop. If many handlers are failing with retryable errors, a large number of events will be generated, putting stress on etcd and making the output of kubectl get events very hard to work with.
Expected Behavior
Duplicate or "near duplicate" events are aggregated.
$ oc get events
LASTSEEN   FIRSTSEEN   COUNT   NAME                 KIND               SUBOBJECT   TYPE    REASON              SOURCE   MESSAGE
20m        20m         1234    my-custom-resource   MyCustomResource               Error   HandlerRetryError   kopf     Handler 'on_delete' failed. Will retry. ['errors']
Actual Behavior
Every event generated by kopf is a new event in Kubernetes, which, if I understand correctly, puts undue load on etcd.
Steps to Reproduce the Problem

1. Write any handler that gets stuck in a retry loop and observe kubectl get events.
2. Install any CRD into the cluster using a handler like the one below. (In this case you'd have to invoke the handler by creating a new "MyCustomResource".)
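A minimal sketch of such a handler, assuming a reasonably recent kopf; the group/version/plural names and the delay are made-up placeholders, not taken from this issue:

import kopf

# Hypothetical CRD coordinates -- substitute your own group/version/plural.
@kopf.on.create('example.com', 'v1', 'mycustomresources')
def on_create(spec, **kwargs):
    # Fail on every attempt so kopf keeps retrying the handler and
    # posts a fresh event for each retry attempt.
    raise kopf.TemporaryError("Simulated retryable failure.", delay=10)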
@mzizzi Do you mean the in-memory event accumulation, aggregation, and then posting only the aggregated events every few seconds/minutes/events?

Or is this also about the event patching with "lastTimestamp", "count", and some other field updates? That implies one API request per event anyway, just a PATCH rather than a POST, but it would make the kubectl get events output shorter.
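As a rough sketch of that second option (not code from kopf or this thread), bumping the count and lastTimestamp of an already-posted Event with the official kubernetes Python client could look like this; the event name and namespace are invented placeholders:

import datetime
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Read the previously posted Event, bump its counters, and PATCH it back.
# The event name and namespace below are made-up values.
name, namespace = "my-custom-resource.15f4a1b2c3d4e5f6", "default"
existing = v1.read_namespaced_event(name, namespace)
patch = {
    "count": (existing.count or 1) + 1,
    "lastTimestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
}
v1.patch_namespaced_event(name, namespace, patch)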
@nolar Good question. I hadn't made the distinction when I originally posted the question.

After reading more into how the go client works: it uses a combination of rate-limiting, in-memory caching, and event patching. That solves both potential issues that you highlighted:

- Load introduced by many POST/PATCH requests for events
- Load due to excessive amounts of events being stored in kube

Incorporating some (or all!) of these features will help us create well-behaved Operators.
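For a sense of what that combination could look like in Python (purely illustrative; this is not kopf's or client-go's actual code, and every name here is invented), an aggregator might key events by their content, count duplicates in memory, and flush them at a bounded rate:

import time
from collections import defaultdict

class EventAggregator:
    # Illustrative only: identical events are counted in memory and flushed
    # at most once per interval via a caller-supplied function, which can
    # POST a new Event or PATCH the count/lastTimestamp of an earlier one.

    def __init__(self, emit, flush_interval=5.0):
        self.emit = emit                    # callable(key, count) doing the API call
        self.flush_interval = flush_interval
        self.counts = defaultdict(int)      # (namespace, object, reason, message) -> count
        self.last_flush = time.monotonic()

    def record(self, namespace, obj, reason, message):
        self.counts[(namespace, obj, reason, message)] += 1
        if time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        for key, count in self.counts.items():
            self.emit(key, count)           # one request per distinct event
        self.counts.clear()
        self.last_flush = time.monotonic()

# Usage: a thousand identical failures collapse into a single emit() per flush.
agg = EventAggregator(emit=lambda key, count: print(key, count))
for _ in range(1000):
    agg.record("default", "my-custom-resource", "HandlerRetryError", "Handler 'on_delete' failed.")
agg.flush()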