Kopf Crashes with "Connection broken: IncompleteRead(0 bytes read)" #169

Closed
kopf-archiver bot opened this issue Aug 18, 2020 · 1 comment
Labels
archive bug Something isn't working


kopf-archiver bot commented Aug 18, 2020

An issue by chilicat at 2019-08-06 10:26:44+00:00
Original URL: zalando-incubator/kopf#169
 

Expected Behavior

A simple handler should not crash.

Actual Behavior

Kopf crashes after a couple of minutes

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 639, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 397, in _error_catcher
    yield
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 704, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 643, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 527, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 732, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 415, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/kopf", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/kopf/cli.py", line 30, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/kopf/cli.py", line 61, in run
    peering_name=peering_name,
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/queueing.py", line 275, in run
    _reraise(loop, list(done1) + list(done2) + list(done3) + list(done4))
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/queueing.py", line 303, in _reraise
    task.result()  # can raise the regular (non-cancellation) exceptions.
  File "/usr/local/lib/python3.7/site-packages/kopf/reactor/queueing.py", line 81, in watcher
    async for event in watching.infinite_watch(resource=resource, namespace=namespace):
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/watching.py", line 131, in infinite_watch
    async for event in streaming_watch(resource=resource, namespace=namespace):
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/watching.py", line 93, in streaming_watch
    async for event in streaming_aiter(stream, loop=loop):
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/watching.py", line 62, in streaming_aiter
    yield await loop.run_in_executor(executor, streaming_next, src)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/watching.py", line 50, in streaming_next
    return next(src)
  File "/usr/local/lib/python3.7/site-packages/kopf/clients/fetching.py", line 87, in <genexpr>
    return iter({'type': event.type, 'object': event.object.obj} for event in src)
  File "/usr/local/lib/python3.7/site-packages/pykube/query.py", line 214, in object_stream
    for line in r.iter_lines():
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 794, in iter_lines
    for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
  File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
```

Steps to Reproduce the Problem

Start Kopf with the following handler:

```
import kopf
import yaml
import os

# Log every event for pods labelled type=mongod.
@kopf.on.event('', 'v1', 'pods', labels={"type": "mongod"})
def pod_changed(logger, body, **kwargs):
    logger.info("Pod: %s", body['metadata']['name'])
```
Kopf crashes after about 5 minutes.

Specifications

  • Platform:
    Docker container: python:3.7.3-alpine3.9

  • Kubernetes version: (use kubectl version)

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:26:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

  • Python version: (use python --version)

Python 3.7.3

  • Python packages installed: (use pip freeze --all)
    aiohttp==3.5.4
    aiojobs==0.2.2
    async-timeout==3.0.1
    attrs==19.1.0
    cachetools==3.1.1
    certifi==2019.6.16
    chardet==3.0.4
    Click==7.0
    google-auth==1.6.3
    idna==2.8
    iso8601==0.1.12
    kopf==0.20
    kubernetes==10.0.0
    multidict==4.5.2
    oauthlib==3.0.2
    pip==19.1.1
    pyasn1==0.4.6
    pyasn1-modules==0.2.6
    pykube-ng==0.28
    python-dateutil==2.8.0
    PyYAML==5.1.2
    requests==2.22.0
    requests-oauthlib==1.2.0
    rsa==4.0
    setuptools==41.0.1
    six==1.12.0
    urllib3==1.25.3
    websocket-client==0.56.0
    wheel==0.33.3
    yarl==1.3.0

Commented by nolar at 2019-08-06 10:46:43+00:00
 

chilicat Thanks for reporting.

Could you please run an experiment in your environment: if you put this line at the top of your script, does the delayed error happen exactly at the specified time (in seconds)? If you set it to 600 seconds (10 mins), does it still happen at ~5 mins?

import kopf

kopf.config.WatchersConfig.default_stream_timeout = 60

@kopf.on.event(...)
...

Commented by chilicat at 2019-08-06 12:21:52+00:00
 

The process does not crash anymore (~50 minutes, still running)


Commented by nolar at 2019-08-06 13:45:28+00:00
 

chilicat Thanks. Let's treat this as a workaround for now, even though kopf.config... is undocumented and internal. Please wrap it in a try-except in case this module/class/attribute is renamed or removed in the future.
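
The try-except advice above could be sketched roughly as follows. This is only an illustration, not code from Kopf: the helper name `set_stream_timeout` is hypothetical, and the only Kopf name it relies on is the `kopf.config.WatchersConfig.default_stream_timeout` attribute quoted earlier in this thread. The demo uses a stand-in object so the snippet runs without Kopf installed.

```python
import logging
from types import SimpleNamespace  # only needed for the stand-in demo below

logger = logging.getLogger(__name__)


def set_stream_timeout(kopf_module, seconds):
    """Try to apply the undocumented workaround; return True on success.

    Hypothetical helper: it touches an internal attribute, so any
    AttributeError (renamed/removed module, class, or attribute) is
    logged and swallowed instead of crashing the operator at import time.
    """
    try:
        watchers_config = kopf_module.config.WatchersConfig
        # Refuse to create a new attribute that nothing would ever read.
        if not hasattr(watchers_config, "default_stream_timeout"):
            raise AttributeError("default_stream_timeout")
        watchers_config.default_stream_timeout = seconds
        return True
    except AttributeError as exc:
        logger.warning("Stream-timeout workaround not applied: %s", exc)
        return False


# Stand-in for the real module, mimicking the attribute path from this thread.
fake_kopf = SimpleNamespace(
    config=SimpleNamespace(
        WatchersConfig=SimpleNamespace(default_stream_timeout=None)
    )
)
applied = set_stream_timeout(fake_kopf, 60)
```

In a real operator script, the call would be `set_stream_timeout(kopf, 60)` right after `import kopf`, before any handlers are declared.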

I have seen this issue a few times: sporadic server-side disconnections when the ?timeout=... query arg is not specified. It goes deep into the K8s API implementation and Python's internals: kopf→pykube→requests→urllib3→http→socket.

I would prefer not to fix the sync I/O issues in this async app anymore (too many, too hard). I would rather replace all of this with aiohttp as the core of Kopf's I/O (coming soon), and then fix the connection issues there, if they still happen.

So, let's keep this issue open until then — so that the issue is not forgotten, and a fix is added.
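
For readers wondering what tolerating the sporadic server-side disconnections described above might look like in general, here is a minimal, hedged sketch of a reconnecting stream wrapper. It is not Kopf's implementation and not the fix that was eventually shipped; `resilient_events` and `open_stream` are hypothetical names, and the demo raises `http.client.IncompleteRead` (the innermost exception in the traceback) from a fake stream instead of talking to a cluster.

```python
import http.client
import time


def resilient_events(open_stream, retry_on=(http.client.IncompleteRead,),
                     max_retries=3, backoff=0.0):
    """Yield items from open_stream(), re-opening it when the read breaks.

    open_stream is a hypothetical zero-argument callable that returns a
    fresh iterator of events. A real Kubernetes watcher would also need to
    resume from the last seen resourceVersion to avoid replaying events;
    that part is deliberately left out of this sketch.
    """
    retries = 0
    while True:
        try:
            for event in open_stream():
                retries = 0  # a successful read resets the retry budget
                yield event
            return  # the stream ended normally
        except retry_on:
            retries += 1
            if retries > max_retries:
                raise  # give up and surface the original error
            time.sleep(backoff)


# Fake stream: breaks after the first event on the first connection only.
connections = []

def flaky_stream():
    connections.append(1)
    if len(connections) == 1:
        yield "event-a"
        raise http.client.IncompleteRead(b"")
    yield "event-b"


events = list(resilient_events(flaky_stream))
```

The design choice here is to reset the retry counter after every successful read, so that only consecutive failures count against `max_retries`.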


Commented by chilicat at 2019-08-06 13:57:54+00:00
 

nolar Sure, no problem. Thanks for the fast feedback.


Commented by jaceksan at 2019-10-06 19:14:31+00:00
 

I am experiencing the same issue with my PoC operator for a Vertica cluster DB.
Luckily, the workaround works for me as well.
Subscribing.


Commented by jaceksan at 2019-10-08 06:52:04+00:00
 

nolar While I was playing with my operator, the error started to occur more and more often; it is now almost impossible to continue developing it.
Unfortunately, the workaround has stopped working.
Is the delivery of the aiohttp-based replacement already planned?
Is there any other way to work around the issue?


Commented by nolar at 2019-11-13 17:44:35+00:00
 

kopf==0.23rc1 is now pre-released (see the release notes). It is now fully aiohttp-based and contains no synchronous API calls, so the whole I/O machinery has changed. This means the described issue is either completely gone or will look different.

chilicat jaceksan Please give this release candidate a try: is the reported issue gone (with the workaround temporarily removed)? I could not reproduce it in any of my environments.

@kopf-archiver kopf-archiver bot closed this as completed Aug 18, 2020
@kopf-archiver kopf-archiver bot changed the title [archival placeholder] Kopf Crashes with "Connection broken: IncompleteRead(0 bytes read)" Aug 19, 2020
@kopf-archiver kopf-archiver bot added the bug Something isn't working label Aug 19, 2020
@kopf-archiver kopf-archiver bot reopened this Aug 19, 2020

nolar commented Feb 6, 2021

Based on the last comment, I assume this issue is solved with a switch from pykube-ng+requests to aiohttp.

@nolar nolar closed this as completed Feb 6, 2021