

@lhoestq (Member) commented Oct 10, 2025

Add requests.exceptions.ChunkedEncodingError to the default list of exceptions that http_backoff retries on.
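As a refresher on the retry mechanism, here is a minimal sketch of retrying a call on a configurable tuple of exception types with capped exponential backoff. It is illustrative only and is not huggingface_hub's http_backoff implementation. `retry_with_backoff` and `flaky_fetch` are hypothetical names, the wait values are assumptions, and a local `ChunkedEncodingError` class stands in for `requests.exceptions.ChunkedEncodingError` so the example runs without network access or the requests package.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class ChunkedEncodingError(Exception):
    """Local stand-in for requests.exceptions.ChunkedEncodingError (assumption, to avoid a network dependency)."""


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_retries: int = 5,
    base_wait: float = 1.0,
    max_wait: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on an exception listed in retry_on, wait and try again (hypothetical helper)."""
    wait = base_wait
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error to the caller
            sleep(wait)
            wait = min(wait * 2, max_wait)  # exponential backoff, capped at max_wait
    raise AssertionError("unreachable: the loop always returns or raises")


# Usage: a hypothetical fetcher whose first two calls fail like a truncated body.
calls = {"n": 0}
waits: list = []


def flaky_fetch() -> bytes:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ChunkedEncodingError("Connection broken: IncompleteRead")
    return b"parquet bytes"


# Record the waits instead of actually sleeping, so the example finishes instantly.
data = retry_with_backoff(flaky_fetch, retry_on=(ChunkedEncodingError,), sleep=waits.append)
print(data, calls["n"], waits)  # b'parquet bytes' 3 [1.0, 2.0]
```

The key point is the `retry_on` tuple: an exception type that is not listed propagates on the very first failure, so whether a dropped connection mid-body is retried depends entirely on what the default tuple contains.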

The error was reported by @andimarafioti while streaming a Parquet dataset from the HF Hub with range requests in nanoVLM:

Exception in thread Thread-2 (_producer):
Traceback (most recent call last):
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/urllib3/response.py", line 779, in _error_catcher
    yield
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/urllib3/response.py", line 925, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(22154112 bytes read, 34162245 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/requests/models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/urllib3/response.py", line 1091, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/urllib3/response.py", line 1008, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/urllib3/response.py", line 903, in _raw_read
    with self._error_catcher():
         ^^^^^^^^^^^^^^^^^^^^^
  File "/admin/home/andres_marafioti/.local/share/uv/python/cpython-3.12.10-linux-x86_64-gnu/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/urllib3/response.py", line 803, in _error_catcher
    raise ProtocolError(arg, e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(22154112 bytes read, 34162245 more expected)', IncompleteRead(22154112 bytes read, 34162245 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/admin/home/andres_marafioti/.local/share/uv/python/cpython-3.12.10-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/admin/home/andres_marafioti/.local/share/uv/python/cpython-3.12.10-linux-x86_64-gnu/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/fsx/andi/nanoVLM/data/advanced_datasets.py", line 115, in _producer
    sample = next(iterator)
             ^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/data/datasets.py", line 109, in iter_for_worker
    for data in self.dataset:
                ^^^^^^^^^^^^
  File "/fsx/andi/datasets_github/datasets/src/datasets/iterable_dataset.py", line 2422, in __iter__
    yield from self._iter_pytorch()
  File "/fsx/andi/datasets_github/datasets/src/datasets/iterable_dataset.py", line 2337, in _iter_pytorch
    for key, example in ex_iterable:
                        ^^^^^^^^^^^
  File "/fsx/andi/datasets_github/datasets/src/datasets/iterable_dataset.py", line 1957, in __iter__
    for key, pa_table in self._iter_arrow():
                         ^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/datasets_github/datasets/src/datasets/iterable_dataset.py", line 1980, in _iter_arrow
    for key, pa_table in self.ex_iterable._iter_arrow():
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/datasets_github/datasets/src/datasets/iterable_dataset.py", line 508, in _iter_arrow
    for key, pa_table in iterator:
                         ^^^^^^^^
  File "/fsx/andi/datasets_github/datasets/src/datasets/iterable_dataset.py", line 155, in _convert_to_arrow
    for key, example in iterator:
                        ^^^^^^^^
  File "/fsx/andi/datasets_github/datasets/src/datasets/iterable_dataset.py", line 1698, in __iter__
    yield from islice(self.ex_iterable, ex_iterable_idx_start, None)
  File "/fsx/andi/datasets_github/datasets/src/datasets/iterable_dataset.py", line 868, in __iter__
    yield from ex_iterable
  File "/fsx/andi/datasets_github/datasets/src/datasets/iterable_dataset.py", line 330, in __iter__
    for key, pa_table in self.generate_tables_fn(**gen_kwags):
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/datasets_github/datasets/src/datasets/packaged_modules/parquet/parquet.py", line 93, in _generate_tables
    for batch_idx, record_batch in enumerate(
                                   ^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 3904, in _iterator
  File "pyarrow/_dataset.pyx", line 3494, in pyarrow._dataset.TaggedRecordBatchIterator.__next__
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
  File "/fsx/andi/datasets_github/datasets/src/datasets/utils/file_utils.py", line 813, in read_with_retries
    out = read(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/huggingface_hub/hf_file_system.py", line 997, in read
    return super().read(length)
           ^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/fsspec/spec.py", line 2111, in read
    out = self.cache._fetch(self.loc, self.loc + length)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/fsspec/caching.py", line 287, in _fetch
    self.cache = self.fetcher(start, end)  # new block replaces old
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/huggingface_hub/hf_file_system.py", line 957, in _fetch_range
    r = http_backoff("GET", url, headers=headers, timeout=constants.HF_HUB_DOWNLOAD_TIMEOUT)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_http.py", line 305, in http_backoff
    response = session.request(method=method, url=url, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/requests/sessions.py", line 724, in send
    history = [resp for resp in gen]
                                ^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/requests/sessions.py", line 265, in resolve_redirects
    resp = self.send(
           ^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/requests/sessions.py", line 746, in send
    r.content
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/requests/models.py", line 902, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/andi/nanoVLM/.venv/lib/python3.12/site-packages/requests/models.py", line 822, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(22154112 bytes read, 34162245 more expected)', IncompleteRead(22154112 bytes read, 34162245 more expected))
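The traceback shows urllib3's IncompleteRead being wrapped in urllib3's ProtocolError, which requests then re-raises as ChunkedEncodingError inside `generate` in requests/models.py. The sketch below uses a simplified mirror of the requests exception hierarchy (not an import of requests) to illustrate why such an error escapes a retry tuple that only lists Timeout and ConnectionError: in requests, ChunkedEncodingError derives directly from RequestException and is a sibling of ConnectionError, not a subclass. The `RequestsConnectionError` name, the `would_retry` helper, and the `OLD_DEFAULT`/`NEW_DEFAULT` tuples are hypothetical; the assumption that the previous default held exactly those two types should be checked against the huggingface_hub source.

```python
# Simplified mirror of the relevant part of requests.exceptions (assumption:
# only these classes matter here; the real module defines many more).
class RequestException(OSError):
    """Mirrors requests.exceptions.RequestException, which subclasses IOError (an alias of OSError)."""


class Timeout(RequestException):
    """Mirrors requests.exceptions.Timeout."""


class RequestsConnectionError(RequestException):
    """Mirrors requests.exceptions.ConnectionError (renamed to avoid shadowing the builtin)."""


class ChunkedEncodingError(RequestException):
    """Mirrors requests.exceptions.ChunkedEncodingError: a sibling of ConnectionError, not a subclass."""


# Hypothetical retry tuples before and after the change.
OLD_DEFAULT = (Timeout, RequestsConnectionError)
NEW_DEFAULT = OLD_DEFAULT + (ChunkedEncodingError,)


def would_retry(exc: BaseException, retry_on: tuple) -> bool:
    """An `except retry_on:` clause catches exc exactly when isinstance(exc, retry_on) holds."""
    return isinstance(exc, retry_on)


err = ChunkedEncodingError("Connection broken: IncompleteRead(22154112 bytes read, 34162245 more expected)")
print(would_retry(err, OLD_DEFAULT))  # False: the error escapes the retry loop
print(would_retry(err, NEW_DEFAULT))  # True: the request is retried
```

Retrying the whole request looks reasonable in this case because, as the traceback shows, the body was being read inside session.request (via r.content), so no partial data had been handed back to the caller yet. Listing the specific type also keeps retries narrow, whereas catching RequestException would retry errors such as an invalid URL that a retry cannot fix.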

@lhoestq lhoestq requested a review from Wauplin October 10, 2025 16:28
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@andimarafioti (Member) left a comment

Thank you, Quentin! This seems like it would fix the error I had :)
