Skip to content

Conversation

@jacksonlee-civis
Copy link
Member

This pull request fixes a few issues related to retries:

  • There have been recent reports of a recursion error due to nesting "wait" strategies, when making Civis API calls under civis.APIClient. As a solution, instead of a single tenacity.Retrying instance for all API calls for a given civis.APIClient instance, each API call now has its own tenacity.Retrying instance. Since tenacity.Retrying is stateful (for attempt counts, etc.), this solution would also resolve yet-to-be-reported issues where retries would only work correctly for the first API calls but not subsequent ones for a given civis.APIClient instance.
  • Before this PR, the wait strategy was "use the retry-after seconds if known, else use the user-specified amount". When the amount of retry-after seconds is less than that from the user-specified strategy, the wait time would be less than what the user (or our default retryer) would require, which is not ideal. I've modified the code for "between retry-after and the user-specified one, wait at least the retry-after amount".
  • civis/io/_files.py had its own retrying strategy for a few helper functions. I've cleaned things up to use the same default retryer over these functions instead, and have written new unit tests for their retrying behavior.
  • Elsewhere in this codebase, there were other standalone HTTP calls via requests not yet covered by any retries. I've wrapped them up with the default retryer.

  • (For Civis employees only) Reference to a relevant ticket in the pull request title
  • Changelog entry added to CHANGELOG.md at the repo's root level
  • Description of change in the pull request description
  • If applicable, unit tests have been added and/or updated
  • The CircleCI builds have all passed

# Store the current buffer position in case we need to retry below.
buf_orig_position = buf.tell()

@_retry(RETRY_EXCEPTIONS)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to the default retryer. New unit test written as test_file_single_upload_retries below for this helper function.

raise ValueError(f"There are {num_parts} file parts but only {len(urls)} urls")

# upload function wrapped with a retry decorator
@_retry(RETRY_EXCEPTIONS)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to the default retryer. New unit test written astest_file_multipart_upload_retries below for this helper function.

# Store the current buffer position in case we need to retry below.
buf_orig_position = buf.tell()

@_retry(RETRY_EXCEPTIONS)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to the default retryer. The test suite has already had the unit test test_civis_to_file_retries for this helper function.

import os


def get_api_key(api_key):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't new code. Moved this function here from _utils.py (now renamed _retries.py for clarity and not having a cluttered "utilities" module around).


# If retries are exhausted,
# raise the last exception encountered, not tenacity's RetryError.
retrying.reraise = True
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for civis.APIClient has also been updated to alert the end user about reraise being true if they provide their own retryer.

if retry_conditions:
retrying.retry = retry_conditions
retrying.wait = wait_for_retry_after_header(fallback=retrying.wait)
retrying.wait = wait_at_least_retry_after_header(base=retrying.wait)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "fallback" idea doesn't seem right. I've modified code to "wait at least rety-after seconds, compared to the specified wait strategy" instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This client.pyi is for type hints and is auto-generated/modified when civis_api_spec.json is updated (we run python tools/update_civis_api_spec.py locally to update these files together).

Comment on lines +242 to +252
@pytest.mark.parametrize(
"retrying, max_calls",
[
# Simulate a user-provided tenacity.Retrying with limited max attempts.
(_get_retrying(3), 3),
# No user-provided tenacity.Retrying. Use civis-python's default retrying.
(None, 10),
],
)
@mock.patch("civis.futures.time.sleep", side_effect=lambda x: None)
def test_no_recursion_in_retry_request(m_sleep, retrying, max_calls):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New unit test based on #524. I've modified the test code to test for both cases of a user-specified retryer and the default retryer.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes several issues related to retries in the Civis Python client by refactoring how retry mechanisms work and standardizing retry usage across the codebase.

  • Fixed recursion error by creating new tenacity.Retrying instances for each API call instead of reusing a single stateful instance
  • Improved wait strategy to wait at least the Retry-After header duration rather than potentially less
  • Consolidated retry logic by removing custom retry implementation in _files.py and using the default retryer throughout

Reviewed Changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/test_retries.py Added test to verify recursion is avoided with new retry instances
tests/test_io.py Added tests for retry behavior in file upload functions
src/civis/resources/_resources.py Updated imports to use new retry module structure
src/civis/resources/_client_pyi.py Added type hints for tenacity import and retries parameter
src/civis/io/_tables.py Added retry wrapper for HTTP requests
src/civis/io/_files.py Removed custom retry decorator and used default retryer
src/civis/client.pyi Updated type annotations and API documentation
src/civis/client.py Added type validation and documentation for retries parameter
src/civis/cli/_cli_commands.py Added retry wrapper for notebook download requests
src/civis/cli/main.py Updated import path for retry functions
src/civis/base.py Updated to use new retry instance per request
src/civis/_retries.py Major refactor of retry logic with new instance creation and improved wait strategy
src/civis/_get_api_key.py Extracted API key functionality into separate module
.circleci/config.yml Added security vulnerability ignore for pip-audit

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@mattyb mattyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for getting this fix in

@jacksonlee-civis jacksonlee-civis merged commit d829708 into main Oct 31, 2025
2 checks passed
@jacksonlee-civis jacksonlee-civis deleted the civis-11753-new-retrying-instance-for-each-api-call branch October 31, 2025 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants