Implement batch processing for SharePoint certificate lookups and enhance wire offset handling#277
Implement batch processing for SharePoint certificate lookups and enhance wire offset handling#277rhythmatician wants to merge 2 commits intomainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a36229447d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| except Exception as exc: | ||
| logging.warning("$batch request failed: %s", exc) | ||
| next_pending.extend(chunk) | ||
| continue |
There was a problem hiding this comment.
Propagate exhausted batch lookup errors
If post_batch repeatedly fails (for example during Graph auth/network outages), this except path just re-queues the probes and eventually falls through to the normal return, so callers receive {tn: None} as if files were genuinely missing. In run_pending_wire_offsets, those None values are treated as successful "not found" outcomes, which hides SharePoint failures and can leave wire offsets stale with no error surfaced; after max retries this should raise or otherwise signal failure instead of silently downgrading to not-found.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR adds a Graph $batch-based lookup path for SharePoint certificate searches and wires it into the wire-offset pending processor so certificate resolution can be done up-front (deduped) rather than per-row, reducing Graph round-trips.
Changes:
- Add
batch_find_certificates()to resolve many TNs via Graph/$batchand select the best extension match per TN. - Update wire-roll pending processing to prefetch certificates in bulk and pass pre-resolved docs into workers.
- Extend SharePoint client with
post_batch()and adjust wire-roll tests to patch the new batch lookup.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| utils/sync/wire_rolls/tests/test_pending_wire_offsets.py | Updates unit tests to patch batch_find_certificates and accommodate the new worker call signature. |
| utils/sync/wire_rolls/init.py | Prefetches certificate docs with batch lookup and threads only download/parse work; adds a sentinel to preserve legacy per-row lookup behavior. |
| utils/sync/init.py | Introduces batch_find_certificates() using Graph /$batch, with retry logic for inner 429s. |
| integrations/sharepoint/client.py | Adds SharePointClient.post_batch() and a module-level post_batch() wrapper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
utils/sync/__init__.py
Outdated
| # Other errors (404, 5xx) → treat as not found; don't retry | ||
|
|
There was a problem hiding this comment.
Non-200 statuses other than 429 are silently treated as “not found” with no logging. That can hide configuration/auth/query issues (e.g., 400/401/403/5xx) and lead to false negatives. At minimum, log unexpected statuses (and consider surfacing them to the caller) so operational failures aren’t mistaken for missing certs.
| # Other errors (404, 5xx) → treat as not found; don't retry | |
| else: | |
| # Other errors (404, 4xx, 5xx) → treat as not found; don't retry | |
| logging.warning( | |
| "$batch: unexpected status %s for %s (filename=%s); " | |
| "treating as not found", | |
| status, | |
| tn, | |
| filename, | |
| ) |
| cert_map = batch_find_certificates(valid_tns) if valid_tns else {} | ||
|
|
||
| def _worker(row: dict) -> tuple[SyncState, bool]: | ||
| tn = (row.get("traceability_no") or "").strip() | ||
| return _process_single_wire_pending(row, doc=cert_map.get(tn, _unresolved)) |
There was a problem hiding this comment.
run_pending_wire_offsets assumes cert_map[tn] is None means “definitively not found”, but batch_find_certificates can also return None after request failures/timeouts. Consider distinguishing “not found” from “lookup failed/unresolved” (e.g., omit TNs that couldn’t be resolved so _unresolved triggers per-row fallback, or have the batch helper raise and fall back to the legacy per-TN search).
| cert_map = batch_find_certificates(valid_tns) if valid_tns else {} | |
| def _worker(row: dict) -> tuple[SyncState, bool]: | |
| tn = (row.get("traceability_no") or "").strip() | |
| return _process_single_wire_pending(row, doc=cert_map.get(tn, _unresolved)) | |
| try: | |
| cert_map = batch_find_certificates(valid_tns) if valid_tns else {} | |
| except Exception as e: | |
| logging.warning( | |
| "Batch certificate lookup failed; falling back to per-row search: %s", | |
| e, | |
| exc_info=True, | |
| ) | |
| cert_map = {} | |
| def _worker(row: dict) -> tuple[SyncState, bool]: | |
| tn = (row.get("traceability_no") or "").strip() | |
| # If batch lookup explicitly returned None for this TN, treat it as | |
| # "unresolved" so that per-row fallback is attempted rather than | |
| # assuming the certificate is definitively not found. | |
| doc = cert_map.get(tn, _unresolved) | |
| if tn in cert_map and doc is None: | |
| doc = _unresolved | |
| return _process_single_wire_pending(row, doc=doc) |
| @@ -379,11 +393,13 @@ def _process_single_wire_pending(row: dict) -> tuple[SyncState, bool]: | |||
| return state, False | |||
|
|
|||
| try: | |||
| doc = find_certificate(tn) | |||
| if doc is _unresolved: | |||
| # No pre-fetched result — fall back to the original per-TN search. | |||
| doc = find_certificate(tn) | |||
| if not doc: | |||
| state.add_processed(1) | |||
| return state, False | |||
| state += process_wire_certificate_file(doc) | |||
| state += process_wire_certificate_file(doc) # type: ignore[arg-type] # doc is SharePointDocument after sentinel/None guards | |||
| return state, True | |||
There was a problem hiding this comment.
The doc parameter is typed as object and the call to process_wire_certificate_file is suppressed with # type: ignore[arg-type]. To avoid masking real type issues, consider typing doc as SharePointDocument | None | object (or Any) and narrowing/casting after the sentinel/None checks instead of using a blanket ignore.
| def post_batch(self, requests: List[Dict]) -> List[Dict]: | ||
| """POST to the Graph /$batch endpoint and return the 'responses' list. | ||
|
|
||
| Outer 429s are handled by _make_request's retry loop. Individual | ||
| items inside a successful batch response may still carry status=429; | ||
| callers are responsible for detecting and re-queuing those. | ||
| """ | ||
| resp = self._make_request("POST", "/$batch", json={"requests": requests}) |
There was a problem hiding this comment.
In post_batch, the parameter name requests shadows the imported requests module, which can be confusing and makes it easier to introduce mistakes later. Rename the parameter (e.g., batch_requests) in both the method and the functional wrapper.
| def post_batch(self, requests: List[Dict]) -> List[Dict]: | |
| """POST to the Graph /$batch endpoint and return the 'responses' list. | |
| Outer 429s are handled by _make_request's retry loop. Individual | |
| items inside a successful batch response may still carry status=429; | |
| callers are responsible for detecting and re-queuing those. | |
| """ | |
| resp = self._make_request("POST", "/$batch", json={"requests": requests}) | |
| def post_batch(self, batch_requests: List[Dict]) -> List[Dict]: | |
| """POST to the Graph /$batch endpoint and return the 'responses' list. | |
| Outer 429s are handled by _make_request's retry loop. Individual | |
| items inside a successful batch response may still carry status=429; | |
| callers are responsible for detecting and re-queuing those. | |
| """ | |
| resp = self._make_request("POST", "/$batch", json={"requests": batch_requests}) |
| try: | ||
| responses = post_batch(payload) | ||
| except Exception as exc: | ||
| logging.warning("$batch request failed: %s", exc) |
There was a problem hiding this comment.
batch_find_certificates treats repeated $batch failures the same as “not found”: exceptions from post_batch are swallowed and after retries the TN will resolve to None, so callers can silently skip cert processing and record no error. Preserve the previous failure semantics by either re-raising after the retry budget, or returning an explicit “unresolved/error” marker so callers can fall back to per-TN find_certificate (and/or record an error).
| logging.warning("$batch request failed: %s", exc) | |
| logging.warning("$batch request failed (round %d/%d): %s", retry_round + 1, MAX_RETRY_ROUNDS, exc) | |
| # Only retry while we still have remaining rounds; on the final | |
| # round, re-raise so callers see a failure instead of silent None | |
| if retry_round >= MAX_RETRY_ROUNDS - 1: | |
| raise |
| elif status == 429: | ||
| next_pending.append(probe) | ||
| # Other errors (404, 5xx) → treat as not found; don't retry | ||
|
|
||
| if next_pending: | ||
| logging.warning( | ||
| "$batch: %d probes returned inner 429; retrying after 30s " | ||
| "(round %d/%d)", | ||
| len(next_pending), | ||
| retry_round + 1, | ||
| MAX_RETRY_ROUNDS, | ||
| ) | ||
| time.sleep(30) | ||
| pending = next_pending |
There was a problem hiding this comment.
Inner 429 handling uses a fixed 30s sleep, even though Graph batch item responses can include a headers object with Retry-After. Consider honoring per-item Retry-After (and/or using exponential backoff with jitter) to reduce unnecessary delays and to be friendlier to Graph throttling.
…ance wire offset handling
…prove error handling
932d745 to
37babba
Compare
|
Makes things slower, not faster.
|
Introduce batch processing for certificate lookups in SharePoint to reduce the number of requests made. Enhance the handling of wire offsets by allowing multiple certificate searches in a single operation, improving efficiency and performance.