Skip to content

Implement batch processing for SharePoint certificate lookups and enhance wire offset handling#277

Closed
rhythmatician wants to merge 2 commits intomainfrom
batch-sharepoint-calls
Closed

Implement batch processing for SharePoint certificate lookups and enhance wire offset handling#277
rhythmatician wants to merge 2 commits intomainfrom
batch-sharepoint-calls

Conversation

@rhythmatician
Copy link
Contributor

Introduce batch processing for certificate lookups in SharePoint to reduce the number of requests made. Enhance the handling of wire offsets by allowing multiple certificate searches in a single operation, improving efficiency and performance.

Copilot AI review requested due to automatic review settings February 25, 2026 21:40
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a36229447d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +124 to +127
except Exception as exc:
logging.warning("$batch request failed: %s", exc)
next_pending.extend(chunk)
continue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Propagate exhausted batch lookup errors

If post_batch repeatedly fails (for example during Graph auth/network outages), this except path just re-queues the probes and eventually falls through to the normal return, so callers receive {tn: None} as if files were genuinely missing. In run_pending_wire_offsets, those None values are treated as successful "not found" outcomes, which hides SharePoint failures and can leave wire offsets stale with no error surfaced; after max retries this should raise or otherwise signal failure instead of silently downgrading to not-found.

Useful? React with 👍 / 👎.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a Graph $batch-based lookup path for SharePoint certificate searches and wires it into the wire-offset pending processor so certificate resolution can be done up-front (deduped) rather than per-row, reducing Graph round-trips.

Changes:

  • Add batch_find_certificates() to resolve many TNs via Graph /$batch and select the best extension match per TN.
  • Update wire-roll pending processing to prefetch certificates in bulk and pass pre-resolved docs into workers.
  • Extend SharePoint client with post_batch() and adjust wire-roll tests to patch the new batch lookup.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
utils/sync/wire_rolls/tests/test_pending_wire_offsets.py Updates unit tests to patch batch_find_certificates and accommodate the new worker call signature.
utils/sync/wire_rolls/init.py Prefetches certificate docs with batch lookup and threads only download/parse work; adds a sentinel to preserve legacy per-row lookup behavior.
utils/sync/init.py Introduces batch_find_certificates() using Graph /$batch, with retry logic for inner 429s.
integrations/sharepoint/client.py Adds SharePointClient.post_batch() and a module-level post_batch() wrapper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 155 to 156
# Other errors (404, 5xx) → treat as not found; don't retry

Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-200 statuses other than 429 are silently treated as “not found” with no logging. That can hide configuration/auth/query issues (e.g., 400/401/403/5xx) and lead to false negatives. At minimum, log unexpected statuses (and consider surfacing them to the caller) so operational failures aren’t mistaken for missing certs.

Suggested change
# Other errors (404, 5xx) → treat as not found; don't retry
else:
# Other errors (404, 4xx, 5xx) → treat as not found; don't retry
logging.warning(
"$batch: unexpected status %s for %s (filename=%s); "
"treating as not found",
status,
tn,
filename,
)

Copilot uses AI. Check for mistakes.
Comment on lines +444 to +448
cert_map = batch_find_certificates(valid_tns) if valid_tns else {}

def _worker(row: dict) -> tuple[SyncState, bool]:
tn = (row.get("traceability_no") or "").strip()
return _process_single_wire_pending(row, doc=cert_map.get(tn, _unresolved))
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_pending_wire_offsets assumes cert_map[tn] is None means “definitively not found”, but batch_find_certificates can also return None after request failures/timeouts. Consider distinguishing “not found” from “lookup failed/unresolved” (e.g., omit TNs that couldn’t be resolved so _unresolved triggers per-row fallback, or have the batch helper raise and fall back to the legacy per-TN search).

Suggested change
cert_map = batch_find_certificates(valid_tns) if valid_tns else {}
def _worker(row: dict) -> tuple[SyncState, bool]:
tn = (row.get("traceability_no") or "").strip()
return _process_single_wire_pending(row, doc=cert_map.get(tn, _unresolved))
try:
cert_map = batch_find_certificates(valid_tns) if valid_tns else {}
except Exception as e:
logging.warning(
"Batch certificate lookup failed; falling back to per-row search: %s",
e,
exc_info=True,
)
cert_map = {}
def _worker(row: dict) -> tuple[SyncState, bool]:
tn = (row.get("traceability_no") or "").strip()
# If batch lookup explicitly returned None for this TN, treat it as
# "unresolved" so that per-row fallback is attempted rather than
# assuming the certificate is definitively not found.
doc = cert_map.get(tn, _unresolved)
if tn in cert_map and doc is None:
doc = _unresolved
return _process_single_wire_pending(row, doc=doc)

Copilot uses AI. Check for mistakes.
Comment on lines 371 to 403
@@ -379,11 +393,13 @@ def _process_single_wire_pending(row: dict) -> tuple[SyncState, bool]:
return state, False

try:
doc = find_certificate(tn)
if doc is _unresolved:
# No pre-fetched result — fall back to the original per-TN search.
doc = find_certificate(tn)
if not doc:
state.add_processed(1)
return state, False
state += process_wire_certificate_file(doc)
state += process_wire_certificate_file(doc) # type: ignore[arg-type] # doc is SharePointDocument after sentinel/None guards
return state, True
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc parameter is typed as object and the call to process_wire_certificate_file is suppressed with # type: ignore[arg-type]. To avoid masking real type issues, consider typing doc as SharePointDocument | None | object (or Any) and narrowing/casting after the sentinel/None checks instead of using a blanket ignore.

Copilot uses AI. Check for mistakes.
Comment on lines +163 to +170
def post_batch(self, requests: List[Dict]) -> List[Dict]:
"""POST to the Graph /$batch endpoint and return the 'responses' list.

Outer 429s are handled by _make_request's retry loop. Individual
items inside a successful batch response may still carry status=429;
callers are responsible for detecting and re-queuing those.
"""
resp = self._make_request("POST", "/$batch", json={"requests": requests})
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In post_batch, the parameter name requests shadows the imported requests module, which can be confusing and makes it easier to introduce mistakes later. Rename the parameter (e.g., batch_requests) in both the method and the functional wrapper.

Suggested change
def post_batch(self, requests: List[Dict]) -> List[Dict]:
"""POST to the Graph /$batch endpoint and return the 'responses' list.
Outer 429s are handled by _make_request's retry loop. Individual
items inside a successful batch response may still carry status=429;
callers are responsible for detecting and re-queuing those.
"""
resp = self._make_request("POST", "/$batch", json={"requests": requests})
def post_batch(self, batch_requests: List[Dict]) -> List[Dict]:
"""POST to the Graph /$batch endpoint and return the 'responses' list.
Outer 429s are handled by _make_request's retry loop. Individual
items inside a successful batch response may still carry status=429;
callers are responsible for detecting and re-queuing those.
"""
resp = self._make_request("POST", "/$batch", json={"requests": batch_requests})

Copilot uses AI. Check for mistakes.
try:
responses = post_batch(payload)
except Exception as exc:
logging.warning("$batch request failed: %s", exc)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batch_find_certificates treats repeated $batch failures the same as “not found”: exceptions from post_batch are swallowed and after retries the TN will resolve to None, so callers can silently skip cert processing and record no error. Preserve the previous failure semantics by either re-raising after the retry budget, or returning an explicit “unresolved/error” marker so callers can fall back to per-TN find_certificate (and/or record an error).

Suggested change
logging.warning("$batch request failed: %s", exc)
logging.warning("$batch request failed (round %d/%d): %s", retry_round + 1, MAX_RETRY_ROUNDS, exc)
# Only retry while we still have remaining rounds; on the final
# round, re-raise so callers see a failure instead of silent None
if retry_round >= MAX_RETRY_ROUNDS - 1:
raise

Copilot uses AI. Check for mistakes.
Comment on lines 153 to 166
elif status == 429:
next_pending.append(probe)
# Other errors (404, 5xx) → treat as not found; don't retry

if next_pending:
logging.warning(
"$batch: %d probes returned inner 429; retrying after 30s "
"(round %d/%d)",
len(next_pending),
retry_round + 1,
MAX_RETRY_ROUNDS,
)
time.sleep(30)
pending = next_pending
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inner 429 handling uses a fixed 30s sleep, even though Graph batch item responses can include a headers object with Retry-After. Consider honoring per-item Retry-After (and/or using exponential backoff with jitter) to reduce unnecessary delays and to be friendlier to Graph throttling.

Copilot uses AI. Check for mistakes.
@rhythmatician rhythmatician marked this pull request as draft February 25, 2026 22:18
@rhythmatician
Copy link
Contributor Author

Makes things slower, not faster.

hash sync mode duration (s) 429 count branch
37babba incremental 102.416 1 batch-sharepoint-calls
37babba full 311.182 90 batch-sharepoint-calls
a84b8f4 incremental 96.643 1 main
a84b8f4 incremental 95.58 1 main
a84b8f4 incremental 105.47 1 main
a84b8f4 full 204.25 2 main

@rhythmatician rhythmatician deleted the batch-sharepoint-calls branch February 25, 2026 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants