Implement batch processing for SharePoint certificate lookups and enhance wire offset handling by rhythmatician · Pull Request #277 · Johnson-Gage-Inspection-Inc/core-api

rhythmatician · 2026-02-25T21:40:08Z

Introduce batch processing for certificate lookups in SharePoint to reduce the number of requests made. Enhance the handling of wire offsets by allowing multiple certificate searches in a single operation, improving efficiency and performance.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a36229447d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-02-25T21:44:56Z

utils/sync/__init__.py

+            except Exception as exc:
+                logging.warning("$batch request failed: %s", exc)
+                next_pending.extend(chunk)
+                continue


Propagate exhausted batch lookup errors

If post_batch repeatedly fails (for example during Graph auth/network outages), this except path just re-queues the probes and eventually falls through to the normal return, so callers receive {tn: None} as if files were genuinely missing. In run_pending_wire_offsets, those None values are treated as successful "not found" outcomes, which hides SharePoint failures and can leave wire offsets stale with no error surfaced; after max retries this should raise or otherwise signal failure instead of silently downgrading to not-found.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

This PR adds a Graph $batch-based lookup path for SharePoint certificate searches and wires it into the wire-offset pending processor so certificate resolution can be done up-front (deduped) rather than per-row, reducing Graph round-trips.

Changes:

Add batch_find_certificates() to resolve many TNs via Graph /$batch and select the best extension match per TN.
Update wire-roll pending processing to prefetch certificates in bulk and pass pre-resolved docs into workers.
Extend SharePoint client with post_batch() and adjust wire-roll tests to patch the new batch lookup.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File	Description
utils/sync/wire_rolls/tests/test_pending_wire_offsets.py	Updates unit tests to patch `batch_find_certificates` and accommodate the new worker call signature.
utils/sync/wire_rolls/init.py	Prefetches certificate docs with batch lookup and threads only download/parse work; adds a sentinel to preserve legacy per-row lookup behavior.
utils/sync/init.py	Introduces `batch_find_certificates()` using Graph `/$batch`, with retry logic for inner 429s.
integrations/sharepoint/client.py	Adds `SharePointClient.post_batch()` and a module-level `post_batch()` wrapper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-25T21:45:12Z

utils/sync/__init__.py

+                # Other errors (404, 5xx) → treat as not found; don't retry
+


Non-200 statuses other than 429 are silently treated as “not found” with no logging. That can hide configuration/auth/query issues (e.g., 400/401/403/5xx) and lead to false negatives. At minimum, log unexpected statuses (and consider surfacing them to the caller) so operational failures aren’t mistaken for missing certs.

Suggested change

# Other errors (404, 5xx) → treat as not found; don't retry

else:

# Other errors (404, 4xx, 5xx) → treat as not found; don't retry

logging.warning(

"$batch: unexpected status %s for %s (filename=%s); "

"treating as not found",

status,

tn,

filename,

)

Copilot · 2026-02-25T21:45:13Z

utils/sync/wire_rolls/__init__.py

+    cert_map = batch_find_certificates(valid_tns) if valid_tns else {}
+
+    def _worker(row: dict) -> tuple[SyncState, bool]:
+        tn = (row.get("traceability_no") or "").strip()
+        return _process_single_wire_pending(row, doc=cert_map.get(tn, _unresolved))


run_pending_wire_offsets assumes cert_map[tn] is None means “definitively not found”, but batch_find_certificates can also return None after request failures/timeouts. Consider distinguishing “not found” from “lookup failed/unresolved” (e.g., omit TNs that couldn’t be resolved so _unresolved triggers per-row fallback, or have the batch helper raise and fall back to the legacy per-TN search).

Suggested change

cert_map = batch_find_certificates(valid_tns) if valid_tns else {}

def _worker(row: dict) -> tuple[SyncState, bool]:

tn = (row.get("traceability_no") or "").strip()

return _process_single_wire_pending(row, doc=cert_map.get(tn, _unresolved))

try:

cert_map = batch_find_certificates(valid_tns) if valid_tns else {}

except Exception as e:

logging.warning(

"Batch certificate lookup failed; falling back to per-row search: %s",

e,

exc_info=True,

)

cert_map = {}

def _worker(row: dict) -> tuple[SyncState, bool]:

tn = (row.get("traceability_no") or "").strip()

# If batch lookup explicitly returned None for this TN, treat it as

# "unresolved" so that per-row fallback is attempted rather than

# assuming the certificate is definitively not found.

doc = cert_map.get(tn, _unresolved)

if tn in cert_map and doc is None:

doc = _unresolved

return _process_single_wire_pending(row, doc=doc)

Copilot · 2026-02-25T21:45:13Z

utils/sync/wire_rolls/__init__.py

@@ -379,11 +393,13 @@ def _process_single_wire_pending(row: dict) -> tuple[SyncState, bool]:
        return state, False

    try:
-        doc = find_certificate(tn)
+        if doc is _unresolved:
+            # No pre-fetched result — fall back to the original per-TN search.
+            doc = find_certificate(tn)
        if not doc:
            state.add_processed(1)
            return state, False
-        state += process_wire_certificate_file(doc)
+        state += process_wire_certificate_file(doc)  # type: ignore[arg-type]  # doc is SharePointDocument after sentinel/None guards
        return state, True


The doc parameter is typed as object and the call to process_wire_certificate_file is suppressed with # type: ignore[arg-type]. To avoid masking real type issues, consider typing doc as SharePointDocument | None | object (or Any) and narrowing/casting after the sentinel/None checks instead of using a blanket ignore.

Copilot · 2026-02-25T21:45:13Z

integrations/sharepoint/client.py

+    def post_batch(self, requests: List[Dict]) -> List[Dict]:
+        """POST to the Graph /$batch endpoint and return the 'responses' list.
+
+        Outer 429s are handled by _make_request's retry loop.  Individual
+        items inside a successful batch response may still carry status=429;
+        callers are responsible for detecting and re-queuing those.
+        """
+        resp = self._make_request("POST", "/$batch", json={"requests": requests})


In post_batch, the parameter name requests shadows the imported requests module, which can be confusing and makes it easier to introduce mistakes later. Rename the parameter (e.g., batch_requests) in both the method and the functional wrapper.

Suggested change

def post_batch(self, requests: List[Dict]) -> List[Dict]:

"""POST to the Graph /$batch endpoint and return the 'responses' list.

Outer 429s are handled by _make_request's retry loop. Individual

items inside a successful batch response may still carry status=429;

callers are responsible for detecting and re-queuing those.

"""

resp = self._make_request("POST", "/$batch", json={"requests": requests})

def post_batch(self, batch_requests: List[Dict]) -> List[Dict]:

"""POST to the Graph /$batch endpoint and return the 'responses' list.

Outer 429s are handled by _make_request's retry loop. Individual

items inside a successful batch response may still carry status=429;

callers are responsible for detecting and re-queuing those.

"""

resp = self._make_request("POST", "/$batch", json={"requests": batch_requests})

Copilot · 2026-02-25T21:45:13Z

utils/sync/__init__.py

+            try:
+                responses = post_batch(payload)
+            except Exception as exc:
+                logging.warning("$batch request failed: %s", exc)


batch_find_certificates treats repeated $batch failures the same as “not found”: exceptions from post_batch are swallowed and after retries the TN will resolve to None, so callers can silently skip cert processing and record no error. Preserve the previous failure semantics by either re-raising after the retry budget, or returning an explicit “unresolved/error” marker so callers can fall back to per-TN find_certificate (and/or record an error).

Suggested change

logging.warning("$batch request failed: %s", exc)

logging.warning("$batch request failed (round %d/%d): %s", retry_round + 1, MAX_RETRY_ROUNDS, exc)

# Only retry while we still have remaining rounds; on the final

# round, re-raise so callers see a failure instead of silent None

if retry_round >= MAX_RETRY_ROUNDS - 1:

raise

Copilot · 2026-02-25T21:45:14Z

utils/sync/__init__.py

+                elif status == 429:
+                    next_pending.append(probe)
+                # Other errors (404, 5xx) → treat as not found; don't retry
+
+        if next_pending:
+            logging.warning(
+                "$batch: %d probes returned inner 429; retrying after 30s "
+                "(round %d/%d)",
+                len(next_pending),
+                retry_round + 1,
+                MAX_RETRY_ROUNDS,
+            )
+            time.sleep(30)
+            pending = next_pending


Inner 429 handling uses a fixed 30s sleep, even though Graph batch item responses can include a headers object with Retry-After. Consider honoring per-item Retry-After (and/or using exponential backoff with jitter) to reduce unnecessary delays and to be friendlier to Graph throttling.

…ance wire offset handling

…prove error handling

rhythmatician · 2026-02-25T22:57:32Z

Makes things slower, not faster.

hash	sync mode	duration (s)	429 count	branch
`37babba`	incremental	102.416	1	batch-sharepoint-calls
`37babba`	full	311.182	90	batch-sharepoint-calls
`a84b8f4`	incremental	96.643	1	main
`a84b8f4`	incremental	95.58	1	main
`a84b8f4`	incremental	105.47	1	main
`a84b8f4`	full	204.25	2	main

Copilot AI review requested due to automatic review settings February 25, 2026 21:40

Copilot started reviewing on behalf of rhythmatician February 25, 2026 21:40 View session

chatgpt-codex-connector bot reviewed Feb 25, 2026

View reviewed changes

Copilot AI reviewed Feb 25, 2026

View reviewed changes

rhythmatician marked this pull request as draft February 25, 2026 22:18

rhythmatician added 2 commits February 25, 2026 16:54

Implement batch processing for SharePoint certificate lookups and enh…

3ce8d71

…ance wire offset handling

Optimize batch_find_certificates to reduce SharePoint requests and im…

37babba

…prove error handling

rhythmatician force-pushed the batch-sharepoint-calls branch from 932d745 to 37babba Compare February 25, 2026 22:54

rhythmatician closed this Feb 25, 2026

rhythmatician deleted the batch-sharepoint-calls branch February 25, 2026 23:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement batch processing for SharePoint certificate lookups and enhance wire offset handling#277

Implement batch processing for SharePoint certificate lookups and enhance wire offset handling#277
rhythmatician wants to merge 2 commits intomainfrom
batch-sharepoint-calls

rhythmatician commented Feb 25, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Feb 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

Copilot AI Feb 25, 2026

Uh oh!

rhythmatician commented Feb 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		# Other errors (404, 5xx) → treat as not found; don't retry

-                # Other errors (404, 5xx) → treat as not found; don't retry
+                else:
+                    # Other errors (404, 4xx, 5xx) → treat as not found; don't retry
+                    logging.warning(
+                        "$batch: unexpected status %s for %s (filename=%s); "
+                        "treating as not found",
+                        status,
+                        tn,
+                        filename,
+                    )

-    cert_map = batch_find_certificates(valid_tns) if valid_tns else {}
-    def _worker(row: dict) -> tuple[SyncState, bool]:
-        tn = (row.get("traceability_no") or "").strip()
-        return _process_single_wire_pending(row, doc=cert_map.get(tn, _unresolved))
+    try:
+        cert_map = batch_find_certificates(valid_tns) if valid_tns else {}
+    except Exception as e:
+        logging.warning(
+            "Batch certificate lookup failed; falling back to per-row search: %s",
+            e,
+            exc_info=True,
+        )
+        cert_map = {}
+    def _worker(row: dict) -> tuple[SyncState, bool]:
+        tn = (row.get("traceability_no") or "").strip()
+        # If batch lookup explicitly returned None for this TN, treat it as
+        # "unresolved" so that per-row fallback is attempted rather than
+        # assuming the certificate is definitively not found.
+        doc = cert_map.get(tn, _unresolved)
+        if tn in cert_map and doc is None:
+            doc = _unresolved
+        return _process_single_wire_pending(row, doc=doc)

-                logging.warning("$batch request failed: %s", exc)
+                logging.warning("$batch request failed (round %d/%d): %s", retry_round + 1, MAX_RETRY_ROUNDS, exc)
+                # Only retry while we still have remaining rounds; on the final
+                # round, re-raise so callers see a failure instead of silent None
+                if retry_round >= MAX_RETRY_ROUNDS - 1:
+                    raise

Conversation

rhythmatician commented Feb 25, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

rhythmatician commented Feb 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants