Add sync workflows and validation scripts #1
Conversation
**Walkthrough**

Introduces automated daily bundle synchronization infrastructure with four GitHub Actions workflows—Evidence, Governance, Operations, and Property—each triggering on schedule and manual dispatch. Includes supporting Python validation scripts for preflight checks and post-export bundle verification, plus project-wide gitignore patterns.
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant Script as Python Scripts
    participant Git as Git Repository
    participant Notion as Notion API
    participant Verify as Verification
    GHA->>Script: Run preflight checks
    Script->>Script: Validate config & token
    Script-->>GHA: ✓ Preflight result
    GHA->>Script: Dry-run export
    Script->>Notion: Query databases
    Notion-->>Script: Return data
    Script-->>GHA: ✓ Dry-run validated
    GHA->>Script: Full export
    Script->>Notion: Query databases
    Notion-->>Script: Return data
    Script->>Git: Write bundles & manifest
    Script-->>GHA: ✓ Export complete
    GHA->>Verify: Verify bundle
    Verify->>Git: Read manifest & CSVs
    Verify->>Verify: Validate integrity
    Verify-->>GHA: ✓ Verification passed
    alt Changes detected
        GHA->>Git: Commit with metadata
        Git-->>GHA: ✓ Committed
        GHA->>Git: Push to repository
        Git-->>GHA: ✓ Pushed
    end
    GHA->>GHA: Publish summary artifact
```
**Estimated code review effort**: 🎯 3 (Moderate) | ⏱️ ~20 minutes
**Pre-merge checks and finishing touches**

- ❌ Failed checks (1 warning)
- ✅ Passed checks (2 passed)
- ✨ Finishing touches
- 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```yaml
run: |
  echo "Running dry-run to validate export..."
  python scripts/export_bundles.py evidence --dry-run
```
**Add missing `export_bundles.py` script**

The sync workflow invokes `python scripts/export_bundles.py evidence --dry-run` and `python scripts/export_bundles.py evidence`, but the repository does not contain `scripts/export_bundles.py` (checked with `find .. -name "export_bundles.py"`, which returned no results). Every run of this job (and the governance/operations/property variants that make the same call) will fail before exporting data because the script cannot be found.
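Until that script exists, its required surface is fully determined by how the workflows call it. Here is a minimal hypothetical skeleton — the bundle names, the `--dry-run` flag semantics, and everything below argument parsing are assumptions for illustration, not the real exporter:

```python
#!/usr/bin/env python3
"""Hypothetical placeholder for the missing scripts/export_bundles.py.

The sync workflows invoke it as:
    python scripts/export_bundles.py <bundle> [--dry-run]
so this sketch only pins down that command-line interface; the Notion
export logic itself is represented by a stub.
"""
import argparse

# Bundle keys assumed from the four workflow names.
BUNDLES = ("evidence", "governance", "operations", "property")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Export Notion bundles")
    parser.add_argument("bundle", choices=BUNDLES,
                        help="which bundle to export")
    parser.add_argument("--dry-run", action="store_true",
                        help="validate config and connectivity without writing files")
    return parser.parse_args(argv)


def main(argv=None) -> int:
    args = parse_args(argv)
    mode = "dry-run" if args.dry_run else "export"
    print(f"[{mode}] bundle={args.bundle}")
    # A real implementation would query Notion and write CSVs plus a manifest.
    return 0
```

With even this stub in place the workflows would reach their later steps; `python scripts/export_bundles.py evidence --dry-run` exits 0 without touching the filesystem.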
Actionable comments posted: 1
♻️ Duplicate comments (3)
.github/workflows/sync-evidence.yml (1)
**39-45: Align diff scope and preflight behavior with `sync-property` recommendations**

This workflow mirrors `sync-property` with:

- `if: ${{ !inputs.skip_preflight }}` for the preflight step.
- Change detection limited to `packages/evidence/data/` while committing `packages/evidence/` as a whole.

The same considerations apply:

- Consider guarding `skip_preflight` by `github.event_name` for clarity.
- Consider diffing `packages/evidence/` so manifest-only updates (e.g., `packages/evidence/manifest.json`) also trigger a commit, if your export script touches them.

See the comment on `.github/workflows/sync-property.yml` for a concrete diff that can be adapted here.

Also applies to: 65-85
.github/workflows/sync-governance.yml (1)
**39-45: Reuse the same diff and preflight guards as other sync workflows**

`sync-governance` has the same structure as the other bundle sync jobs:

- Optional preflight gated by `if: ${{ !inputs.skip_preflight }}`.
- Change detection only under `packages/governance/data/`, but `git add packages/governance/`.

To avoid missing manifest-only updates and to keep scheduled runs robust, consider applying the same adjustments suggested for `sync-property` (widening the diff path and guarding `skip_preflight` by `github.event_name`).

Also applies to: 65-85
.github/workflows/sync-operations.yml (1)
**39-45: Keep `sync-operations` consistent with other bundle workflows**

Here as well:

- Preflight is gated by `if: ${{ !inputs.skip_preflight }}`.
- Change detection is scoped to `packages/operations/data/` while committing all of `packages/operations/`.

To ensure manifest and non-`data/` changes are committed (if applicable) and to make scheduled runs robust, consider applying the same changes recommended for `sync-property` to this workflow.

Also applies to: 65-85
🧹 Nitpick comments (2)
.gitignore (1)
**1-50: Consider adding common Python testing and type-checking cache directories (optional).**

For projects that use pytest, mypy, tox, or similar tools locally, you may want to add patterns to prevent committing transient cache artifacts:

```diff
 # Python
 __pycache__/
 *.py[cod]
 *$py.class
 *.so
 .Python
+.pytest_cache/
+.mypy_cache/
+.dmypy.json
+dmypy.json
+.tox/
 build/
```

This is optional since CI systems typically don't commit these directories, but it provides cleaner local development and repository history.
scripts/preflight.py (1)

**67-100: Tighten error handling and clean up minor Ruff warnings**

A few smaller points that improve clarity and avoid masking bugs:

- Several messages in `check_bundle_config` (e.g., lines 67, 74, 82, 99) use f-strings without any `{}` placeholders. Dropping the `f` prefix will satisfy Ruff F541 and make the intent clearer; no behavior change.
- `check_output_directories` (lines 104-125) currently catches a bare `Exception`. Since the operations here are purely filesystem-related, you can narrow this to `OSError` (or `OSError | PermissionError`) so that unexpected programming errors still surface:

```diff
-    except Exception as e:
+    except OSError as e:
         print(f"❌ Cannot write to '{output_root}': {e}", file=sys.stderr)
         return False
```
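As a sketch of what the narrowed handler looks like in isolation (the function name and the probe-file approach are assumptions mirroring the review suggestion, not the actual `check_output_directories`):

```python
import sys
from pathlib import Path


def can_write_to(output_root: Path) -> bool:
    """Probe that output_root exists (creating it if needed) and is writable.

    Only OSError is caught: genuine programming errors (TypeError,
    AttributeError, ...) still propagate instead of being swallowed
    by the preflight check.
    """
    try:
        output_root.mkdir(parents=True, exist_ok=True)
        probe = output_root / ".write_probe"
        probe.write_text("ok")
        probe.unlink()
        return True
    except OSError as e:
        print(f"❌ Cannot write to '{output_root}': {e}", file=sys.stderr)
        return False
```

A path that collides with an existing file still fails cleanly, since `FileExistsError` and `NotADirectoryError` are both `OSError` subclasses.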
This keeps preflight focused on genuine IO/permission issues instead of swallowing unrelated exceptions.

Also applies to: 104-125

📜 Review details

**Configuration used**: CodeRabbit UI
**Review profile**: CHILL
**Plan**: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ab700a0597f06936cc58ea660369c958be9803c and c87efd11fad0331a60468c2ff1649bd72dc7522b.

📒 Files selected for processing (7)

* `.github/workflows/sync-evidence.yml` (1 hunks)
* `.github/workflows/sync-governance.yml` (1 hunks)
* `.github/workflows/sync-operations.yml` (1 hunks)
* `.github/workflows/sync-property.yml` (1 hunks)
* `.gitignore` (1 hunks)
* `scripts/preflight.py` (1 hunks)
* `scripts/verify_bundle.py` (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

scripts/preflight.py references scripts/verify_bundle.py:

* `parse_args` (15-19)
* `load_yaml_config` (22-32)
* `main` (117-170)

🪛 Ruff (0.14.7)

scripts/preflight.py

* 47-47: PEP 484 prohibits implicit `Optional`; convert to `T | None` (RUF013)
* 68-68: f-string without any placeholders; remove extraneous `f` prefix (F541)
* 74-74: f-string without any placeholders; remove extraneous `f` prefix (F541)
* 82-82: f-string without any placeholders; remove extraneous `f` prefix (F541)
* 110-110: String contains ambiguous `ℹ` (INFORMATION SOURCE); did you mean `i` (LATIN SMALL LETTER I)? (RUF001)
* 122-122: Consider moving this statement to an `else` block (TRY300)
* 123-123: Do not catch blind exception: `Exception` (BLE001)

scripts/verify_bundle.py

* 45-45: Consider moving this statement to an `else` block (TRY300)
* 81-81: Do not catch blind exception: `Exception` (BLE001)

🔇 Additional comments (3)

.gitignore (1)

**1-50: Well-structured .gitignore aligned with PR objectives.**

The file provides comprehensive coverage for Python projects, IDEs, OS-specific files, and—importantly for this PR—project-specific sensitive data (Notion exports, credentials, logs) that workflows and validation scripts may generate. The organization by logical sections is clear and maintainable.

scripts/preflight.py (1)

**21-32: Handle empty/invalid YAML defensively and tighten `bundle_key` typing**

Two robustness issues to address:

1. **`load_yaml_config` (lines 21-32)**: `yaml.safe_load()` returns `None` for empty YAML files. Downstream usage like `cfg.get("export_bundles", {})` would raise `AttributeError` if `cfg` is `None`. Normalize the result and fail fast:

```diff
 def load_yaml_config(path: str) -> dict:
     """Load and parse YAML configuration file."""
     try:
         with open(path, "r") as f:
-            return yaml.safe_load(f)
+            data = yaml.safe_load(f)
+            if data is None:
+                print(f"❌ Config file '{path}' is empty", file=sys.stderr)
+                sys.exit(1)
+            return data
```

2. **`check_bundle_config` signature (line 47)**: `bundle_key: str = None` uses the implicit Optional pattern that Ruff flags. Use the modern syntax instead:

```diff
-def check_bundle_config(cfg: dict, bundle_key: str = None) -> bool:
+def check_bundle_config(cfg: dict, bundle_key: str | None = None) -> bool:
```
.github/workflows/sync-property.yml (1)

**39-45: Ensure scheduled runs handle `skip_preflight` safely and verify diff scope consistency**

Two issues to address:

1. **Inputs context in scheduled runs (lines 39-40)**: The `inputs` context is undefined for `schedule` events—only `workflow_dispatch` events populate it. While GitHub coerces missing inputs to falsy values, relying on this behavior is fragile. Explicitly guard the condition:

```yaml
- name: Preflight checks
  if: ${{ github.event_name != 'workflow_dispatch' || !inputs.skip_preflight }}
```

2. **Diff scope mismatch (lines 65-76)**: The workflow detects changes only under `packages/property/data/`:

```bash
git diff --exit-code packages/property/data/ || echo "has_changes=true" >> $GITHUB_OUTPUT
```

but later commits the entire `packages/property/` tree. If `scripts/export_bundles.py` writes to `packages/property/manifest.json` or other files outside `data/`, those changes will be skipped because the diff won't detect them. Either widen the diff scope to `packages/property/` or document why only `data/` changes should trigger commits. Ensure consistency across other bundle workflows.
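As a quick off-platform sanity check of the proposed guard, this tiny Python mirror of the expression (purely illustrative — GitHub's expression engine evaluates the real condition) shows the intended truth table:

```python
def should_run_preflight(event_name: str, skip_preflight: bool = False) -> bool:
    """Mirror of: github.event_name != 'workflow_dispatch' || !inputs.skip_preflight

    For schedule events the inputs context is absent; GitHub coerces the
    missing value to false, modeled here by the default skip_preflight=False.
    """
    return event_name != "workflow_dispatch" or not skip_preflight


# Scheduled runs always execute preflight; manual runs honor the checkbox.
assert should_run_preflight("schedule") is True
assert should_run_preflight("workflow_dispatch", skip_preflight=False) is True
assert should_run_preflight("workflow_dispatch", skip_preflight=True) is False
```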
```python
def verify_csv_files(manifest: dict, pkg_dir: Path) -> bool:
    """Verify all CSV files referenced in manifest exist and are readable."""
    items = manifest.get("items", [])
    if not items:
        print("⚠️ No items in manifest", file=sys.stderr)
        return False

    all_valid = True
    for item in items:
        db_key = item.get("db_key", "unknown")
        output_path = item.get("output", "")
        row_count = item.get("row_count", 0)

        csv_path = pkg_dir / output_path
        if not csv_path.exists():
            print(f"❌ CSV file not found: {csv_path}", file=sys.stderr)
            all_valid = False
            continue

        # Verify CSV is readable and count rows
        try:
            with open(csv_path, "r") as f:
                reader = csv.reader(f)
                rows = list(reader)
                actual_count = len(rows) - 1  # Exclude header

            if actual_count != row_count:
                print(f"⚠️ Row count mismatch for '{db_key}': manifest={row_count}, actual={actual_count}", file=sys.stderr)

            print(f"✅ CSV verified: {db_key} ({actual_count} rows)")
        except Exception as e:
            print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
            all_valid = False

    return all_valid
```
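One edge case worth noting in the quoted code: `len(rows) - 1` assumes a header row is always present, so an empty CSV reports -1 data rows. A quick self-contained illustration of that counting rule:

```python
import csv
import io


def naive_row_count(text: str) -> int:
    """Data-row count as computed above: total parsed rows minus one header."""
    rows = list(csv.reader(io.StringIO(text)))
    return len(rows) - 1


# A header plus two data rows counts as expected...
assert naive_row_count("id,name\n1,a\n2,b\n") == 2
# ...but an empty file "contains" -1 rows, which can never match a manifest.
assert naive_row_count("") == -1
```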
🧩 Analysis chain

🏁 Script executed:

```shell
cat -n scripts/verify_bundle.py
```

Repository: chittyos/.github
Length of output: 176

🏁 Script executed:

```shell
git ls-files scripts/verify_bundle.py
```

Repository: chittyos/.github
Length of output: 176

🌐 Web query: Python csv module exceptions csv.Error OSError

💡 Result:
Short summary
- `csv.Error` — module-specific exception raised for CSV parsing/dialect/format problems (reader/writer/Sniffer/etc.). Catch it to detect ill-formed CSV input or csv-api misuse. [1]
- `OSError` (and its common subclasses `FileNotFoundError`, `PermissionError`, etc.) — raised for OS / file I/O errors (opening/reading/writing files). These are not csv module exceptions but will occur when you open/read the underlying file. [2]
Example pattern (concise):
- Catch OSError around file open/IO.
- Catch csv.Error around parsing/reading/writing.
- Or catch both if you just need to handle any error.
Example:
```python
import csv

try:
    with open('data.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            ...
except OSError as e:
    # file I/O (FileNotFoundError, PermissionError, ...)
    handle_io(e)
except csv.Error as e:
    # CSV parsing / dialect / format errors
    handle_csv(e)
```
References
- csv module (csv.Error). [1]
- Built-in exceptions / OSError and subclasses (FileNotFoundError, PermissionError, etc.). [2]
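Both claims are easy to confirm against the standard library; in the sketch below, shrinking `csv.field_size_limit` is just one convenient way to provoke a `csv.Error` deliberately:

```python
import csv

# Opening a missing file raises FileNotFoundError, an OSError subclass.
try:
    open("this_file_should_not_exist.csv")
    io_error = None
except OSError as e:
    io_error = type(e).__name__

# A field longer than csv.field_size_limit raises csv.Error while parsing.
old_limit = csv.field_size_limit(10)
try:
    list(csv.reader(["x" * 100]))
    parse_error = None
except csv.Error:
    parse_error = "csv.Error"
finally:
    csv.field_size_limit(old_limit)  # restore the previous limit
```

The two exception families are disjoint, which is what makes catching them separately (as in the pattern above) meaningful.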
Make CSV row-count mismatches fail verification and narrow exception handling
`verify_csv_files` currently treats mismatched row counts as a warning only:

- On mismatch (lines 77-79), the code logs a ⚠️ warning but does not set `all_valid = False`.
- As a result, `verify_csv_files` returns `True` even when the manifest's `row_count` disagrees with actual CSV contents, and `main()` reports "All verification checks passed".
Since this script is a CI gate for bundle integrity, row-count mismatches should fail verification, not pass silently. Additionally, the broad Exception catch masks file I/O and CSV-specific errors. Narrow it to handle only relevant exceptions:
```diff
-        try:
-            with open(csv_path, "r") as f:
-                reader = csv.reader(f)
-                rows = list(reader)
-                actual_count = len(rows) - 1  # Exclude header
-
-            if actual_count != row_count:
-                print(f"⚠️ Row count mismatch for '{db_key}': manifest={row_count}, actual={actual_count}", file=sys.stderr)
-
-            print(f"✅ CSV verified: {db_key} ({actual_count} rows)")
-        except Exception as e:
-            print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
-            all_valid = False
+        try:
+            actual_count = -1
+            with open(csv_path, "r", newline="") as f:
+                reader = csv.reader(f)
+                # Count rows without loading entire file into memory
+                for i, _ in enumerate(reader):
+                    actual_count = i
+
+            if actual_count != row_count:
+                print(
+                    f"❌ Row count mismatch for '{db_key}': manifest={row_count}, actual={actual_count}",
+                    file=sys.stderr,
+                )
+                all_valid = False
+            else:
+                print(f"✅ CSV verified: {db_key} ({actual_count} rows)")
+        except (OSError, csv.Error) as e:
+            print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
+            all_valid = False
```

This change:
- Makes row-count mismatches fail verification (sets `all_valid = False`).
- Avoids loading entire CSVs into memory via streaming row count.
- Narrows exception handling to `OSError` (file I/O) and `csv.Error` (CSV parsing) instead of catching all exceptions.
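Lifted out of the surrounding function, the fixed logic can be sketched as a standalone helper (the name and simplified signature are assumptions for illustration):

```python
import csv
import sys
from pathlib import Path


def verify_csv_row_count(csv_path: Path, expected: int) -> bool:
    """Stream-count data rows (excluding the header) and fail on any mismatch."""
    actual = -1  # stays -1 for an empty file, which can never match
    try:
        with open(csv_path, "r", newline="") as f:
            for i, _ in enumerate(csv.reader(f)):
                actual = i  # after the loop: number of data rows seen
    except (OSError, csv.Error) as e:
        print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
        return False

    if actual != expected:
        print(f"❌ Row count mismatch: manifest={expected}, actual={actual}",
              file=sys.stderr)
        return False
    print(f"✅ CSV verified ({actual} rows)")
    return True
```

Missing files now fail via `OSError` rather than a blanket `Exception`, and a mismatch flips the return value instead of merely warning.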
🧰 Tools
🪛 Ruff (0.14.7)
81-81: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In scripts/verify_bundle.py around lines 51 to 85, the function verify_csv_files
treats CSV row-count mismatches as warnings and catches all exceptions; update
it so mismatched row counts set all_valid = False, and count rows by streaming
the CSV (iterate the reader and increment a counter, subtract 1 for the header)
instead of loading the whole file into memory; narrow the exception handling to
only OSError and csv.Error and keep printing the same error message while still
setting all_valid = False on read/parse failures.
Adds hardened sync workflows for governance, evidence, operations, and property bundles. Includes Python preflight and post-export verification scripts.

What's included:
- Four reusable workflows: sync-*-bundle
- Preflight validator: scripts/preflight.py
- Bundle verification: scripts/verify_bundle.py
- Executable bits and .gitignore

Testing done:
- Ran preflight to confirm token/placeholder checks trigger as expected

Follow-ups:
- Set NOTION_TOKEN secret in repo or org
- Replace placeholder Notion database URLs
- Wire scheduled triggers if desired
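The NOTION_TOKEN follow-up hints at what the preflight token check might assert. A hedged sketch — the env var name comes from the follow-ups above, while the placeholder markers and function name are assumed conventions, not the actual `scripts/preflight.py` logic:

```python
import os
import sys

# Assumed placeholder conventions; adjust to whatever the repo actually uses.
PLACEHOLDER_MARKERS = ("REPLACE_ME", "changeme", "xxxx")


def check_notion_token(env=None) -> bool:
    """Fail fast if NOTION_TOKEN is unset, empty, or looks like a placeholder."""
    env = os.environ if env is None else env
    token = env.get("NOTION_TOKEN", "").strip()
    if not token:
        print("❌ NOTION_TOKEN is not set", file=sys.stderr)
        return False
    if any(marker in token for marker in PLACEHOLDER_MARKERS):
        print("❌ NOTION_TOKEN looks like a placeholder", file=sys.stderr)
        return False
    return True
```

Taking the environment as a parameter keeps the check unit-testable without mutating `os.environ`, which suits the "ran preflight to confirm token/placeholder checks trigger" testing note.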