Skip to content

Conversation

@chitcommit
Copy link
Contributor

@chitcommit chitcommit commented Dec 4, 2025

Adds hardened sync workflows for governance, evidence, operations, and property bundles. Includes Python preflight and post-export verification scripts.\n\nWhat’s included:\n- Four reusable workflows: sync-*-bundle\n- Preflight validator: scripts/preflight.py\n- Bundle verification: scripts/verify_bundle.py\n- Executable bits and .gitignore\n\nTesting done:\n- Ran preflight to confirm token/placeholder checks trigger as expected\n\nFollow-ups:\n- Set NOTION_TOKEN secret in repo or org\n- Replace placeholder Notion database URLs\n- Wire scheduled triggers if desired

@coderabbitai
Copy link

coderabbitai bot commented Dec 4, 2025

Walkthrough

Introduces automated daily bundle synchronization infrastructure with four GitHub Actions workflows—Evidence, Governance, Operations, and Property—each triggering on schedule and manual dispatch. Includes supporting Python validation scripts for preflight checks and post-export bundle verification, plus project-wide gitignore patterns.

Changes

Cohort / File(s) Summary
Bundle Sync Workflows
.github/workflows/sync-evidence.yml, .github/workflows/sync-governance.yml, .github/workflows/sync-operations.yml, .github/workflows/sync-property.yml
Four GitHub Actions workflows automating daily bundle export and sync at staggered UTC times (2:00, 3:00, 4:00, 5:00). Each workflow: checks out code, sets up Python 3.11, runs preflight validation, performs dry-run and full export, verifies bundle, detects changes, commits/pushes updates, and publishes step summary with manifest content. Uses CHITTY_ID for tracking and includes structured commit metadata.
Validation Scripts
scripts/preflight.py, scripts/verify_bundle.py
New Python utilities for bundle management. preflight.py validates YAML config, Notion token, bundle configuration (names, schema versions, database URLs), and output directory writability. verify_bundle.py post-export verification: checks manifest existence/validity, CSV file integrity, schema version compatibility, and ChittyID compliance in audit sections.
Project Configuration
.gitignore
Standard ignore patterns for Python artifacts, virtual environments, IDE folders, OS files, Notion exports, secrets, and logs.

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions
    participant Script as Python Scripts
    participant Git as Git Repository
    participant Notion as Notion API
    participant Verify as Verification

    GHA->>Script: Run preflight checks
    Script->>Script: Validate config & token
    Script-->>GHA: ✓ Preflight result
    
    GHA->>Script: Dry-run export
    Script->>Notion: Query databases
    Notion-->>Script: Return data
    Script-->>GHA: ✓ Dry-run validated
    
    GHA->>Script: Full export
    Script->>Notion: Query databases
    Notion-->>Script: Return data
    Script->>Git: Write bundles & manifest
    Script-->>GHA: ✓ Export complete
    
    GHA->>Verify: Verify bundle
    Verify->>Git: Read manifest & CSVs
    Verify->>Verify: Validate integrity
    Verify-->>GHA: ✓ Verification passed
    
    alt Changes detected
        GHA->>Git: Commit with metadata
        Git-->>GHA: ✓ Committed
        GHA->>Git: Push to repository
        Git-->>GHA: ✓ Pushed
    end
    
    GHA->>GHA: Publish summary artifact
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Workflow consistency: Verify all four workflows follow the same pattern and handle edge cases uniformly (preflight skipping, change detection, commit logic)
  • Error handling in Python scripts: Confirm proper exit codes, error messages to stderr, and graceful handling of missing config/YAML/files in preflight.py and verify_bundle.py
  • Git change detection logic: Validate that git diff --name-only correctly identifies changes in bundle directories
  • Manifest parsing: Ensure robust JSON parsing and optional field handling across workflows

Poem

🐰 Four workflows hop each day, syncing bundles on their way,
Preflight checks and verify—no surprises, we certify!
From Notion's gardens to Git's deep store, hop hop automation's core! 🌿✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main changes: adding four new GitHub Actions workflows for syncing different registries and two new validation scripts (preflight.py and verify_bundle.py).
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore/sync-workflows-and-validators

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chitcommit chitcommit added the chore Chore and maintenance tasks label Dec 4, 2025
@chitcommit chitcommit merged commit 631f01d into main Dec 4, 2025
0 of 2 checks passed
@chitcommit chitcommit deleted the chore/sync-workflows-and-validators branch December 4, 2025 21:26
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +50 to +52
run: |
echo "Running dry-run to validate export..."
python scripts/export_bundles.py evidence --dry-run

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Add missing export_bundles.py script

The sync workflow invokes python scripts/export_bundles.py evidence --dry-run/python scripts/export_bundles.py evidence, but the repository does not contain scripts/export_bundles.py (checked with find .. -name "export_bundles.py" returning no results). Every run of this job (and the governance/operations/property variants that make the same call) will fail before exporting data because the script cannot be found.

Useful? React with 👍 / 👎.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
.github/workflows/sync-evidence.yml (1)

39-45: Align diff scope and preflight behavior with sync-property recommendations

This workflow mirrors sync-property with:

  • if: ${{ !inputs.skip_preflight }} for the preflight step.
  • Change detection limited to packages/evidence/data/ while committing packages/evidence/ as a whole.

The same considerations apply:

  • Consider guarding skip_preflight by github.event_name for clarity.
  • Consider diffing packages/evidence/ so manifest-only updates (e.g., packages/evidence/manifest.json) also trigger a commit, if your export script touches them.

See the comment on .github/workflows/sync-property.yml for a concrete diff that can be adapted here.

Also applies to: 65-85

.github/workflows/sync-governance.yml (1)

39-45: Reuse the same diff and preflight guards as other sync workflows

sync-governance has the same structure as the other bundle sync jobs:

  • Optional preflight gated by if: ${{ !inputs.skip_preflight }}.
  • Change detection only under packages/governance/data/, but git add packages/governance/.

To avoid missing manifest-only updates and to keep scheduled runs robust, consider applying the same adjustments suggested for sync-property (widening the diff path and guarding skip_preflight by github.event_name).

Also applies to: 65-85

.github/workflows/sync-operations.yml (1)

39-45: Keep sync-operations consistent with other bundle workflows

Here as well:

  • Preflight is gated by if: ${{ !inputs.skip_preflight }}.
  • Change detection is scoped to packages/operations/data/ while committing all of packages/operations/.

To ensure manifest and non-data/ changes are committed (if applicable) and to make scheduled runs robust, consider applying the same changes recommended for sync-property to this workflow.

Also applies to: 65-85

🧹 Nitpick comments (2)
.gitignore (1)

1-50: Consider adding common Python testing and type-checking cache directories (optional).

For projects that use pytest, mypy, tox, or similar tools locally, you may want to add patterns to prevent committing transient cache artifacts:

 # Python
 __pycache__/
 *.py[cod]
 *$py.class
 *.so
 .Python
+.pytest_cache/
+.mypy_cache/
+.dmypy.json
+dmypy.json
+.tox/
 build/

This is optional since CI systems typically don't commit these directories, but it provides cleaner local development and repository history.

scripts/preflight.py (1)

67-100: Tighten error handling and clean up minor Ruff warnings

A few smaller points that improve clarity and avoid masking bugs:

  • Several print calls in check_bundle_config (e.g., Lines 67, 74, 82, 99) use f-strings without any {} placeholders. Dropping the f prefix will satisfy Ruff F541 and make the intent clearer; no behavior change.

  • check_output_directories (Lines 104-125) currently catches a bare Exception. Since the operations here are purely filesystem-related, you can narrow this to OSError (or OSError | PermissionError) so that unexpected programming errors still surface:

  • except Exception as e:

  • except OSError as e:
    print(f"❌ Cannot write to '{output_root}': {e}", file=sys.stderr)
    return False

This keeps preflight focused on genuine IO/permission issues instead of swallowing unrelated exceptions.





Also applies to: 104-125

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: CodeRabbit UI

**Review profile**: CHILL

**Plan**: Pro

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 6ab700a0597f06936cc58ea660369c958be9803c and c87efd11fad0331a60468c2ff1649bd72dc7522b.

</details>

<details>
<summary>📒 Files selected for processing (7)</summary>

* `.github/workflows/sync-evidence.yml` (1 hunks)
* `.github/workflows/sync-governance.yml` (1 hunks)
* `.github/workflows/sync-operations.yml` (1 hunks)
* `.github/workflows/sync-property.yml` (1 hunks)
* `.gitignore` (1 hunks)
* `scripts/preflight.py` (1 hunks)
* `scripts/verify_bundle.py` (1 hunks)

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🧬 Code graph analysis (1)</summary>

<details>
<summary>scripts/preflight.py (1)</summary><blockquote>

<details>
<summary>scripts/verify_bundle.py (3)</summary>

* `parse_args` (15-19)
* `load_yaml_config` (22-32)
* `main` (117-170)

</details>

</blockquote></details>

</details><details>
<summary>🪛 Ruff (0.14.7)</summary>

<details>
<summary>scripts/preflight.py</summary>

47-47: PEP 484 prohibits implicit `Optional`

Convert to `T | None`

(RUF013)

---

68-68: f-string without any placeholders

Remove extraneous `f` prefix

(F541)

---

74-74: f-string without any placeholders

Remove extraneous `f` prefix

(F541)

---

82-82: f-string without any placeholders

Remove extraneous `f` prefix

(F541)

---

110-110: String contains ambiguous `ℹ` (INFORMATION SOURCE). Did you mean `i` (LATIN SMALL LETTER I)?

(RUF001)

---

122-122: Consider moving this statement to an `else` block

(TRY300)

---

123-123: Do not catch blind exception: `Exception`

(BLE001)

</details>
<details>
<summary>scripts/verify_bundle.py</summary>

45-45: Consider moving this statement to an `else` block

(TRY300)

---

81-81: Do not catch blind exception: `Exception`

(BLE001)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (3)</summary><blockquote>

<details>
<summary>.gitignore (1)</summary><blockquote>

`1-50`: **Well-structured .gitignore aligned with PR objectives.**

The file provides comprehensive coverage for Python projects, IDEs, OS-specific files, and—importantly for this PR—project-specific sensitive data (Notion exports, credentials, logs) that workflows and validation scripts may generate. The organization by logical sections is clear and maintainable.

</blockquote></details>
<details>
<summary>scripts/preflight.py (1)</summary><blockquote>

`21-32`: **Handle empty/invalid YAML defensively and tighten `bundle_key` typing**

Two robustness issues to address:

1. **`load_yaml_config` (lines 21–32)**  
 `yaml.safe_load()` returns `None` for empty YAML files. Downstream usage like `cfg.get("export_bundles", {})` would raise `AttributeError` if `cfg` is `None`. Normalize the result and fail fast:

 ```diff
 def load_yaml_config(path: str) -> dict:
     """Load and parse YAML configuration file."""
     try:
         with open(path, "r") as f:
-              return yaml.safe_load(f)
+              data = yaml.safe_load(f)
+              if data is None:
+                  print(f"❌ Config file '{path}' is empty", file=sys.stderr)
+                  sys.exit(1)
+              return data
  1. check_bundle_config signature (line 47)
    bundle_key: str = None uses the implicit Optional pattern that Ruff flags. Use the modern syntax instead:

  • def check_bundle_config(cfg: dict, bundle_key: str = None) -> bool:
  • def check_bundle_config(cfg: dict, bundle_key: str | None = None) -> bool:
    
    
.github/workflows/sync-property.yml (1)

39-45: Ensure scheduled runs handle skip_preflight safely and verify diff scope consistency

Two issues to address:

  1. Inputs context in scheduled runs (Lines 39-40)
    The inputs context is undefined for schedule events—only workflow_dispatch events populate it. While GitHub coerces missing inputs to falsy values, relying on this behavior is fragile. Explicitly guard the condition:

    - name: Preflight checks
      if: ${{ github.event_name != 'workflow_dispatch' || !inputs.skip_preflight }}
  2. Diff scope mismatch (Lines 65-76)
    The workflow detects changes only under packages/property/data/:

    git diff --exit-code packages/property/data/ || echo "has_changes=true" >> $GITHUB_OUTPUT

    but later commits the entire packages/property/ tree. If scripts/export_bundles.py writes to packages/property/manifest.json or other files outside data/, those changes will be skipped because the diff won't detect them. Either widen the diff scope to packages/property/ or document why only data/ changes should trigger commits. Ensure consistency across other bundle workflows.

Comment on lines +51 to +85
def verify_csv_files(manifest: dict, pkg_dir: Path) -> bool:
"""Verify all CSV files referenced in manifest exist and are readable."""
items = manifest.get("items", [])
if not items:
print("⚠️ No items in manifest", file=sys.stderr)
return False

all_valid = True
for item in items:
db_key = item.get("db_key", "unknown")
output_path = item.get("output", "")
row_count = item.get("row_count", 0)

csv_path = pkg_dir / output_path
if not csv_path.exists():
print(f"❌ CSV file not found: {csv_path}", file=sys.stderr)
all_valid = False
continue

# Verify CSV is readable and count rows
try:
with open(csv_path, "r") as f:
reader = csv.reader(f)
rows = list(reader)
actual_count = len(rows) - 1 # Exclude header

if actual_count != row_count:
print(f"⚠️ Row count mismatch for '{db_key}': manifest={row_count}, actual={actual_count}", file=sys.stderr)

print(f"✅ CSV verified: {db_key} ({actual_count} rows)")
except Exception as e:
print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
all_valid = False

return all_valid
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n scripts/verify_bundle.py

Repository: chittyos/.github

Length of output: 176


🏁 Script executed:

git ls-files scripts/verify_bundle.py

Repository: chittyos/.github

Length of output: 176


🌐 Web query:

Python csv module exceptions csv.Error OSError

💡 Result:

Short summary

  • csv.Error — module-specific exception raised for CSV parsing/dialect/format problems (reader/writer/Sniffer/etc.). Catch it to detect ill-formed CSV input or csv-api misuse. [1]

  • OSError (and its common subclasses FileNotFoundError, PermissionError, etc.) — raised for OS / file I/O errors (opening/reading/writing files). These are not csv module exceptions but will occur when you open/read the underlying file. [2]

Example pattern (concise):

  • Catch OSError around file open/IO.
  • Catch csv.Error around parsing/reading/writing.
  • Or catch both if you just need to handle any error.

Example:

import csv
try:
with open('data.csv', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
for row in reader:
...
except OSError as e:
# file I/O (FileNotFoundError, PermissionError, ...)
handle_io(e)
except csv.Error as e:
# CSV parsing / dialect / format errors
handle_csv(e)

References

  • csv module (csv.Error). [1]
  • Built-in exceptions / OSError and subclasses (FileNotFoundError, PermissionError, etc.). [2]

Make CSV row-count mismatches fail verification and narrow exception handling

verify_csv_files currently treats mismatched row counts as a warning only:

  • On mismatch (lines 77–79), the code logs ⚠️ but does not set all_valid = False.
  • As a result, verify_csv_files returns True even when the manifest's row_count disagrees with actual CSV contents, and main() reports "All verification checks passed".

Since this script is a CI gate for bundle integrity, row-count mismatches should fail verification, not pass silently. Additionally, the broad Exception catch masks file I/O and CSV-specific errors. Narrow it to handle only relevant exceptions:

-        try:
-            with open(csv_path, "r") as f:
-                reader = csv.reader(f)
-                rows = list(reader)
-                actual_count = len(rows) - 1  # Exclude header
-
-                if actual_count != row_count:
-                    print(f"⚠️  Row count mismatch for '{db_key}': manifest={row_count}, actual={actual_count}", file=sys.stderr)
-
-                print(f"✅ CSV verified: {db_key} ({actual_count} rows)")
-        except Exception as e:
-            print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
-            all_valid = False
+        try:
+            actual_count = -1
+            with open(csv_path, "r", newline="") as f:
+                reader = csv.reader(f)
+                # Count rows without loading entire file into memory
+                for i, _ in enumerate(reader):
+                    actual_count = i
+
+            if actual_count != row_count:
+                print(
+                    f"❌ Row count mismatch for '{db_key}': manifest={row_count}, actual={actual_count}",
+                    file=sys.stderr,
+                )
+                all_valid = False
+            else:
+                print(f"✅ CSV verified: {db_key} ({actual_count} rows)")
+        except (OSError, csv.Error) as e:
+            print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
+            all_valid = False

This change:

  • Makes row-count mismatches fail verification (sets all_valid = False).
  • Avoids loading entire CSVs into memory via streaming row count.
  • Narrows exception handling to OSError (file I/O) and csv.Error (CSV parsing) instead of catching all exceptions.
🧰 Tools
🪛 Ruff (0.14.7)

81-81: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In scripts/verify_bundle.py around lines 51 to 85, the function verify_csv_files
treats CSV row-count mismatches as warnings and catches all exceptions; update
it so mismatched row counts set all_valid = False, and count rows by streaming
the CSV (iterate the reader and increment a counter, subtract 1 for the header)
instead of loading the whole file into memory; narrow the exception handling to
only OSError and csv.Error and keep printing the same error message while still
setting all_valid = False on read/parse failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore Chore and maintenance tasks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants