Add sync workflows and validation scripts #1
Conversation
**Walkthrough**

Introduces automated daily bundle synchronization infrastructure with four GitHub Actions workflows—Evidence, Governance, Operations, and Property—each triggering on schedule and manual dispatch. Includes supporting Python validation scripts for preflight checks and post-export bundle verification, plus project-wide gitignore patterns.
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant Script as Python Scripts
    participant Git as Git Repository
    participant Notion as Notion API
    participant Verify as Verification
    GHA->>Script: Run preflight checks
    Script->>Script: Validate config & token
    Script-->>GHA: ✓ Preflight result
    GHA->>Script: Dry-run export
    Script->>Notion: Query databases
    Notion-->>Script: Return data
    Script-->>GHA: ✓ Dry-run validated
    GHA->>Script: Full export
    Script->>Notion: Query databases
    Notion-->>Script: Return data
    Script->>Git: Write bundles & manifest
    Script-->>GHA: ✓ Export complete
    GHA->>Verify: Verify bundle
    Verify->>Git: Read manifest & CSVs
    Verify->>Verify: Validate integrity
    Verify-->>GHA: ✓ Verification passed
    alt Changes detected
        GHA->>Git: Commit with metadata
        Git-->>GHA: ✓ Committed
        GHA->>Git: Push to repository
        Git-->>GHA: ✓ Pushed
    end
    GHA->>GHA: Publish summary artifact
```
**Estimated code review effort**: 🎯 3 (Moderate) | ⏱️ ~20 minutes
**Pre-merge checks and finishing touches**

- ❌ Failed checks (1 warning)
- ✅ Passed checks (2 passed)
- ✨ Finishing touches
- 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```yaml
run: |
  echo "Running dry-run to validate export..."
  python scripts/export_bundles.py evidence --dry-run
```
**Add missing `export_bundles.py` script**

The sync workflow invokes `python scripts/export_bundles.py evidence --dry-run` and `python scripts/export_bundles.py evidence`, but the repository does not contain `scripts/export_bundles.py` (checked with `find .. -name "export_bundles.py"`, which returned no results). Every run of this job (and the governance/operations/property variants that make the same call) will fail before exporting data because the script cannot be found.
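Until that script exists, its required surface is fully determined by how the workflows call it. Here is a minimal hypothetical skeleton — the bundle names, the `--dry-run` flag semantics, and everything below argument parsing are assumptions for illustration, not the real exporter:

```python
#!/usr/bin/env python3
"""Hypothetical placeholder for the missing scripts/export_bundles.py.

The sync workflows invoke it as:
    python scripts/export_bundles.py <bundle> [--dry-run]
so this sketch only pins down that command-line interface; the Notion
export logic itself is represented by a stub.
"""
import argparse

# Bundle keys assumed from the four workflow names.
BUNDLES = ("evidence", "governance", "operations", "property")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Export Notion bundles")
    parser.add_argument("bundle", choices=BUNDLES,
                        help="which bundle to export")
    parser.add_argument("--dry-run", action="store_true",
                        help="validate config and connectivity without writing files")
    return parser.parse_args(argv)


def main(argv=None) -> int:
    args = parse_args(argv)
    mode = "dry-run" if args.dry_run else "export"
    print(f"[{mode}] bundle={args.bundle}")
    # A real implementation would query Notion and write CSVs plus a manifest.
    return 0
```

With even this stub in place the workflows would reach their later steps; `python scripts/export_bundles.py evidence --dry-run` exits 0 without touching the filesystem.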
Actionable comments posted: 1
♻️ Duplicate comments (3)
.github/workflows/sync-evidence.yml (1)
**39-45: Align diff scope and preflight behavior with `sync-property` recommendations**

This workflow mirrors `sync-property` with:

- `if: ${{ !inputs.skip_preflight }}` for the preflight step.
- Change detection limited to `packages/evidence/data/` while committing `packages/evidence/` as a whole.

The same considerations apply:

- Consider guarding `skip_preflight` by `github.event_name` for clarity.
- Consider diffing `packages/evidence/` so manifest-only updates (e.g., `packages/evidence/manifest.json`) also trigger a commit, if your export script touches them.

See the comment on `.github/workflows/sync-property.yml` for a concrete diff that can be adapted here.

Also applies to: 65-85
.github/workflows/sync-governance.yml (1)
**39-45: Reuse the same diff and preflight guards as other sync workflows**

`sync-governance` has the same structure as the other bundle sync jobs:

- Optional preflight gated by `if: ${{ !inputs.skip_preflight }}`.
- Change detection only under `packages/governance/data/`, but `git add packages/governance/`.

To avoid missing manifest-only updates and to keep scheduled runs robust, consider applying the same adjustments suggested for `sync-property` (widening the diff path and guarding `skip_preflight` by `github.event_name`).

Also applies to: 65-85
.github/workflows/sync-operations.yml (1)
**39-45: Keep `sync-operations` consistent with other bundle workflows**

Here as well:

- Preflight is gated by `if: ${{ !inputs.skip_preflight }}`.
- Change detection is scoped to `packages/operations/data/` while committing all of `packages/operations/`.

To ensure manifest and non-`data/` changes are committed (if applicable) and to make scheduled runs robust, consider applying the same changes recommended for `sync-property` to this workflow.

Also applies to: 65-85
🧹 Nitpick comments (2)
.gitignore (1)
**1-50: Consider adding common Python testing and type-checking cache directories (optional).**

For projects that use pytest, mypy, tox, or similar tools locally, you may want to add patterns to prevent committing transient cache artifacts:

```diff
 # Python
 __pycache__/
 *.py[cod]
 *$py.class
 *.so
 .Python
+.pytest_cache/
+.mypy_cache/
+.dmypy.json
+dmypy.json
+.tox/
 build/
```

This is optional since CI systems typically don't commit these directories, but it provides cleaner local development and repository history.
scripts/preflight.py (1)

**67-100: Tighten error handling and clean up minor Ruff warnings**

A few smaller points that improve clarity and avoid masking bugs:

- Several messages in `check_bundle_config` (e.g., lines 67, 74, 82, 99) use f-strings without any `{}` placeholders. Dropping the `f` prefix will satisfy Ruff F541 and make the intent clearer; no behavior change.
- `check_output_directories` (lines 104-125) currently catches a bare `Exception`. Since the operations here are purely filesystem-related, you can narrow this to `OSError` (or `OSError | PermissionError`) so that unexpected programming errors still surface:

```diff
-    except Exception as e:
+    except OSError as e:
         print(f"❌ Cannot write to '{output_root}': {e}", file=sys.stderr)
         return False
```
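As a sketch of what the narrowed handler looks like in isolation (the function name and the probe-file approach are assumptions mirroring the review suggestion, not the actual `check_output_directories`):

```python
import sys
from pathlib import Path


def can_write_to(output_root: Path) -> bool:
    """Probe that output_root exists (creating it if needed) and is writable.

    Only OSError is caught: genuine programming errors (TypeError,
    AttributeError, ...) still propagate instead of being swallowed
    by the preflight check.
    """
    try:
        output_root.mkdir(parents=True, exist_ok=True)
        probe = output_root / ".write_probe"
        probe.write_text("ok")
        probe.unlink()
        return True
    except OSError as e:
        print(f"❌ Cannot write to '{output_root}': {e}", file=sys.stderr)
        return False
```

A path that collides with an existing file still fails cleanly, since `FileExistsError` and `NotADirectoryError` are both `OSError` subclasses.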
This keeps preflight focused on genuine IO/permission issues instead of swallowing unrelated exceptions.

Also applies to: 104-125

📜 Review details

**Configuration used**: CodeRabbit UI
**Review profile**: CHILL
**Plan**: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ab700a0597f06936cc58ea660369c958be9803c and c87efd11fad0331a60468c2ff1649bd72dc7522b.

📒 Files selected for processing (7)

* `.github/workflows/sync-evidence.yml` (1 hunks)
* `.github/workflows/sync-governance.yml` (1 hunks)
* `.github/workflows/sync-operations.yml` (1 hunks)
* `.github/workflows/sync-property.yml` (1 hunks)
* `.gitignore` (1 hunks)
* `scripts/preflight.py` (1 hunks)
* `scripts/verify_bundle.py` (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

scripts/preflight.py references scripts/verify_bundle.py:

* `parse_args` (15-19)
* `load_yaml_config` (22-32)
* `main` (117-170)

🪛 Ruff (0.14.7)

scripts/preflight.py

* 47-47: PEP 484 prohibits implicit `Optional`; convert to `T | None` (RUF013)
* 68-68: f-string without any placeholders; remove extraneous `f` prefix (F541)
* 74-74: f-string without any placeholders; remove extraneous `f` prefix (F541)
* 82-82: f-string without any placeholders; remove extraneous `f` prefix (F541)
* 110-110: String contains ambiguous `ℹ` (INFORMATION SOURCE); did you mean `i` (LATIN SMALL LETTER I)? (RUF001)
* 122-122: Consider moving this statement to an `else` block (TRY300)
* 123-123: Do not catch blind exception: `Exception` (BLE001)

scripts/verify_bundle.py

* 45-45: Consider moving this statement to an `else` block (TRY300)
* 81-81: Do not catch blind exception: `Exception` (BLE001)

🔇 Additional comments (3)

.gitignore (1)

**1-50: Well-structured .gitignore aligned with PR objectives.**

The file provides comprehensive coverage for Python projects, IDEs, OS-specific files, and—importantly for this PR—project-specific sensitive data (Notion exports, credentials, logs) that workflows and validation scripts may generate. The organization by logical sections is clear and maintainable.

scripts/preflight.py (1)

**21-32: Handle empty/invalid YAML defensively and tighten `bundle_key` typing**

Two robustness issues to address:

1. **`load_yaml_config` (lines 21-32)**: `yaml.safe_load()` returns `None` for empty YAML files. Downstream usage like `cfg.get("export_bundles", {})` would raise `AttributeError` if `cfg` is `None`. Normalize the result and fail fast:

```diff
 def load_yaml_config(path: str) -> dict:
     """Load and parse YAML configuration file."""
     try:
         with open(path, "r") as f:
-            return yaml.safe_load(f)
+            data = yaml.safe_load(f)
+            if data is None:
+                print(f"❌ Config file '{path}' is empty", file=sys.stderr)
+                sys.exit(1)
+            return data
```

2. **`check_bundle_config` signature (line 47)**: `bundle_key: str = None` uses the implicit Optional pattern that Ruff flags. Use the modern syntax instead:

```diff
-def check_bundle_config(cfg: dict, bundle_key: str = None) -> bool:
+def check_bundle_config(cfg: dict, bundle_key: str | None = None) -> bool:
```
.github/workflows/sync-property.yml (1)

**39-45: Ensure scheduled runs handle `skip_preflight` safely and verify diff scope consistency**

Two issues to address:

1. **Inputs context in scheduled runs (lines 39-40)**: The `inputs` context is undefined for `schedule` events—only `workflow_dispatch` events populate it. While GitHub coerces missing inputs to falsy values, relying on this behavior is fragile. Explicitly guard the condition:

```yaml
- name: Preflight checks
  if: ${{ github.event_name != 'workflow_dispatch' || !inputs.skip_preflight }}
```

2. **Diff scope mismatch (lines 65-76)**: The workflow detects changes only under `packages/property/data/`:

```bash
git diff --exit-code packages/property/data/ || echo "has_changes=true" >> $GITHUB_OUTPUT
```

but later commits the entire `packages/property/` tree. If `scripts/export_bundles.py` writes to `packages/property/manifest.json` or other files outside `data/`, those changes will be skipped because the diff won't detect them. Either widen the diff scope to `packages/property/` or document why only `data/` changes should trigger commits. Ensure consistency across other bundle workflows.
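As a quick off-platform sanity check of the proposed guard, this tiny Python mirror of the expression (purely illustrative — GitHub's expression engine evaluates the real condition) shows the intended truth table:

```python
def should_run_preflight(event_name: str, skip_preflight: bool = False) -> bool:
    """Mirror of: github.event_name != 'workflow_dispatch' || !inputs.skip_preflight

    For schedule events the inputs context is absent; GitHub coerces the
    missing value to false, modeled here by the default skip_preflight=False.
    """
    return event_name != "workflow_dispatch" or not skip_preflight


# Scheduled runs always execute preflight; manual runs honor the checkbox.
assert should_run_preflight("schedule") is True
assert should_run_preflight("workflow_dispatch", skip_preflight=False) is True
assert should_run_preflight("workflow_dispatch", skip_preflight=True) is False
```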
```python
def verify_csv_files(manifest: dict, pkg_dir: Path) -> bool:
    """Verify all CSV files referenced in manifest exist and are readable."""
    items = manifest.get("items", [])
    if not items:
        print("⚠️ No items in manifest", file=sys.stderr)
        return False

    all_valid = True
    for item in items:
        db_key = item.get("db_key", "unknown")
        output_path = item.get("output", "")
        row_count = item.get("row_count", 0)

        csv_path = pkg_dir / output_path
        if not csv_path.exists():
            print(f"❌ CSV file not found: {csv_path}", file=sys.stderr)
            all_valid = False
            continue

        # Verify CSV is readable and count rows
        try:
            with open(csv_path, "r") as f:
                reader = csv.reader(f)
                rows = list(reader)
                actual_count = len(rows) - 1  # Exclude header

            if actual_count != row_count:
                print(f"⚠️ Row count mismatch for '{db_key}': manifest={row_count}, actual={actual_count}", file=sys.stderr)

            print(f"✅ CSV verified: {db_key} ({actual_count} rows)")
        except Exception as e:
            print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
            all_valid = False

    return all_valid
```
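One edge case worth noting in the quoted code: `len(rows) - 1` assumes a header row is always present, so an empty CSV reports -1 data rows. A quick self-contained illustration of that counting rule:

```python
import csv
import io


def naive_row_count(text: str) -> int:
    """Data-row count as computed above: total parsed rows minus one header."""
    rows = list(csv.reader(io.StringIO(text)))
    return len(rows) - 1


# A header plus two data rows counts as expected...
assert naive_row_count("id,name\n1,a\n2,b\n") == 2
# ...but an empty file "contains" -1 rows, which can never match a manifest.
assert naive_row_count("") == -1
```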
🧩 Analysis chain

🏁 Script executed:

```shell
cat -n scripts/verify_bundle.py
```

Repository: chittyos/.github
Length of output: 176

🏁 Script executed:

```shell
git ls-files scripts/verify_bundle.py
```

Repository: chittyos/.github
Length of output: 176

🌐 Web query: Python csv module exceptions csv.Error OSError

💡 Result:
Short summary
- `csv.Error` — module-specific exception raised for CSV parsing/dialect/format problems (reader/writer/Sniffer/etc.). Catch it to detect ill-formed CSV input or csv-api misuse. [1]
- `OSError` (and its common subclasses `FileNotFoundError`, `PermissionError`, etc.) — raised for OS / file I/O errors (opening/reading/writing files). These are not csv module exceptions but will occur when you open/read the underlying file. [2]
Example pattern (concise):
- Catch OSError around file open/IO.
- Catch csv.Error around parsing/reading/writing.
- Or catch both if you just need to handle any error.
Example:
```python
import csv

try:
    with open('data.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            ...
except OSError as e:
    # file I/O (FileNotFoundError, PermissionError, ...)
    handle_io(e)
except csv.Error as e:
    # CSV parsing / dialect / format errors
    handle_csv(e)
```
References
- csv module (csv.Error). [1]
- Built-in exceptions / OSError and subclasses (FileNotFoundError, PermissionError, etc.). [2]
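Both claims are easy to confirm against the standard library; in the sketch below, shrinking `csv.field_size_limit` is just one convenient way to provoke a `csv.Error` deliberately:

```python
import csv

# Opening a missing file raises FileNotFoundError, an OSError subclass.
try:
    open("this_file_should_not_exist.csv")
    io_error = None
except OSError as e:
    io_error = type(e).__name__

# A field longer than csv.field_size_limit raises csv.Error while parsing.
old_limit = csv.field_size_limit(10)
try:
    list(csv.reader(["x" * 100]))
    parse_error = None
except csv.Error:
    parse_error = "csv.Error"
finally:
    csv.field_size_limit(old_limit)  # restore the previous limit
```

The two exception families are disjoint, which is what makes catching them separately (as in the pattern above) meaningful.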
Make CSV row-count mismatches fail verification and narrow exception handling
`verify_csv_files` currently treats mismatched row counts as a warning only:

- On mismatch (lines 77-79), the code logs a ⚠️ warning but does not set `all_valid = False`.
- As a result, `verify_csv_files` returns `True` even when the manifest's `row_count` disagrees with actual CSV contents, and `main()` reports "All verification checks passed".
Since this script is a CI gate for bundle integrity, row-count mismatches should fail verification, not pass silently. Additionally, the broad Exception catch masks file I/O and CSV-specific errors. Narrow it to handle only relevant exceptions:
```diff
-        try:
-            with open(csv_path, "r") as f:
-                reader = csv.reader(f)
-                rows = list(reader)
-                actual_count = len(rows) - 1  # Exclude header
-
-            if actual_count != row_count:
-                print(f"⚠️ Row count mismatch for '{db_key}': manifest={row_count}, actual={actual_count}", file=sys.stderr)
-
-            print(f"✅ CSV verified: {db_key} ({actual_count} rows)")
-        except Exception as e:
-            print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
-            all_valid = False
+        try:
+            actual_count = -1
+            with open(csv_path, "r", newline="") as f:
+                reader = csv.reader(f)
+                # Count rows without loading entire file into memory
+                for i, _ in enumerate(reader):
+                    actual_count = i
+
+            if actual_count != row_count:
+                print(
+                    f"❌ Row count mismatch for '{db_key}': manifest={row_count}, actual={actual_count}",
+                    file=sys.stderr,
+                )
+                all_valid = False
+            else:
+                print(f"✅ CSV verified: {db_key} ({actual_count} rows)")
+        except (OSError, csv.Error) as e:
+            print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
+            all_valid = False
```

This change:
- Makes row-count mismatches fail verification (sets `all_valid = False`).
- Avoids loading entire CSVs into memory via streaming row count.
- Narrows exception handling to `OSError` (file I/O) and `csv.Error` (CSV parsing) instead of catching all exceptions.
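Lifted out of the surrounding function, the fixed logic can be sketched as a standalone helper (the name and simplified signature are assumptions for illustration):

```python
import csv
import sys
from pathlib import Path


def verify_csv_row_count(csv_path: Path, expected: int) -> bool:
    """Stream-count data rows (excluding the header) and fail on any mismatch."""
    actual = -1  # stays -1 for an empty file, which can never match
    try:
        with open(csv_path, "r", newline="") as f:
            for i, _ in enumerate(csv.reader(f)):
                actual = i  # after the loop: number of data rows seen
    except (OSError, csv.Error) as e:
        print(f"❌ Error reading CSV '{csv_path}': {e}", file=sys.stderr)
        return False

    if actual != expected:
        print(f"❌ Row count mismatch: manifest={expected}, actual={actual}",
              file=sys.stderr)
        return False
    print(f"✅ CSV verified ({actual} rows)")
    return True
```

Missing files now fail via `OSError` rather than a blanket `Exception`, and a mismatch flips the return value instead of merely warning.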
🧰 Tools
🪛 Ruff (0.14.7)
81-81: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In scripts/verify_bundle.py around lines 51 to 85, the function verify_csv_files
treats CSV row-count mismatches as warnings and catches all exceptions; update
it so mismatched row counts set all_valid = False, and count rows by streaming
the CSV (iterate the reader and increment a counter, subtract 1 for the header)
instead of loading the whole file into memory; narrow the exception handling to
only OSError and csv.Error and keep printing the same error message while still
setting all_valid = False on read/parse failures.
Adds hardened sync workflows for governance, evidence, operations, and property bundles. Includes Python preflight and post-export verification scripts.

What's included:
- Four reusable workflows: sync-*-bundle
- Preflight validator: scripts/preflight.py
- Bundle verification: scripts/verify_bundle.py
- Executable bits and .gitignore

Testing done:
- Ran preflight to confirm token/placeholder checks trigger as expected

Follow-ups:
- Set NOTION_TOKEN secret in repo or org
- Replace placeholder Notion database URLs
- Wire scheduled triggers if desired
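The NOTION_TOKEN follow-up hints at what the preflight token check might assert. A hedged sketch — the env var name comes from the follow-ups above, while the placeholder markers and function name are assumed conventions, not the actual `scripts/preflight.py` logic:

```python
import os
import sys

# Assumed placeholder conventions; adjust to whatever the repo actually uses.
PLACEHOLDER_MARKERS = ("REPLACE_ME", "changeme", "xxxx")


def check_notion_token(env=None) -> bool:
    """Fail fast if NOTION_TOKEN is unset, empty, or looks like a placeholder."""
    env = os.environ if env is None else env
    token = env.get("NOTION_TOKEN", "").strip()
    if not token:
        print("❌ NOTION_TOKEN is not set", file=sys.stderr)
        return False
    if any(marker in token for marker in PLACEHOLDER_MARKERS):
        print("❌ NOTION_TOKEN looks like a placeholder", file=sys.stderr)
        return False
    return True
```

Taking the environment as a parameter keeps the check unit-testable without mutating `os.environ`, which suits the "ran preflight to confirm token/placeholder checks trigger" testing note.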