
Improve repository code quality #2

Draft
bpevreall wants to merge 1 commit into main from
cursor/improve-repository-code-quality-568b

@bpevreall
Owner

This pull request contains changes generated by Cursor background composer.

This commit introduces several improvements to data processing and sanitization:

- **Issues Dashboard:**
    - Renames `unprocessed_path` to `unprocessed` for clarity and test compatibility.
    - Extracts `scan_unprocessed` logic into a dedicated method.
    - Adds `scan_missing_receipts` and `scan_mismatches` methods to align with test expectations.
    - Introduces `critical_issues` to the summary for a consolidated view.
    - Adds `categorize_file` for better file type identification.
    - Implements `format_issue_summary` and `get_issue_priority` for enhanced reporting.
    - Updates `parse_missing_receipts_report` and `parse_matching_report` for simplified, test-friendly parsing.
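As a rough illustration of how `get_issue_priority` and `format_issue_summary` might fit together, here is a minimal sketch. The issue type names, the priority labels, and the dictionary shape of an issue are assumptions made for the example; the PR description does not specify them.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical mapping: the real issue types and priority levels are not
# listed in the PR description.
PRIORITY_BY_TYPE = {
    "mismatch": "critical",
    "missing_receipt": "high",
    "unprocessed": "low",
}


def get_issue_priority(issue_type: str) -> str:
    """Map an issue type to a priority label, defaulting to 'low'."""
    return PRIORITY_BY_TYPE.get(issue_type, "low")


def format_issue_summary(issues: list[dict]) -> str:
    """Render a total plus one line per priority level."""
    counts = Counter(get_issue_priority(issue["type"]) for issue in issues)
    lines = [f"Total issues: {len(issues)}"]
    for priority in ("critical", "high", "low"):
        lines.append(f"  {priority}: {counts.get(priority, 0)}")
    return "\n".join(lines)
```

Under these assumptions, a `critical_issues` count for the summary would simply be the number of issues whose priority is `"critical"`.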

- **Statement Processor:**
    - Refactors `process_all_statements` to use a new `process_single_statement` helper.
    - Enhances `extract_date_from_filename` with more robust pattern matching.
    - Adds `extract_date_from_pdf` for content-based date detection.
    - Introduces `detect_statement_info` to combine filename and content date extraction.
    - Implements `create_destination_path` for cleaner path generation.
    - Adds `generate_processing_report` for better summary output.
    - Updates `generate_statement_filename` for more standardized naming.
    - Adds `standardize_month_name` for compatibility.
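The filename-based date detection described above could look something like the following sketch. The two patterns (an ISO-style `YYYY-MM-DD` date and a month name followed by a year) are assumptions chosen for illustration; the patterns actually used in the PR are not shown in this description.

```python
from __future__ import annotations

import re
from datetime import date

# Month abbreviations mapped to month numbers (jan -> 1, ..., dec -> 12).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Assumed pattern 1: 2024-03-15, 2024_03_15, 20240315.
_ISO = re.compile(
    r"(?<!\d)(20\d{2})[-_.]?(0[1-9]|1[0-2])[-_.]?(0[1-9]|[12]\d|3[01])(?!\d)"
)
# Assumed pattern 2: March_2024, mar-2024, Mar2024.
_MONTH_YEAR = re.compile(
    r"(?<![a-z])(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
    r"[-_ ]?(20\d{2})(?!\d)",
    re.IGNORECASE,
)


def extract_date_from_filename(filename: str) -> date | None:
    """Return the first date found in the filename, or None."""
    match = _ISO.search(filename)
    if match:
        year, month, day = map(int, match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            pass  # e.g. 30 February: fall through to the next pattern
    match = _MONTH_YEAR.search(filename)
    if match:
        # No day in the filename, so default to the first of the month.
        return date(int(match.group(2)), MONTHS[match.group(1).lower()], 1)
    return None
```

A function like `detect_statement_info` could then try this first and fall back to content-based extraction only when it returns `None`.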

- **Data Sanitizer:**
    - Expands `sensitive_patterns` and `replacement_patterns` for more comprehensive PII redaction (e.g., IBAN, hex data, obfuscated names).
    - Improves redaction logic for card numbers, phone numbers, and addresses.
    - Enhances `preserve_merchant_info` to re-insert keywords if they were accidentally redacted.
    - Adds specific redactions for account numbers and sort codes within the merchant field.
    - Updates `sanitize_filename` to preserve merchant tokens.
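To make the pattern-based redaction concrete, here is a small sketch of an ordered list of replacement patterns. The regular expressions, the placeholder tokens, and the `sanitize_text` helper are hypothetical; they are not the `sensitive_patterns` or `replacement_patterns` from the PR, and real IBAN or card validation would need stricter checks.

```python
import re

# Hypothetical patterns, applied in order so that the longer IBAN form is
# handled before the shorter digit groups it might contain.
REPLACEMENT_PATTERNS = [
    # Country code, two check digits, then 3-7 groups of four characters.
    (re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,3})?\b"),
     "[IBAN]"),
    # 16-digit card numbers, optionally grouped by spaces or hyphens.
    (re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"), "[CARD]"),
    # UK-style sort codes such as 12-34-56.
    (re.compile(r"\b\d{2}-\d{2}-\d{2}\b"), "[SORT_CODE]"),
]


def sanitize_text(text: str) -> str:
    """Replace every match of each pattern with its placeholder token."""
    redacted = text
    for pattern, replacement in REPLACEMENT_PATTERNS:
        redacted = pattern.sub(replacement, redacted)
    return redacted
```

Because the merchant name is plain text that none of these patterns target, it survives redaction in the common case; a step like `preserve_merchant_info` would only matter when a broader pattern removes it by accident.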

- **Tests:**
    - Adds a `complete_workspace` fixture to set up a more comprehensive test environment.
    - Includes `get-pip.py` for bootstrapping pip in isolated environments.
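The contents of the `complete_workspace` fixture are not shown in this description. As a sketch of the general idea, the helper below builds a small directory tree using only the standard library; the function name, folder names, and sample files are all assumptions.

```python
from pathlib import Path
import tempfile


def build_complete_workspace(root: Path) -> Path:
    """Create a small, hypothetical workspace tree under root."""
    # Folder names are illustrative, not taken from the repository.
    for sub in ("unprocessed", "statements", "receipts", "reports"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    # A stub file standing in for an unprocessed statement.
    (root / "unprocessed" / "statement_2024-03.pdf").write_bytes(b"%PDF-1.4\n")
    # A stub report for the parsing code to read.
    (root / "reports" / "missing_receipts.txt").write_text(
        "No missing receipts\n", encoding="utf-8"
    )
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = build_complete_workspace(Path(tmp))
        print(sorted(p.name for p in workspace.iterdir()))
```

In a pytest suite, a fixture could wrap this helper by accepting pytest's built-in `tmp_path` fixture and returning `build_complete_workspace(tmp_path)`.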

Co-authored-by: brenpevreall <brenpevreall@gmail.com>
@cursor

cursor bot commented Sep 28, 2025

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
