Add parse_variables_sheet function with validation#85

Merged: yulric merged 1 commit into dev from sheet-validation on Dec 30, 2025
@yulric commented Oct 20, 2025

Summary

This PR adds parse_variables_sheet(), which validates a variables sheet to ensure that derived variables follow the dependency rules listed under Key Features.

Key Features

  • Validation Logic: Ensures derived variables do not reference database columns directly (e.g., DerivedVar::[cchs::age] is invalid)
  • Allowed Patterns:
    • Derived variables can use non-derived variables: DerivedVar::[age, sex]
    • Derived variables can use other derived variables: DerivedVar::[var1, var2]
    • Non-derived variables can use database columns: cchs::age or [age]
  • Return Values:
    • Success: Returns input with S3 class variables_sheet assigned
    • Failure: Returns structured error list with success = FALSE and detailed error information
  • Error Handling: Collects all validation errors (not just the first one), with each error containing type, row number, and descriptive message
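
The rules and return values above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the column names `variable` and `variableStart`, the error type string `invalid_database_reference`, and the function name `parse_variables_sheet_sketch` are all assumptions made for the example.

```r
# Sketch only. Column names (`variable`, `variableStart`) and the error
# type string are assumptions; the PR text does not spell them out.
parse_variables_sheet_sketch <- function(sheet) {
  errors <- list()
  for (i in seq_len(nrow(sheet))) {
    start <- sheet$variableStart[i]
    # Only rows written as DerivedVar::[...] are treated as derived variables
    if (!grepl("^DerivedVar::\\[.*\\]$", start)) next
    inner <- sub("^DerivedVar::\\[(.*)\\]$", "\\1", start)
    feeders <- trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
    # A feeder such as "cchs::age" points straight at a database column
    bad <- feeders[grepl("^[A-Za-z0-9_.]+::[A-Za-z0-9_.]+$", feeders)]
    # Collect every violation instead of stopping at the first one
    for (feeder in bad) {
      errors[[length(errors) + 1]] <- list(
        type = "invalid_database_reference",
        row = i,
        message = sprintf(
          "Derived variable '%s' references database column '%s'",
          sheet$variable[i], feeder
        )
      )
    }
  }
  if (length(errors) > 0) {
    return(list(success = FALSE, errors = errors))
  }
  # Success: hand back the input with the S3 class attached
  class(sheet) <- c("variables_sheet", class(sheet))
  sheet
}

# Hypothetical sheet: row 3 breaks the rule twice
sheet <- data.frame(
  variable = c("age", "bmi", "risk"),
  variableStart = c(
    "cchs::age",
    "DerivedVar::[height, weight]",
    "DerivedVar::[cchs::age, cchs::sex]"
  ),
  stringsAsFactors = FALSE
)
result <- parse_variables_sheet_sketch(sheet)
valid <- parse_variables_sheet_sketch(sheet[1:2, ])
```

With this sample data, `result` is the failure list (both bad references on row 3 are reported) and `valid` is the first two rows with class `variables_sheet`. Callers can branch on `inherits(x, "variables_sheet")` to tell the two shapes apart.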

Implementation Details

New Files:

  • R/parse-variables-sheet.R - Main function and helper functions
  • tests/testthat/test-parse-variables-sheet.R - Comprehensive test suite

Helper Functions:

  • .validate_variables_sheet_input() - Input validation (data frame, required columns)
  • .validate_derived_variables() - Core validation logic for derived variables
  • .extract_feeder_variables() - Extracts variables from DerivedVar::[...] syntax
  • .is_database_column_reference() - Detects database::column pattern
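
As a rough picture of what three of these helpers might do, here is a minimal sketch. The function names below drop the leading dot to signal that they are stand-ins, not the PR's implementations; the regular expressions, the required column names (`variable`, `variableStart`), and the return conventions are assumptions.

```r
# Hypothetical stand-ins for the helpers listed above; the regexes and
# the default required columns are assumptions, not taken from the PR.

# Returns NULL when the input looks usable, otherwise a message string.
validate_sheet_input <- function(sheet,
                                 required = c("variable", "variableStart")) {
  if (!is.data.frame(sheet)) {
    return("Input must be a data frame")
  }
  missing_cols <- setdiff(required, names(sheet))
  if (length(missing_cols) > 0) {
    return(paste("Missing required columns:",
                 paste(missing_cols, collapse = ", ")))
  }
  NULL
}

# Pulls the comma-separated names out of "DerivedVar::[a, b]".
# Anything that is not in DerivedVar::[...] form yields character(0).
extract_feeder_variables <- function(variable_start) {
  pattern <- "^[[:space:]]*DerivedVar::\\[(.*)\\][[:space:]]*$"
  if (!grepl(pattern, variable_start)) {
    return(character(0))
  }
  inner <- sub(pattern, "\\1", variable_start)
  feeders <- trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
  feeders[nzchar(feeders)]
}

# TRUE when a name has the database::column shape, e.g. "cchs::age".
is_database_column_reference <- function(x) {
  grepl("^[A-Za-z0-9_.]+::[A-Za-z0-9_.]+$", x)
}

feeders <- extract_feeder_variables("DerivedVar::[age, cchs::sex]")
flags <- is_database_column_reference(feeders)
```

Here `feeders` holds the two names inside the brackets and `flags` marks only the second as a database column reference, which is the case the validation rejects.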

Test Coverage

Comprehensive test suite with 12 test cases (6a through 6l):

  • ✅ Non-derived variables validation
  • ✅ Derived variables using valid dependencies
  • ✅ Detection of invalid database column references
  • ✅ Multiple error collection
  • ✅ Empty data frames and DerivedVar syntax
  • ✅ Input validation (missing columns, invalid types)
  • ✅ Real-world data validation (pbc_variables.csv)

All 30 tests passing

Test plan

  • Run all tests with rig run -r 4.3.2 -e "devtools::test(filter = 'parse-variables-sheet')"
  • Verify all 30 tests pass
  • Update NAMESPACE via devtools::document()
  • Verify function works with real-world data (pbc_variables.csv)

🤖 Generated with Claude Code

@yulric changed the base branch from main to dev on October 23, 2025 15:06
@yulric merged commit b87e6bd into dev on Dec 30, 2025
@yulric deleted the sheet-validation branch on December 30, 2025 15:52