-
Notifications
You must be signed in to change notification settings - Fork 7
Description
Summary
Derived variable functions in v3 use an ad-hoc NULL→NA vector conversion pattern that should be standardised via a DRY helper. Currently 7 of 33 smoking functions implement the pattern manually; the rest require all inputs (no standalone use).
Current pattern (repeated in each function)
# Handle NULL inputs - convert to NA vectors of appropriate length
if (is.null(SMK_01C) && is.null(SMKG01C_cont)) {
return(assign_missing("not_stated", "age_first_cigarette", output_format))
}
n <- if (!is.null(SMK_01C)) length(SMK_01C) else length(SMKG01C_cont)
if (is.null(SMK_01C)) SMK_01C <- rep(NA_real_, n)
if (is.null(SMKG01C_cont)) SMKG01C_cont <- rep(NA_real_, n)Issues to address
-
DRY helper: Extract the NULL→NA conversion into a reusable function (e.g.,
prepare_inputs()) that handles vector length detection and NULL expansion. Could live inclean_variables()or as a standalone helper. -
Inconsistent missing types: When all inputs are NULL, some functions return
not_stated(NA::b), othersnot_applicable(NA::a). Need a consistent rule — likely: if the variable is structurally absent from the survey (NULL because column doesn't exist), that's a different semantic than respondent-level missingness. -
NA::c for survey-level missingness: Consider adding a third missing type (
not_collected) for when a variable is NULL because the survey cycle didn't include it. Currentlyassign_missing()only supportsnot_applicable(NA::a) andnot_stated(NA::b). This would require changes toassign_missing(),get_missing_config(),any_missing(),get_priority_missing(), and worksheetrecStart/recEndpatterns. -
Coverage: Only 7/33 smoking functions have NULL defaults. All
{recommended:primary}functions should support standalone use with NULL inputs. Lower-level pass-through functions may not need it.
Functions with NULL handling (current)
calculate_age_start_smoking()— returnsnot_statedcalculate_age_first_cigarette()— returnsnot_statedcalculate_smoked_100_lifetime()— returnsnot_statedcalculate_pack_years()— optional params onlycalculate_SMKDSTY_cat6()—stop()on NULL (different pattern)calculate_smoke_simple()—stop()on NULL- Cessation
_contfunctions — optional continuous companion
Functions that should probably have NULL handling
calculate_cigs_per_day()—{recommended:primary}but no NULL defaultscalculate_time_quit_smoking()—{recommended:primary}but no NULL defaultsassess_quit_pathway()—{recommended:secondary}
Context
Discovered during PR #163 (v3 smoking harmonisation) review. The v2 functions had no NULL handling because rec_with_table() always provided all inputs. v3 aims for standalone usability so users can call functions directly outside cchsflow.
Approach
- Create a helper function with tests
- Standardise across recommended functions
- Document the convention in the worksheets skill
- Consider NA::c as a separate sub-issue if scope is too large