
Standardise NULL-handling pattern for standalone derived variable functions #173

@DougManuel


Summary

Derived variable functions in v3 use an ad-hoc NULL→NA vector conversion pattern that should be standardised via a DRY helper. Currently 7 of the 33 smoking functions handle NULL inputs, each in its own hand-written way; the other 26 require all inputs and so cannot be used standalone.

Current pattern (repeated in each function)

# Handle NULL inputs - convert to NA vectors of appropriate length
if (is.null(SMK_01C) && is.null(SMKG01C_cont)) {
  return(assign_missing("not_stated", "age_first_cigarette", output_format))
}
n <- if (!is.null(SMK_01C)) length(SMK_01C) else length(SMKG01C_cont)
if (is.null(SMK_01C)) SMK_01C <- rep(NA_real_, n)
if (is.null(SMKG01C_cont)) SMKG01C_cont <- rep(NA_real_, n)

Issues to address

  1. DRY helper: Extract the NULL→NA conversion into a reusable function (e.g., prepare_inputs()) that handles vector length detection and NULL expansion. Could live in clean_variables() or as a standalone helper.

  2. Inconsistent missing types: When all inputs are NULL, some functions return not_stated (NA::b) and others not_applicable (NA::a). We need a consistent rule. A likely basis: a variable that is structurally absent from the survey (NULL because the column doesn't exist) carries different semantics from respondent-level missingness.

  3. NA::c for survey-level missingness: Consider adding a third missing type (not_collected) for when a variable is NULL because the survey cycle didn't include it. Currently assign_missing() only supports not_applicable (NA::a) and not_stated (NA::b). This would require changes to assign_missing(), get_missing_config(), any_missing(), get_priority_missing(), and worksheet recStart/recEnd patterns.

  4. Coverage: Only 7/33 smoking functions have NULL defaults. All {recommended:primary} functions should support standalone use with NULL inputs. Lower-level pass-through functions may not need it.
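A minimal base-R sketch of the helper proposed in item 1, for discussion only. The name prepare_inputs() follows the issue's "e.g." suggestion and is hypothetical, not an existing cchsflow function. The signature, the na_value argument, the equal-length check and the choice to return NULL when every input is NULL are all assumptions. Returning NULL in that case leaves the missing-type decision from items 2 and 3 to the caller instead of hard-coding one.

```r
# Hypothetical helper (name from the issue's suggestion; not an existing
# cchsflow function). Expands NULL inputs to NA vectors whose length matches
# the inputs that were actually supplied.
prepare_inputs <- function(..., na_value = NA_real_) {
  inputs <- list(...)  # named list; NULL arguments are kept as NULL elements
  supplied <- !vapply(inputs, is.null, logical(1))

  # All inputs NULL: return NULL so the caller decides which missing type
  # (not_stated, not_applicable, or a future not_collected) to return.
  if (!any(supplied)) {
    return(NULL)
  }

  # Assumption: every supplied input must have the same length.
  n <- unique(lengths(inputs[supplied]))
  if (length(n) != 1L) {
    stop("All non-NULL inputs must have the same length.", call. = FALSE)
  }

  # Replace each NULL input with an NA vector of length n.
  inputs[!supplied] <- list(rep(na_value, n))
  inputs
}

# Usage with the input names from the current pattern above:
inputs <- prepare_inputs(SMK_01C = c(15, 17, 20), SMKG01C_cont = NULL)
inputs$SMKG01C_cont  # NA NA NA (numeric, length 3)
```

Inside a derived variable function, the caller would check is.null(inputs) first (and return its chosen missing value), then unpack the expanded vectors, e.g. SMK_01C <- inputs$SMK_01C. The equal-length check is stricter than the current pattern, which silently takes the length of the first non-NULL input; whether to allow length-1 inputs to recycle is an open design choice.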

Functions with NULL handling (current)

  • calculate_age_start_smoking(): returns not_stated
  • calculate_age_first_cigarette(): returns not_stated
  • calculate_smoked_100_lifetime(): returns not_stated
  • calculate_pack_years(): optional params only
  • calculate_SMKDSTY_cat6(): stop() on NULL (different pattern)
  • calculate_smoke_simple(): stop() on NULL
  • Cessation _cont functions: optional continuous companion

Functions that should probably have NULL handling

  • calculate_cigs_per_day(): {recommended:primary} but no NULL defaults
  • calculate_time_quit_smoking(): {recommended:primary} but no NULL defaults
  • assess_quit_pathway(): {recommended:secondary}

Context

Discovered while reviewing PR #163 (v3 smoking harmonisation). The v2 functions had no NULL handling because rec_with_table() always supplied every input. v3 aims for standalone usability, so users can call the functions directly outside cchsflow.

Approach

  • Create a helper function with tests
  • Standardise across recommended functions
  • Document the convention in the worksheets skill
  • Consider NA::c as a separate sub-issue if scope is too large

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
