Education#168

Open
caitlink12 wants to merge 4 commits into feature/v3.0.0-validation-infrastructure from education

@caitlink12

Added ICES survey cycles for the existing education variable EDUDR04 to variable and variable_details.

included ICES survey cycles to existing EDUDR04 variable and updated suffixes from _i to _m
@DougManuel
Contributor

Review: PR #168 (Education)

Reviewed EDUDR03 (3-level) and EDUDR04 (4-level) education variables. The PR adds ICES master databases and correctly routes EHG2DVR3 (2015+) to EDUDR03 only. Source variable names verified against MCP metadata database.

Full review details in CEP-013 (ceps/cep-013-education/).

Fixes applied

  1. Trailing empty columns removed (19 extra columns in the variable_details.csv header, from 41 to 22 columns). Likely from Excel editing.
  2. _NA::a → _NAa / _NA::b → _NAb in dummyVariable (9 rows across EDUDR03 and EDUDR04): colons are invalid in identifiers.
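
For reference, a minimal sketch of how both fixes could be scripted rather than applied by hand. This is not the code used for this PR: the function name `clean_variable_details` and the sample row are hypothetical, and the only column touched is the one named `dummyVariable`. It assumes every colon pair in that column should simply be dropped, which matches the two substitutions listed above.

```python
import csv
import io


def clean_variable_details(text, dummy_col="dummyVariable"):
    """Drop trailing all-empty columns and strip '::' from the dummy-variable column.

    `text` is the CSV content as a string; the cleaned CSV is returned as a string.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return ""

    # Width = position of the last non-empty cell across ALL rows, so a column
    # is only dropped when it is empty in the header and in every data row.
    width = 0
    for row in rows:
        for i in range(len(row) - 1, -1, -1):
            if row[i].strip():
                width = max(width, i + 1)
                break

    header = rows[0][:width]
    dummy_idx = header.index(dummy_col) if dummy_col in header else None

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        row = (row + [""] * width)[:width]  # pad short rows, trim long ones
        if dummy_idx is not None:
            # Colons are invalid in identifiers: _NA::a -> _NAa, _NA::b -> _NAb
            row[dummy_idx] = row[dummy_idx].replace("::", "")
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-column-overflow sample, not taken from the real worksheet.
sample = (
    "variable,dummyVariable,recEnd,,\n"
    "EDUDR04,EDUDR04_cat4_NA::a,NA::a,,\n"
)
cleaned = clean_variable_details(sample)
```

Only the `dummyVariable` column is rewritten; any `::` in other columns is left as-is, since the review flags the identifier column only.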

L6 integration results

  • EDUDR03: All 9 PUMF cycles pass. 2001-2014 collapse 4-level EDUDR04 source to 3 categories ([3,4]→3). 2015-2018 direct 1:1 mapping from EHG2DVR3.
  • EDUDR04: 7 cycles pass (2001-2014). 2015-2016 and 2017-2018 correctly MISS — EHG2DVR3 is 3-level and doesn't map to the 4-level variable.
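
The routing described in these results can be sketched as two small functions. This is an illustration only, not the project's recoding code: the function names are hypothetical, the cycle is identified by its start year, categories 1 and 2 are assumed to pass through unchanged in the collapse (the review only states [3,4]→3), and missing-value codes are not modelled.

```python
def recode_edudr03(cycle_start_year, value):
    """3-level education. Returns None when the value is out of range."""
    if cycle_start_year >= 2015:
        # 2015+: EHG2DVR3 is already 3-level, so the mapping is 1:1.
        return value if value in (1, 2, 3) else None
    # 2001-2014: collapse the 4-level EDUDR04 source, [3,4] -> 3.
    # Assumption: categories 1 and 2 are carried over unchanged.
    return {1: 1, 2: 2, 3: 3, 4: 3}.get(value)


def recode_edudr04(cycle_start_year, value):
    """4-level education. Returns None (a MISS) for 2015+ cycles."""
    if cycle_start_year >= 2015:
        # The 3-level EHG2DVR3 cannot be expanded to 4 levels.
        return None
    return value if value in (1, 2, 3, 4) else None
```

Returning `None` for EDUDR04 in 2015+ mirrors the expected MISS: a 3-level source carries no information for splitting its top category back into two.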

Design note

For 2015+, CCHS replaced the 4-level EDUDR04 with 3-level EHG2DVR3. The worksheets correctly route this to EDUDR03 (not EDUDR04). MCP metadata has EHG2DVR3.cchsflow_name = EDUDR04, which appears to be an incorrect mapping in the metadata database.

Pre-existing issues (not introduced by this PR)

  • cchs2014_m not referenced anywhere in the project (0 variables use it)
  • Base file already had trailing comma formatting debt (3296 rows)

Recommendation

PR is good to merge after the fixes are applied. Close and delete the branch after merge.

- Replace _NA::a/_NA::b with _NAa/_NAb in EDUDR03/EDUDR04 dummyVariable (9 rows)
- Remove 19 extra empty columns from variable_details.csv header (41→22 columns)

3 participants