Parse CPS dat files using file representation of column mapping (by parsing SAS script or data dictionary) #342

@MaxGhenis

The 700+ lines of hard-coded column slicing like the lines below, repeated in each separate CPS loading script, scare me a bit, TBH.

record["hrecord"] = int(rec[0:1])
record["h_seq"] = int(rec[1:6])
record["hhpos"] = int(rec[6:8])

This seems error-prone and tedious, and slows down the process of adding new CPS data.

Another option would be using the ASEC data dictionaries published by Census, e.g. this from 2018. These could be parsed to create a dictionary mapping each variable to its start and end position (1-based, inclusive), e.g.

{'HRECORD': [1,1],
 'HSEQ': [2,6],
...
}

then use this dictionary to loop through the lines and build the records for a DataFrame. I'm not a regex expert, but extracting the dict from the data dictionary doesn't seem too hard.
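To make the idea concrete, here is a minimal sketch of what this could look like. It is not the existing loader. It assumes each variable in the data dictionary is described by a line of the form `D <NAME> <SIZE> <BEGIN>`; that layout, the sample text, and the function names `parse_data_dictionary` and `parse_record` are all assumptions for illustration, so the regex would need adjusting to the real Census file.

```python
import re

# ASSUMPTION: each variable line looks like "D <NAME> <SIZE> <BEGIN>".
# Adjust this pattern to match the actual data dictionary layout.
FIELD_RE = re.compile(r"^D\s+(\w+)\s+(\d+)\s+(\d+)")


def parse_data_dictionary(text):
    """Return {name: [start, end]} using 1-based, inclusive positions."""
    columns = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            name = match.group(1)
            size = int(match.group(2))
            begin = int(match.group(3))
            columns[name] = [begin, begin + size - 1]
    return columns


def parse_record(rec, columns):
    """Slice one fixed-width line into a dict of integer fields.

    A 1-based inclusive [start, end] becomes the Python slice
    rec[start - 1:end]. int() raises on blank or non-numeric fields,
    so real data would need extra handling here.
    """
    return {
        name: int(rec[start - 1:end])
        for name, (start, end) in columns.items()
    }


# Hypothetical dictionary excerpt and data line, for illustration only.
SAMPLE_DICTIONARY = """\
D HRECORD     1      1
D H_SEQ       5      2
D HHPOS       2      7
"""

columns = parse_data_dictionary(SAMPLE_DICTIONARY)
record = parse_record("10000141", columns)
```

With this sample, the slices come out the same as the hard-coded `rec[0:1]`, `rec[1:6]`, and `rec[6:8]` above. If the positions are trusted, `pandas.read_fwf` with `colspecs` built as `(start - 1, end)` pairs might replace the per-line loop entirely.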
