Each separate CPS loading script has 700+ lines of hard-coding like the lines below, which TBH scares me a bit.
record["hrecord"] = int(rec[0:1])
record["h_seq"] = int(rec[1:6])
record["hhpos"] = int(rec[6:8])
This seems error-prone and tedious, and slows down the process of adding new CPS data.
Another option would be using the ASEC data dictionaries published by Census, e.g. this from 2018. These could be parsed to create a dictionary mapping each variable to its start and end position (1-based, inclusive), e.g.
{'HRECORD': [1,1],
'HSEQ': [2,6]
...
}
and then this dictionary could be used to loop through the lines and build the records for a DataFrame. I'm not a regex expert, but I don't think extracting the dict from the data dictionary would be too hard.
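Here's a rough sketch of what I have in mind. Big caveat: the three-column layout in `SAMPLE_DICTIONARY` (name, size, start-end) is something I made up for illustration, so the regex would need adjusting to whatever the real Census file looks like for each year. `parse_positions` and `parse_record` are hypothetical helper names too, not anything in the existing scripts.

```python
import re

# HYPOTHETICAL layout for illustration only: "NAME  SIZE  START-END".
# The real Census data dictionary format must be checked for each year.
SAMPLE_DICTIONARY = """\
HRECORD   1   1-1
HSEQ      5   2-6
HHPOS     2   7-8
"""

# Matches a variable name, a field size, and a 1-based inclusive range.
LINE_PATTERN = re.compile(
    r"^(?P<name>[A-Z_][A-Z0-9_]*)\s+\d+\s+(?P<start>\d+)-(?P<end>\d+)\s*$"
)


def parse_positions(dictionary_text):
    """Build {variable: [start, end]} from the data dictionary text."""
    positions = {}
    for line in dictionary_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:  # skip anything that is not a variable definition
            positions[match["name"]] = [int(match["start"]), int(match["end"])]
    return positions


def parse_record(rec, positions):
    """Slice one fixed-width line into a dict of integer fields."""
    # [start, end] is 1-based inclusive, so the Python slice is [start-1:end].
    # Assumes every field is numeric; blank fields would need extra handling.
    return {
        name: int(rec[start - 1:end])
        for name, (start, end) in positions.items()
    }


positions = parse_positions(SAMPLE_DICTIONARY)
records = [parse_record(line, positions) for line in ["10000101", "10000202"]]
# records could then be passed to pandas.DataFrame(records).
```

If this works, `pandas.read_fwf` might be able to replace the loop entirely, since it takes `colspecs` as half-open `(start - 1, end)` tuples, but I haven't tried that on the actual ASEC files.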