Parse CPS dat files using file representation of column mapping (by parsing SAS script or data dictionary)

The 700+ lines of hard-coding like the lines below, repeated in each separate CPS loading script... TBH, it scares me a bit.

https://github.com/PSLmodels/taxdata/blob/7fa2634d578c72309234a629a0f72d679c33a086/cps_data/pycps/cpsmar2013.py#L17-L19

```python
record["hrecord"] = int(rec[0:1])
record["h_seq"] = int(rec[1:6])
record["hhpos"] = int(rec[6:8])
```

This seems error-prone and tedious, and it slows down the process of adding new CPS data.

Another option would be to use the ASEC data dictionaries published by Census, e.g. [this one from 2018](https://www2.census.gov/programs-surveys/cps/datasets/2018/march/08ASEC2018_Data_Dict_Full.txt). These could be parsed into a dictionary mapping each variable to its start and end positions, e.g.
```
{'HRECORD': [1,1],
 'HSEQ': [2,6],
...
}
```
Then use this dictionary to loop through the lines of the `.dat` file and build a `DataFrame` of records. I'm not a regex expert, but I don't think extracting the dict from the data dictionary would be too hard.
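
A minimal sketch of what this could look like, assuming each variable in the data dictionary is introduced by a line of the form `D NAME SIZE BEGIN ...` (that layout is an assumption and should be checked against the actual Census file for each year). The function names `parse_data_dict`, `parse_records`, and `read_with_fwf` and the regex are hypothetical. Positions are kept 1-based and inclusive as in the dict above, so `'HSEQ': [2,6]` corresponds to the Python slice `rec[1:6]`.

```python
import io
import re

import pandas as pd

# ASSUMPTION: each variable in the data dictionary is introduced by a line like
#   D HSEQ 5 2 (00001:99999)
# i.e. "D", the variable name, its width (SIZE), and its 1-based start column
# (BEGIN). Verify this against the actual Census file for each year.
VAR_LINE = re.compile(
    r"^D\s+(?P<name>[A-Z][A-Z0-9_-]*)\s+(?P<size>\d+)\s+(?P<begin>\d+)"
)


def parse_data_dict(lines):
    """Return {name: [start, end]} with 1-based, inclusive positions."""
    positions = {}
    for line in lines:
        m = VAR_LINE.match(line)
        if m:
            size = int(m.group("size"))
            begin = int(m.group("begin"))
            positions[m.group("name")] = [begin, begin + size - 1]
    return positions


def parse_records(lines, positions):
    """Slice each fixed-width line using the mapping instead of hard-coded offsets."""
    records = [
        # 1-based inclusive [start, end] -> Python slice [start - 1:end]
        {name: int(rec[start - 1:end]) for name, (start, end) in positions.items()}
        for rec in lines
    ]
    return pd.DataFrame(records)


def read_with_fwf(buf, positions):
    """Alternative using pandas' fixed-width reader.

    colspecs are 0-based, half-open (from, to) pairs.
    """
    colspecs = [(start - 1, end) for start, end in positions.values()]
    return pd.read_fwf(buf, colspecs=colspecs, names=list(positions), header=None)


if __name__ == "__main__":
    dictionary = ["D HRECORD 1 1 (1:1)", "    Record type", "D HSEQ 5 2 (00001:99999)"]
    positions = parse_data_dict(dictionary)
    print(positions)  # {'HRECORD': [1, 1], 'HSEQ': [2, 6]}
    print(parse_records(["100001", "100002"], positions))
    print(read_with_fwf(io.StringIO("100001\n100002\n"), positions))
```

This treats every line as having the same layout. Since the `hrecord` field distinguishes record types, a real loader would probably need one mapping per record type and dispatch on the first character of each line. Once lines of a single record type are grouped together, `pd.read_fwf` with `colspecs` avoids the Python-level loop entirely.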
