Design Doc: Geographic Representativeness

Tristan Crockett edited this page Dec 16, 2016 · 2 revisions

To expose geographic data more widely to Open Skills API users, we want to ensure that our job listing data is geographically representative of the nation's jobs as a whole. We want to be able to check this against our aggregate dataset at any time, so that we can confirm quality holds as we add new data sources to the mix.

To do this, we want to compare our data against survey estimates from government agencies, treated as ground truth. We want to compare the data in this form:

Job counts per O*NET-SOC code per metro area (Core-Based Statistical Area, CBSA).

Potential government data sources:

We are currently evaluating the government data sources to find out which one is best suited for extracting this data. Once we have chosen one, we'll pick a time window in which both the ground truth data and the aggregated job posting data are sufficient, and measure how much the geographic distribution of each occupation differs between the two.
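
As a rough illustration of what such a comparison could look like, here is a minimal sketch. It assumes both datasets have already been reduced to job counts keyed by (O*NET-SOC code, CBSA code), and it uses total variation distance as the difference measure. That metric, the function names, and the sample counts are all assumptions made for this example; the design has not settled on any of them.

```python
from collections import defaultdict


def shares_by_occupation(counts):
    """Turn {(soc_code, cbsa): job_count} into {soc_code: {cbsa: share}}.

    Each occupation's shares sum to 1, so datasets of very different
    sizes (postings vs. survey estimates) become comparable.
    """
    totals = defaultdict(float)
    for (soc, _cbsa), n in counts.items():
        totals[soc] += n

    shares = defaultdict(dict)
    for (soc, cbsa), n in counts.items():
        if totals[soc] > 0:
            shares[soc][cbsa] = n / totals[soc]
    return dict(shares)


def total_variation(p, q):
    """Total variation distance between two {cbsa: share} distributions.

    0.0 means identical geographic distributions, 1.0 means no overlap.
    (Assumed metric for illustration only.)
    """
    cbsas = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cbsas)


def compare(postings, ground_truth):
    """Per-occupation distance, for occupations present in both datasets."""
    p_shares = shares_by_occupation(postings)
    g_shares = shares_by_occupation(ground_truth)
    common = p_shares.keys() & g_shares.keys()
    return {soc: total_variation(p_shares[soc], g_shares[soc]) for soc in common}


# Made-up counts for one occupation across two metro areas.
postings = {("15-1132.00", "16980"): 30, ("15-1132.00", "35620"): 70}
ground_truth = {("15-1132.00", "16980"): 500, ("15-1132.00", "35620"): 500}

for soc, distance in sorted(compare(postings, ground_truth).items()):
    print(f"{soc}: {distance:.3f}")
```

Normalizing to shares within each occupation keeps the comparison about where jobs are rather than how many listings were collected. Occupations missing from either dataset are skipped here; a real check would likely want to report them separately.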