R code for calculating a Labour Market Concentration Index using the Herfindahl-Hirschman Index (HHI) from online job advertisement (OJA) data.
The code is based on the work within the ESSnet Big Data II project on Online Job Vacancies.
The main code is contained in the script lmci_v1.R. Its execution is parallelized to reduce the time needed to process the data for all 27 EU countries.
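This README does not describe how lmci_v1.R parallelizes the work. The following is a minimal sketch of one common base-R approach, parallel::mclapply over the country list, and is not the script's actual code. process_country() is a hypothetical placeholder for the per-country pipeline. mclapply relies on forking, which is unavailable on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

# EU-27 country codes as used by Eurostat ("EL" for Greece)
eu27 <- c("AT", "BE", "BG", "CY", "CZ", "DE", "DK", "EE", "EL", "ES",
          "FI", "FR", "HR", "HU", "IE", "IT", "LT", "LU", "LV", "MT",
          "NL", "PL", "PT", "RO", "SE", "SI", "SK")

# Hypothetical placeholder: in practice this would load one country's OJA
# extract, clean company names and compute the HHI for each market
process_country <- function(country) {
  data.frame(country = country, n_markets = 0L, stringsAsFactors = FALSE)
}

# Forking is not available on Windows, so run serially there;
# detectCores() can return NA, which na.rm handles
n_cores <- if (.Platform$OS.type == "windows") 1L else
  max(1L, detectCores(logical = FALSE) - 1L, na.rm = TRUE)

# mclapply returns results in the same order as the input vector
results <- mclapply(eu27, process_country, mc.cores = n_cores)
all_countries <- do.call(rbind, results)
```

Processing one country per worker keeps each task independent, so the results can be combined with a single rbind at the end.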
The main functions used in the code are defined in the script hhi_functions.R.
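The functions in hhi_functions.R are not reproduced here. As a minimal sketch, assuming that market shares are the shares of job ads posted by each company within a market and are expressed in percent, the HHI of a market is the sum of the squared shares. It ranges from near 0 (many small employers) to 10 000 (a single employer). The helper names hhi() and market_hhi() and the columns market and company are hypothetical.

```r
# Hypothetical sketch, NOT the functions in hhi_functions.R.
# HHI of one market: sum of squared shares in percent (range 0 to 10000).
hhi <- function(postings) {
  shares <- 100 * postings / sum(postings)
  sum(shares^2)
}

# Assumes `ads` has one row per job ad, with a `market` column (e.g. an
# occupation x Functional Urban Area cell) and a `company` column
market_hhi <- function(ads) {
  # Count ads per (market, company) pair
  counts <- aggregate(
    list(n = rep(1L, nrow(ads))),
    by = list(market = ads$market, company = ads$company),
    FUN = sum
  )
  # Compute the HHI separately within each market
  tapply(counts$n, counts$market, hhi)
}

ads <- data.frame(
  market  = c("A", "A", "A", "A", "B", "B"),
  company = c("x", "x", "y", "z", "w", "w"),
  stringsAsFactors = FALSE
)
res <- market_hhi(ads)
# A = 3750 (shares 50, 25, 25); B = 10000 (single employer)
```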
The folder Other script mainly contains the scripts and data used to create and evaluate the model that handles company names.
The data used in this study consist of 116 851 363 distinct online job ads collected from 316 distinct sources in all EU countries. OJAs are advertisements published on the World Wide Web that reveal an employer’s interest in recruiting workers with certain characteristics to perform certain work. Employers can publish job ads for various reasons, for example to fill a current vacancy or to explore potential recruiting opportunities.
The OJAs used in this study are available thanks to the cooperation between Eurostat (representing the ESS) and the European Centre for the Development of Vocational Training (CEDEFOP), and to their formal agreement on a joint approach to online job advertisement data.
The timeliness of the data release has improved over time: the lag between collection and data release, initially 7 months, has been reduced to the current 2 months. The results presented in this document are based on the most recent version of the dataset (v9), released during the first quarter of 2021. This dataset contains data from the third quarter of 2018 to the end of 2020; however, only data from 2019 have been used in this study, because that year has better coverage of important sources. Access to OJA data can be granted to interested users on a case-by-case basis, following a formal request. Please send your enquiries to ESTAT-WIH@ec.europa.eu.
The main R script takes as inputs two .csv files:
- companies_to_clean_EU, used to clean the company names in the original OJA dataset
- staff_agencies_EU, used to filter out staffing agencies (i.e. ads where the variable companyname reports the name of the staffing agency instead of the name of the company actually offering the advertised job)
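The layout of the two input files is not documented here. Assuming, hypothetically, that companies_to_clean_EU maps a raw name column companyname to a cleaned name column companyname_clean, and that staff_agencies_EU lists agency names in a companyname column, the cleaning and filtering step could be sketched as follows. The column names, the helper clean_companies() and the example company names are all illustrative assumptions, not the script's actual code.

```r
# Hypothetical column names: companyname, companyname_clean.
# In practice the lookups would be read from the two input files, e.g.
# lookup <- read.csv("companies_to_clean_EU.csv")  # path is an assumption
clean_companies <- function(ads, lookup, agencies) {
  # Replace each raw company name with its cleaned version, if one exists
  idx <- match(ads$companyname, lookup$companyname)
  has_clean <- !is.na(idx)
  ads$companyname[has_clean] <- lookup$companyname_clean[idx[has_clean]]
  # Drop ads whose (cleaned) company name is a staffing agency
  ads[!(ads$companyname %in% agencies$companyname), , drop = FALSE]
}

ads <- data.frame(
  id = 1:4,
  companyname = c("ACME Ltd", "Acme", "TempStaff Agency", "Foo SpA"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  companyname = c("ACME Ltd", "Acme"),
  companyname_clean = c("acme", "acme"),
  stringsAsFactors = FALSE
)
agencies <- data.frame(companyname = "TempStaff Agency", stringsAsFactors = FALSE)
out <- clean_companies(ads, lookup, agencies)
```

Cleaning before filtering means that variant spellings of an agency's name only need to be listed once, in their cleaned form; whether the actual script follows this order is not stated in this README.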
This R code is used to produce the experimental results described in the Statistical Working Paper - Competition in Urban Hiring Markets: Evidence from Online Job Advertisements. The visualizations produced with ggplot are not identical to those in the paper, which were reformatted to comply with the graphical standards of Eurostat's publications.
Data for all Functional Urban Areas over 2019 and 2020 can be explored in an interactive map.