MediaComem/das-public

PLOS Open Science Indicators

PLOS recently published an innovative dataset of Open Science Indicators (OSI), covering its entire collection plus a comparison dataset from PubMed. Here we use OSI version 5, which contains approximately 124,000 PMC and PLOS articles. The OSI is primarily concerned with three indicators: sharing of research data, in particular data shared in data repositories; sharing of code; and posting of preprints.

The Media Engineering Institute (MEI) has collected data from the PubMed Open Access collection to equip the OSI dataset with article-level citation data and author-level h-index data, in preparation for further analysis. The data collection pipeline is adapted from the process used in the previous work on Data Availability Statements, described below.

Code and data

  • We start from the OSI dataset and the PubMed Central Open Access collection. Our goal is to extract a CSV file containing citation data and h-index data for every article in OSI, calculated from PubMed OA.
  • See the dataset folder for more details on the steps taken:
    • Detect authors in the OSI dataset.
    • Collect all citations given from any article in PubMed OA to any OSI article, using known identifiers contained in the lists of references.
    • Calculate citation counts for 1, 2, and 3 years after the publication of each OSI article, using month-level precision (e.g., for an article published in June 2019, the 2-year citation window comprises all citations received from articles published up to June 2021). Also calculate the author-level h-index from the same data.
    • Compute the h-index and timed citation indicators as a dataset that can be joined with the OSI dataset.
    • Develop and run tests to check the correctness of the results. In dataset/dev_set, some articles have been added to the previous ones to validate the citation and h-index calculations.
    • Where necessary, the source code has been updated for the latest releases of Python and its packages.
  • To validate the code, please refer to the testing procedure.
  • The final result can be found in dataset/exports/export_plos.csv.zip.
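
The two indicators described in the steps above can be illustrated with a minimal sketch. This is not the repository's implementation: the function names (`months_between`, `windowed_citation_counts`, `h_index`) and the input shapes are hypothetical and chosen only for illustration. The sketch assumes that the window boundary is inclusive (matching the "until June 2021" example) and that citations dated before the publication month are ignored.

```python
from datetime import date


def months_between(published: date, cited: date) -> int:
    """Calendar months from publication to the citing article (day is ignored)."""
    return (cited.year - published.year) * 12 + (cited.month - published.month)


def windowed_citation_counts(published, citing_dates, windows=(1, 2, 3)):
    """Count citations received within each n-year window, at month precision.

    Assumption: the boundary month is included, and citations dated
    before the publication month are not counted.
    """
    counts = {}
    for years in windows:
        counts[years] = sum(
            1
            for cited in citing_dates
            if 0 <= months_between(published, cited) <= 12 * years
        )
    return counts


def h_index(citation_counts):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


# Example: an article published in June 2019 and six citing articles.
published = date(2019, 6, 1)
citing = [
    date(2019, 5, 1),   # before publication: ignored in this sketch
    date(2019, 12, 1),  # 6 months after publication
    date(2020, 6, 1),   # 12 months: inside the 1-year window
    date(2021, 6, 1),   # 24 months: inside the 2-year window
    date(2021, 7, 1),   # 25 months: only inside the 3-year window
    date(2022, 6, 1),   # 36 months: inside the 3-year window
]
print(windowed_citation_counts(published, citing))  # {1: 2, 2: 3, 3: 5}
print(h_index([10, 8, 5, 4, 3]))                    # 4
```

The actual pipeline may handle edge cases differently, for example articles with missing publication months; see the dataset folder for the authoritative steps.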

Modelling and analysis

The code and data for the modelling and analysis can be found in the analysis folder.

Original work

This repository is a fork of previous work that can be found here:

  • DOI
  • Binder

The original code is mentioned in the following papers:

Please open an issue or notify the authors if you find an error or have an improvement to suggest. Well-documented pull requests are particularly appreciated.
