
Data Submission Handout

jbartlewski edited this page Oct 1, 2024 · 21 revisions

Preface

OpenAPC is an Open Data project on Open Access publishing charges, and all data is provided by academic institutions or funders on a voluntary basis. If you are reading this handout because you are considering contributing data on behalf of your institution for the first time, we would like to thank you in advance: our project could not exist without that kind of dedication! Please take some time to familiarize yourself with the following guidelines. If anything is unclear, do not hesitate to ask (either via the issue tracker or by mail). And most importantly: your contribution does not have to be perfect. We have a lot of experience and the technical means to fix many issues on our side.

Minimal requirements

  • The data contains an academic institution's expenditures on Open Access publishing on a per-publication basis.
  • The data should be provided in a machine-readable, platform-independent format (CSV).
  • The data is provided under an Open Data Commons license to ensure public access and reusability.
  • A contact person is designated at the contributing institution.

Types of cost data

OpenAPC collects cost data on OA publishing for the following publication types:

  1. Journal articles (Article Processing Charges, APCs)
  2. Monographs/full books (Book Processing Charges, BPCs)

OA charges for other publication types (like single book chapters or conference proceedings) are not collected at the moment.

For journal articles, you can report additional costs alongside APCs, such as colour, submission or page charges that may be associated with Open Access publications.

Data sets

OpenAPC maintains a separate data set for each accepted publication type. Each data set is composed of the tables contributed by the individual institutions. Journal/book titles and publisher names are imported from CrossRef via automated enrichment routines to make expenditures comparable. Additional metadata is collected from services like Europe PubMed Central, the DOAJ or the DOAB.

The data is made available on GitHub.

Data schema

Your submitted CSV file should conform to a certain data schema to ensure it provides all the information we need. Every schema field is represented by a table column and every publication record corresponds to a table row. The schema to use depends on the type of cost data you plan to submit:

If you want to provide cost data on both publication types, we recommend submitting two separate tables.

This contribution from Leipzig University is an example of a table which conforms to the data schema for articles.

Definition of costs

Open Access publishing charges

The amount reported in the euro field should be calculated according to the following policy:

  • All reported publication fees are gross values: modifiers like taxes or discounts should always be included in the amount. Except for the mandatory backlist_oa field in the BPC data set, OpenAPC does not explicitly track special circumstances which might influence prices. However, institutions are encouraged to report details on such circumstances in a README file which can be added to their individual data folders (Example).
  • Only the APC/BPC itself should be reported in this field, no additional matters of expense like page/colour charges or submission fees. Additional costs can be reported in separate data fields (see "APC Additional Costs data set").
  • If costs for a publication were split between multiple institutions, only one of them should report the full amount to OpenAPC.
  • Some journals levy additional fees for corrections to published articles (corrigenda). Such expenditures are not part of the APC and thus should neither be added to the reported costs nor added to the data table as a separate entry (in case a DOI was assigned to the corrigendum).
  • Only publications which conform to a "standard" APC/BPC model should be reported (direct payment of money for OA publication). Special rules apply to publications published under transformative agreements (see "Data from transformative agreements").
  • The cost should not be zero.
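
Some of these rules can be checked mechanically before submission. The following is a minimal sketch, assuming a CSV table with a column named euro as described above. The function name check_euro_column and the sample rows are hypothetical and not part of any OpenAPC tooling. Whether an amount is really a gross value cannot be verified this way and still has to be checked against your accounting.

```python
import csv
import io
import math


def check_euro_column(csv_text):
    """Return a list of (line_number, message) for rows with a suspicious euro value."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # reader.line_num is the line of the source text the row ended on.
        line_number = reader.line_num
        raw = (row.get("euro") or "").strip()
        try:
            amount = float(raw)
        except ValueError:
            problems.append((line_number, f"euro is not a number: {raw!r}"))
            continue
        # The policy states that the cost should not be zero.
        if not math.isfinite(amount) or amount <= 0:
            problems.append((line_number, "euro must be greater than zero"))
    return problems


# Hypothetical sample data, for illustration only.
sample = """doi,period,euro
10.1234/example.1,2022,1850.00
10.1234/example.2,2022,0
10.1234/example.3,2022,n/a
"""

for line_number, message in check_euro_column(sample):
    print(f"line {line_number}: {message}")
```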

As the field name implies, the currency of the reported amount should be Euro (€). If your institution's accounting is based on another currency you can either convert the values yourself (preferable) or add the denomination and leave the process to us. However, in this case results might be slightly inaccurate as we will have to work with average exchange rates for the reported period. If you have information on the exact date of payment for each article you might want to add this information to the period column (YYYY-MM-DD) instead of just the year so we can apply exchange rates on a daily basis.
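
The difference between the two kinds of period values can be illustrated with a short sketch. It assumes that a full date (YYYY-MM-DD) selects a daily rate and a bare year selects a yearly average. The rate tables contain made-up values, and the function name to_euro is hypothetical. How OpenAPC performs the conversion internally is not described in this handout.

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustrative, made-up exchange rates (euros per 1 US dollar); not real market data.
DAILY_RATES = {"2022-03-15": Decimal("0.91")}
YEARLY_AVERAGE_RATES = {"2022": Decimal("0.95")}


def to_euro(amount, period):
    """Convert a foreign-currency amount to euro.

    A period of the form YYYY-MM-DD uses the daily rate for that date,
    a period of the form YYYY falls back to the yearly average rate.
    """
    if len(period) == 10:  # YYYY-MM-DD
        rate = DAILY_RATES[period]
    else:  # YYYY
        rate = YEARLY_AVERAGE_RATES[period]
    # Round to whole cents.
    return (Decimal(amount) * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(to_euro("2000", "2022-03-15"))  # converted with the daily rate
print(to_euro("2000", "2022"))        # converted with the yearly average
```

With these made-up rates, the same payment yields two different euro amounts, which is why reporting the exact payment date gives a more accurate result.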

Additional costs

In 2024, OpenAPC has started to record additional costs for journal articles that may be incurred in addition to APCs. Please note the following points when reporting these costs:

  • Additional cost items should not be included in the euro data field, but should be reported in separate data fields.
  • The different types of costs that can be reported are based on the definitions of the openCost metadata format: colour charge, cover charge, page charge, permission, reprint, submission fee, payment fee, other. More information on the openCost schema can be found at https://github.com/opencost-de/opencost/tree/main/doc.
  • As with APCs, additional costs are calculated on a gross basis, i.e. the amounts should include factors such as taxes.
  • These cost types are considered optional and are therefore recorded in a separate data set linked to the main entry of the publication in the APC data set using the DOI as the primary key.
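
The link between the two data sets can be pictured as a lookup by DOI. The sketch below is only an illustration: the records are invented, and the column names cost_type and amount as well as the function total_cost_by_doi are hypothetical and do not reflect the actual layout of the additional costs data set.

```python
# Hypothetical records; only the doi and euro fields are named in this handout.
apc_records = [
    {"doi": "10.1234/example.1", "euro": 1850.00},
    {"doi": "10.1234/example.2", "euro": 2100.00},
]
additional_costs = [
    {"doi": "10.1234/example.1", "cost_type": "colour charge", "amount": 300.00},
    {"doi": "10.1234/example.1", "cost_type": "page charge", "amount": 120.00},
]


def total_cost_by_doi(apc_records, additional_costs):
    """Sum the APC and all additional costs per publication, keyed by DOI."""
    # DOIs are case-insensitive, so normalise them before using them as keys.
    totals = {record["doi"].lower(): record["euro"] for record in apc_records}
    for cost in additional_costs:
        doi = cost["doi"].lower()
        if doi not in totals:
            # Every additional cost entry needs a main entry in the APC data set.
            raise KeyError(f"no APC record for DOI {doi}")
        totals[doi] += cost["amount"]
    return totals


print(total_cost_by_doi(apc_records, additional_costs))
```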

Data on Transformative Agreements

Articles published under transformative agreements are added to our Transformative Agreements (TA) data set. There are some specifics to be noted here:

  • Cost and payment modalities can differ a lot. Therefore, most records in the data set do not include cost information, and the euro field is not mandatory within the TA schema.
  • For the TA data set, we include only large data submissions provided directly by funders or consortia. Single institutions should not report publications from transformative agreements to OpenAPC.
  • Data from the German DEAL agreements with Wiley and Springer Nature is an important exception, as it is provided by participating institutions. Costs per article in hybrid journals are calculated individually.

Submission

There are two ways to provide OpenAPC with your data:

  1. By sending a mail to openapc at uni-bielefeld.de
  2. By initiating a pull request on GitHub. This process is described in detail below.

IMPORTANT NOTE: For first-time submissions, we kindly ask you to send us your data by e-mail only (submission path 1), so that we can immediately add an official e-mail address and a contact person to your profile. Submission by pull request can be used from the second submission onwards. You are of course very welcome to continue submitting data by e-mail.

GitHub workflow: Submitting new data

To add new data to GitHub yourself, the following steps are required. The instructions require the use of a shell (command line). On Windows, we recommend Git for Windows: it includes Git Bash, which provides a suitable environment with git pre-installed.

  1. Create an account on GitHub (free of charge) if you do not already have one. The login name will be referred to as YOURUsername in the following. For the following steps, it is also necessary to create an SSH key and add it to your GitHub account, which is explained here.

  2. Create a fork of the OpenAPC repository in your user account.

  3. Create a local copy (clone) of the fork on your computer: $ git clone https://github.com/YOURUsername/openapc-de.git

  4. Locate the folder of your institution in the data subdirectory (referred to as YOURFolder in the following).

  5. Copy the files you want to add into the folder.

  6. Add the new data to git and then push it to your fork on GitHub:

$ cd openapc-de
$ git add data/YOURFolder/
$ git commit -m "APC fees paid in 2022"   # or a similar description
$ git push origin master

  7. Create a pull request to add the data to the original OpenAPC repository. An OpenAPC staff member will merge the pull request as soon as possible. Once the data is processed, you will be notified by email.

Enrichment

After receiving your files (either by pull request or by mail), OpenAPC will normalise your data and enrich it. For every contributed data file, an enriched version will be created and added to your data folder, usually marked by adding an _enriched suffix to the file name. The enriched data will then be integrated into the OpenAPC core data set, increasing its revision number.

The enrichment process consists of the following steps:

  • Journal and publisher names, ISSNs and license information are imported from CrossRef
  • PMID and PMCID are imported from Europe PubMed Central
  • The article is looked up in Web of Science. If it is found, the WoS identifier ut is stored
  • The journal is looked up in the DOAJ
  • A possible Linking ISSN (ISSN-L) is added
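
To give an idea of what the first step involves, the sketch below extracts the relevant fields from a response of the public CrossRef REST API (works endpoint, https://api.crossref.org/works/ followed by the DOI). It is not OpenAPC's actual enrichment routine, which is not shown in this handout. The function name, the output keys and the abbreviated sample response are hypothetical, and no network request is made.

```python
def extract_crossref_metadata(response):
    """Pick journal title, publisher, ISSNs and license URL from a CrossRef works response."""
    message = response.get("message", {})
    titles = message.get("container-title") or []
    licenses = message.get("license") or []
    return {
        "publisher": message.get("publisher"),
        # container-title is a list; the first entry is the journal title.
        "journal_full_title": titles[0] if titles else None,
        "issn": message.get("ISSN") or [],
        # Each license entry is an object carrying a URL field.
        "license_ref": licenses[0].get("URL") if licenses else None,
    }


# Hand-written, heavily abbreviated sample response, for illustration only.
sample_response = {
    "status": "ok",
    "message": {
        "publisher": "Example Publisher",
        "container-title": ["Journal of Examples"],
        "ISSN": ["1234-5678"],
        "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
    },
}

print(extract_crossref_metadata(sample_response))
```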

License

At the moment the following license is applied to all OpenAPC data:

Datasets are made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/

Contributors

All contributors to OpenAPC will be mentioned by name.

Reuse

In addition to the dynamically generated repository front page (based on R Markdown), OpenAPC also operates an OLAP server for advanced data querying and a site providing treemap visualisations of the OpenAPC data.
