PathOS-project
diff --git a/index.qmd b/index.qmd
Lines changed: 18 additions & 3 deletions
diff --git a/sections/0_causality/open_data_citation_advantage.qmd b/sections/0_causality/open_data_citation_advantage.qmd
Lines changed: 3 additions & 2 deletions
diff --git a/sections/0_causality/open_data_cost_savings.qmd b/sections/0_causality/open_data_cost_savings.qmd
Lines changed: 7 additions & 6 deletions
diff --git a/sections/0_causality/social_causality.qmd b/sections/0_causality/social_causality.qmd
Lines changed: 3 additions & 2 deletions
diff --git a/sections/1_open_science/APC_costs.qmd b/sections/1_open_science/APC_costs.qmd
Lines changed: 21 additions & 20 deletions
@@ -1,8 +1,23 @@
-# Open Science Impact Indicator Handbook {.unnumbered}
+---
+title: Open Science Impact Indicator Handbook
+
+author:
+    - name: V.A Traag
+      orcid: 0000-0003-3170-3879
+      affiliations:
+      - ref: cwts
+
+affiliations:
+- id: cwts
+  name: Leiden University
+  department: Centre for Science and Technology Studies
+  city: Leiden
+  country: the Netherlands
+---
 
 This is the Open Science Impact Indicator Handbook by PathOS. In this handbook we cover various indicators measuring various aspects around [Open Science](sections/1_open_science/introduction_open_science.qmd) itself, their [academic](sections/2_academic_impact/introduction_academic_impact.qmd), [societal](sections/3_societal_impact/introduction_societal_impact.qmd) and [economic impacts](sections/4_economic_impact/introduction_economic_impact.qmd), and [reproducibility](sections/5_reproducibility/introduction_reproducibility.qmd).
 
-## Executive summary
+# Executive summary
 
 In the PathOS project we take a causal perspective on studying Open Science. This necessitates making a distinction between impact itself, and the effect of Open Science on impact. For instance, we could very well see an Open Source research tool being used frequently by industry. In that sense, the Open Source research tool can be said to have a type of economic impact. However, it could very well be that the research tool would have been similarly used by industry had it been released as closed software under a commercial licence. We are interested in the difference between its actual impact under the Open Science principles and its counterfactual impact under a closed principle. That is, we are interested in the *causal* effect of Open Science on the impact of the science.
 
@@ -14,6 +29,6 @@ Finally, not all indicators are equally well-developed. Some indicators, like [c
 
 We hope this handbook will be a central hub to keep track of Open Science related indicators. We hope to be able to contribute to keeping the impact indicator handbook up-to-date as part of the PathOS project. The handbook is open to community contributions. Together we may create a central resource that is useful to all.
 
-## Acknowledgements {.unnumbered}
+# Acknowledgements {.unnumbered}
 
 The PathOS project has received funding from the European Union’s Horizon Europe framework programme under grant agreement No. 101058728. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. Neither the European Union nor the European Research Executive Agency can be held responsible for them.
@@ -11,12 +11,13 @@ affiliations:
   department: Centre for Science and Technology Studies
   city: Leiden
   country: the Netherlands
+
+title: The effect of Open Data on Citations
 ---
 
-# The effect of Open Data on Citations {#open-data-citation-advantage .unnumbered}
 
 ::: {.callout collapse="true"}
-## History
+# History
 
 | Version | Revision date | Revision    | Author     |
 |---------|---------------|-------------|------------|
 
@@ -13,12 +13,13 @@ affiliations:
   name: Centre for Industrial Studies
   city: Milan
   country: Italy
+
+title: The effect of Open Data on cost savings
 ---
 
-# The effect of Open Data on cost savings {#the-effect-of-open-data-on-cost-savings .unnumbered}
 
 ::: {.callout collapse="true"}
-## History
+# History
 
 |  |  |  |  |
 |------------------|------------------|------------------|------------------|
@@ -30,7 +31,7 @@ affiliations:
 | 1.0 | 2024-10-07 | Template outline | E. Delugas |
 :::
 
-## Literature background
+# Literature background
 
 Measuring the economic impact of open science and open data has proven to be challenging. Many theoretical studies highlight the benefits of making research results public, with strong support for Open Science from economic research on technological change [@chataway2018,@yozwiak_data_2015,@mazzucato_entrepreneurial_2011]. However, only a few studies have attempted to measure the impacts of open science compared to closed science, and more robust evidence on how Open Science drives innovation and economic outcomes is needed to strengthen support and counter emerging criticisms [@karasz2024,@ali-khan2018]. The existing literature mainly concentrates on specific sectors, particularly health, medicine, and biosciences, which receive more attention due to early regulation by funders and significant interest in clinical trial outcomes. Another important stream of literature is focused on highlighting the economic value of Open Science through personal industry experiences, though lacking precise quantitative evidence, with contributions from @mcmanamay_openaccess_2014 on fisheries, @harding2017 on medicine, @chan_cost_2015 on the transition to an Open Science model, and @chen2017 on the role of open data in AI and machine learning applications. Although directly linking economic outcomes to open data initiatives can be challenging, with authors combining theoretical arguments and the limited quantitative evidence available at the time of their publication [@ali2022,@tennant2016,@fell2019,@wehn2021,@arzberger_promoting_2004], open access to findings and data is considered to lead to significant savings in access costs. By removing paywalls and subscription fees, open data allows researchers and businesses to access valuable information without incurring additional costs. A major economic benefit of lowering the cost of knowledge is the availability of an extra budget that can be reallocated for other purposes [@tennant2016].
 
@@ -44,13 +45,13 @@ To fully harness the potential of open data, it is important to develop the nece
 
 Despite the theoretical benefits of open data, several limitations hinder a comprehensive assessment of its economic impact. Implementing open data practices requires significant investments in infrastructure, technology, and training, potentially offsetting some cost savings [@vanvlijmen2020]. Moreover, the European Commission study emphasises that the benefits of open data are contingent on the quality and standardisation of the data provided [@EC-DGRI2018]. Finally, a major limitation is the scarcity of empirical evidence; few studies have attempted to measure the impacts of open science compared to closed science, making it challenging to generalise findings [@karasz2024]. @herala2016 review the benefits and challenges of open data initiatives in the private sector, highlighting advantages like enhanced collaboration and innovation, but caution that these are often based on speculative assumptions rather than empirical evidence, emphasising the need for further research to inform best practices and mitigate risks associated with increased costs and data privacy concerns.
 
-## Directed Acyclic Graph (DAG)
+# Directed Acyclic Graph (DAG)
 
 As discussed in the general [introduction on causal inference](causal_intro/article/intro-causality.qmd), we use DAGs to represent structural causal models. In the following, a DAG ([@fig-model]) is employed to examine the causal relationship between *Open Data* and *Cost Savings*. The graph illustrates multiple potential pathways, including a direct path from Open Data to Cost Savings, an indirect one involving Time Savings (i.e., a mediator), and additional paths that incorporate factors affecting either Open Data or Time Savings (i.e., confounders). These additional factors, such as technological infrastructure, data quality and availability, standardisation, user skills, innovation, and collaboration, introduce layers of complexity to the model. As we will show in the subsequent sections, they are essential for discussing the causal and non-causal, open and closed, relationships among all these variables.
 
 ![Hypothetical structural causal model on Open Data](figures/DAG_open_data_cost_savings-0.png){#fig-model}
 
-## The effect of Open data on Cost Saving
+# The effect of Open data on Cost Saving
 
 In this section, we apply the concepts presented in the section Causality in Science Studies to potential research questions. We present a specific perspective on causal inference through the lens of structural causal models [@pearl_causality_2009].
 
@@ -88,7 +89,7 @@ In the [causality introduction](causal_intro/article/intro-causality.qmd) we emp
 
 This approach is effective in predicting how probability distributions shift under controlled changes when the causal structure is known. However, this approach depends on having an established causal graph, limiting its use for exploring causality from scratch. Critics suggest an alternative approach that allows causal discovery through experimentation without prior assumptions about mechanisms [@woodward2003]. This is especially useful in complex fields, such as social and biomedical sciences, where causal relationships are less understood. Considering this, @cunningham_causal_2021 argues that sample selection problems have been recognised long before the introduction of DAGs, with early solutions like @heckman1979, and emphasizes that an atheoretical approach to empiricism is inadequate. He asserts that causal inference requires a deep understanding of the behavioural processes behind the phenomenon being studied, and while DAGs are useful, they cannot replace the need for theoretical knowledge in creating credible identification strategies. Thus, causal inference is not solved by simply collecting more data, but by integrating theory with empirical analysis.
 
-## Discussing empirical issues
+# Discussing empirical issues
 
 The presented model illustrates how open data might drive cost savings by focusing on a limited set of variables commonly discussed in the literature. However, this approach presents challenges, as the existing literature reveals a significant gap in empirical studies that specifically measure the economic impact of open data on cost savings.
 
 
@@ -16,14 +16,15 @@ affiliations:
   name: University of Geneva
   city: Geneva
   country: Switzerland
+
+title: Causality Research on the Social Impact of Open Science
 ---
 
-# Causality Research on the Social Impact of Open Science {#causality-societal-impact .unnumbered}
 
 ::: {.callout collapse="true"}
 
 
-## History
+# History
 
 | Version | Revision date | Revision    | Author       |
 |---------|---------------|-------------|--------------|
 
@@ -22,14 +22,15 @@ affiliations:
   name: Athena Research Center
   city: Athens
   country: Greece
+
+title: APC Costs
 ---
 
-# APC Costs {#apc-costs .unnumbered}
 
 ::: {.callout collapse="true"}
 
 
-## History
+# History
 
 | Version | Revision date | Revision | Author |
 |-------------|-------------|---------------|-------------------------------|
@@ -38,17 +39,17 @@ affiliations:
 
 :::
 
-## Description
+# Description
 
 Article Processing Charges (APCs) are the fees that publishers charge authors to publish their articles and books under an open access license. They capture the affordability and accessibility of Open Access publishing for different types of stakeholders, such as researchers, institutions, and funding agencies. They are also relevant for policy-makers seeking to optimize Open Science policies.
 
 Tracking and comparing APCs could also be used to encourage publishers to adopt more transparent and equitable pricing policies and support the development of sustainable Open Access publishing models accessible to all researchers regardless of their financial resources.
 
 APCs have both benefits and drawbacks. In a strict sense, they do not remove the economic barriers between the writing and the reading of scientific results but shift these costs from the readers to the authors. In countries where funder reimbursement of APC costs or transformative agreements do not cover these costs, APCs can create a financial barrier that limits access to Open Access journals, often generating asymmetries between richer and poorer countries and academic institutions. On the other hand, APCs also incentivize publishers to offer Open Access publishing, which promotes Open Science.
 
-## Metrics
+# Metrics
 
-### Number/Share (%) of publications with an APC cost
+## Number/Share (%) of publications with an APC cost
 
 These metrics measure the number or share (in percentage - %) of publications in journals that have incurred an APC. The share provides a more nuanced understanding of the affordability and accessibility of Open Access publishing than the absolute number, and it can be used to compare the affordability and accessibility of Open Access publishing across different journals, publishers, and regions in a more meaningful way.
 
@@ -57,11 +58,11 @@ Limitation:
 -   Not knowing who incurred that APC (funder, institution, author, etc.) limits the usefulness of this indicator.
 -   The share of publications with an APC could be more useful when put together with % Diamond OA publications, as opposed to stand alone.
 
-#### Measurement
+### Measurement
 
-##### Methodology
+#### Methodology
 
-###### OpenAIRE Graph
+##### OpenAIRE Graph
 
 The [OpenAIRE Graph](https://graph.openaire.eu) dump currently does not include OA color classifications, though they are already implemented in the OpenAIRE MONITOR and are expected to be integrated into the graph dump in Q1 2024.
 
@@ -74,7 +75,7 @@ Limitations:
 
 -   This methodology is a workaround chosen as it provides better coverage than any APC dataset we are aware of, however it is not a direct source of whether a publication has incurred an APC or not.
 
-### Average APC
+## Average APC
 
 Description:
 
@@ -89,11 +90,11 @@ Limitations:
 -   The cost of APCs can vary widely depending on the field of research, the region, and the specific publisher or journal, therefore taking an average may be misleading.
 -   Not knowing who incurred that APC (funder, institution, author, etc.) limits the usefulness of this metric.
 
-#### Measurement
+### Measurement
 
-##### Datasources
+#### Datasources
 
-###### OpenAIRE Graph
+##### OpenAIRE Graph
 
 The OpenAPC APC dataset, which is integrated in the [OpenAIRE Graph](https://graph.openaire.eu).
 
@@ -102,7 +103,7 @@ Limitations:
 -   Incomplete data: Publishers do not generally provide data on their APC fees, OpenAPC (openapc.net) has a growing collection but it is not complete.
 -   In an (organization, publication, APC cost) triplet of OpenAPC, to the best of our knowledge, it is not possible to distinguish if the APC cost is the entire cost of the publication or just what the organization paid.
 
-##### Methodology
+#### Methodology
 
 Via OpenAIRE MONITOR (monitor.openaire.eu)
 
@@ -115,7 +116,7 @@ Limitations:
 
 -   Averages have the benefit of summarizing and normalizing information, however depending on the underlying distribution of costs, they may be misleading (e.g. via outliers)
 
-### Total APC
+## Total APC
 
 Description:
 
@@ -130,11 +131,11 @@ Limitations:
 -   Not knowing who incurred that APC (funder, institution, author, etc.) limits the usefulness of this metric, more than the previous ones.
 -   It does not contain information on the distribution of APCs across subdomains, e.g. the total cost does not show how it is distributed across scientific domains.
 
-#### Measurement
+### Measurement
 
-##### Datasources
+#### Datasources
 
-###### OpenAIRE Graph
+##### OpenAIRE Graph
 
 The OpenAPC APC dataset, which is integrated in the [OpenAIRE Graph](https://graph.openaire.eu).
 
@@ -143,7 +144,7 @@ Limitations:
 -   Incomplete data: Publishers do not generally provide data on their APC fees, OpenAPC (openapc.net) has a growing collection but it is not complete.
 -   In an (organization, publication, APC cost) triplet of OpenAPC, to the best of our knowledge, it is not possible to distinguish if the APC cost is the entire cost of the publication or just what the organization paid.
 
-##### Methodology
+#### Methodology
 
 Via OpenAIRE MONITOR
 
@@ -156,7 +157,7 @@ Limitations:
 
 -   Totals have the benefit of giving a bird's eye view, however depending on the underlying distribution of costs, they can have different implications.
 
-## Known correlates
+# Known correlates
 
 Via: <https://direct.mit.edu/qss/article/1/1/6/15582/Article-processing-charges-Mirroring-the-citation>
 
@@ -165,7 +166,7 @@ Via: <https://direct.mit.edu/qss/article/1/1/6/15582/Article-processing-charges-
 -   Hybrid vs. Gold OA
 -   SNIP
 
-### Estimating unknown APCs
+## Estimating unknown APCs
 
 An APC extrapolation exercise was conducted for the purpose of an EC study [@monitori2021]. The authors defined groupings, and imputed the average APC of the group to each publication in the group for which the APC is unknown. The groupings were based on the following variables, similar to the correlates above:
 
 -   quantile of the Source Normalised Impact per Paper (SNIP) score in the [CWTS journal indicators](https://www.journalindicators.com/)
 -   whether the publication is pure ‘gold’ open access or ‘hybrid’
 -   the year of publication
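
The group-average imputation described in this paragraph could be sketched roughly as follows. This is a minimal illustration, not the implementation used in the EC study: the function name `impute_group_mean_apc`, the record keys (`snip_quantile`, `oa_type`, `year`, `apc`, `apc_estimated`), and the sample values are all hypothetical. Groups without any known APC are left unfilled here, which is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import mean


def impute_group_mean_apc(publications):
    """Fill unknown APCs with the mean known APC of the same group.

    Assumption: each publication is a dict with the hypothetical keys
    'snip_quantile', 'oa_type' ('gold' or 'hybrid'), 'year', and 'apc'
    (a float, or None when the APC is unknown).
    """
    # Collect the known APCs per (SNIP quantile, OA type, year) group.
    known = defaultdict(list)
    for pub in publications:
        if pub["apc"] is not None:
            key = (pub["snip_quantile"], pub["oa_type"], pub["year"])
            known[key].append(pub["apc"])

    group_mean = {key: mean(values) for key, values in known.items()}

    # Return copies so that the input records are not modified.
    imputed = []
    for pub in publications:
        key = (pub["snip_quantile"], pub["oa_type"], pub["year"])
        apc = pub["apc"]
        estimated = False
        if apc is None and key in group_mean:
            apc = group_mean[key]
            estimated = True
        # Groups with no known APC keep apc=None (assumption of this sketch).
        imputed.append({**pub, "apc": apc, "apc_estimated": estimated})
    return imputed


# Made-up sample records, for illustration only.
publications = [
    {"snip_quantile": 1, "oa_type": "gold", "year": 2020, "apc": 1000.0},
    {"snip_quantile": 1, "oa_type": "gold", "year": 2020, "apc": 2000.0},
    {"snip_quantile": 1, "oa_type": "gold", "year": 2020, "apc": None},
    {"snip_quantile": 4, "oa_type": "hybrid", "year": 2021, "apc": None},
]
result = impute_group_mean_apc(publications)
```

Flagging imputed values with `apc_estimated` keeps observed and estimated costs distinguishable, which matters when averages or totals are later computed from a mix of both.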