Commit 5383229

Merge pull request #82 from vtraag/change-title
Move titles in yaml header
2 parents bb77ad8 + 75e2ea5 commit 5383229

60 files changed

+875
-804
lines changed

index.qmd

Lines changed: 18 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,23 @@
1-
# Open Science Impact Indicator Handbook {.unnumbered}
1+
---
2+
title: Open Science Impact Indicator Handbook
3+
4+
author:
5+
- name: V.A Traag
6+
orcid: 0000-0003-3170-3879
7+
affiliations:
8+
- ref: cwts
9+
10+
affiliations:
11+
- id: cwts
12+
name: Leiden University
13+
department: Centre for Science and Technology Studies
14+
city: Leiden
15+
country: the Netherlands
16+
---
217

318
This is the Open Science Impact Indicator Handbook by PathOS. In this handbook we cover various indicators measuring various aspects around [Open Science](sections/1_open_science/introduction_open_science.qmd) itself, their [academic](sections/2_academic_impact/introduction_academic_impact.qmd), [societal](sections/3_societal_impact/introduction_societal_impact.qmd) and [economic impacts](sections/4_economic_impact/introduction_economic_impact.qmd), and [reproducibility](sections/5_reproducibility/introduction_reproducibility.qmd).
419

5-
## Executive summary
20+
# Executive summary
621

722
In the PathOS project we take a causal perspective on studying Open Science. This necessitates making a distinction between impact itself, and the effect of Open Science on impact. For instance, we could very well see an Open Source research tool being used frequently by industry. In that sense, the Open Source research tool can be said to have a type of economic impact. However, it could very well be that the research tool would have been similarly used by industry had it been released as closed software under a commercial licence. We are interested in the difference between its actual impact under the Open Science principles and its counterfactual impact under a closed principle. That is, we are interested in the *causal* effect of Open Science on the impact of the science.
823

@@ -14,6 +29,6 @@ Finally, not all indicators are equally well-developed. Some indicators, like [c
1429

1530
We hope this handbook will be a central hub to keep track of Open Science related indicators. We hope to be able to contribute to keeping the impact indicator handbook up-to-date as part of the PathOS project. The handbook is open to community contributions. Together we may create a central resource that is useful to all.
1631

17-
## Acknowledgements {.unnumbered}
32+
# Acknowledgements {.unnumbered}
1833

1934
The PathOS project has received funding from the European Union’s Horizon Europe framework programme under grant agreement No. 101058728. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. Neither the European Union nor the European Research Executive Agency can be held responsible for them.

sections/0_causality/open_data_citation_advantage.qmd

Lines changed: 3 additions & 2 deletions
@@ -11,12 +11,13 @@ affiliations:
1111
department: Centre for Science and Technology Studies
1212
city: Leiden
1313
country: the Netherlands
14+
15+
title: The effect of Open Data on Citations
1416
---
1517

16-
# The effect of Open Data on Citations {#open-data-citation-advantage .unnumbered}
1718

1819
::: {.callout collapse="true"}
19-
## History
20+
# History
2021

2122
| Version | Revision date | Revision | Author |
2223
|---------|---------------|-------------|------------|

sections/0_causality/open_data_cost_savings.qmd

Lines changed: 7 additions & 6 deletions
@@ -13,12 +13,13 @@ affiliations:
1313
name: Centre for Industrial Studies
1414
city: Milan
1515
country: Italy
16+
17+
title: The effect of Open Data on cost savings
1618
---
1719

18-
# The effect of Open Data on cost savings {#the-effect-of-open-data-on-cost-savings .unnumbered}
1920

2021
::: {.callout collapse="true"}
21-
## History
22+
# History
2223

2324
| | | | |
2425
|------------------|------------------|------------------|------------------|
@@ -30,7 +31,7 @@ affiliations:
3031
| 1.0 | 2024-10-07 | Template outline | E. Delugas |
3132
:::
3233

33-
## Literature background
34+
# Literature background
3435

3536
Measuring the economic impact of open science and open data has proven to be challenging. Many theoretical studies highlight the benefits of making research results public, with strong support for Open Science from economic research on technological change [@chataway2018,@yozwiak_data_2015,@mazzucato_entrepreneurial_2011]. However, only few studies have attempted to measure the impacts of open science compared to closed science, and more robust evidence on how Open Science drives innovation and economic outcomes is needed to strengthen support and counter emerging criticisms [@karasz2024,@ali-khan2018]. The existing literature mainly concentrates on specific sectors, particularly health, medicine, and biosciences, which receive more attention due to early regulation by funders and significant interest in clinical trial outcomes. Another important stream of literature is focused on highlighting the economic value of Open Science through personal industry experiences, though lacking precise quantitative evidence, with contributions from @mcmanamay_openaccess_2014 on fisheries, @harding2017 on medicine, @chan_cost_2015 on the transition to an Open Science model, and @chen2017 on the role of open data in AI and machine learning applications. Although directly linking economic outcomes to open data initiatives can be challenging, with authors combining theoretical arguments and the limited quantitative evidence available at the time of their publication [@ali2022,@tennant2016,@fell2019,@wehn2021,@arzberger_promoting_2004], open access to findings and data is considered to lead to significant savings in access costs. By removing paywalls and subscription fees, open data allows researchers and businesses to access valuable information without incurring additional costs. A major economic benefit of lowering the cost of knowledge is the availability of an extra budget that can be reallocated for other purposes [@tennant2016].
3637

@@ -44,13 +45,13 @@ To fully harness the potential of open data, it is important to develop the nece
4445

4546
Despite the theoretical benefits of open data, several limitations hinder a comprehensive assessment of its economic impact. Implementing open data practices requires significant investments in infrastructure, technology, and training, potentially offsetting some cost savings [@vanvlijmen2020]. Moreover, the European Commission study emphasises that the benefits of open data are contingent on the quality and standardisation of the data provided [@EC-DGRI2018]. Finally, a major limitation is the scarcity of empirical evidence; few studies have attempted to measure the impacts of open science compared to closed science, making it challenging to generalise findings [@karasz2024]. @herala2016 review the benefits and challenges of open data initiatives in the private sector, highlighting advantages like enhanced collaboration and innovation, but caution that these are often based on speculative assumptions rather than empirical evidence, emphasising the need for further research to inform best practices and mitigate risks associated with increased costs and data privacy concerns.
4647

47-
## Directed Acyclic Graph (DAG)
48+
# Directed Acyclic Graph (DAG)
4849

4950
As discussed in the general [introduction on causal inference](causal_intro/article/intro-causality.qmd), we use DAGs to represent structural causal models. In the following, a DAG ([@fig-model]) is employed to examine the causal relationship between *Open Data* and *Cost Savings*. The figure illustrates multiple potential pathways, including a direct path from Open Data to Cost Savings, an indirect one involving Time Savings (i.e., a mediator), and additional paths that incorporate factors affecting either Open Data or Time Savings (i.e., confounders). These additional factors, such as technological infrastructure, data quality and availability, standardisation, user skills, innovation, and collaboration, introduce layers of complexity to the model. As we will show in the subsequent sections, they are essential to discuss the causal and non-causal, open and closed, relationships among all these variables.
5051

5152
![Hypothetical structural causal model on Open Data](figures/DAG_open_data_cost_savings-0.png){#fig-model}
5253

53-
## The effect of Open data on Cost Saving
54+
# The effect of Open data on Cost Saving
5455

5556
In this section, we apply the concepts presented in the section Causality in Science Studies to potential research questions. We present a specific perspective on causal inference through the lens of structural causal models [@pearl_causality_2009].
5657

@@ -88,7 +89,7 @@ In the [causality introduction](causal_intro/article/intro-causality.qmd) we emp
8889

8990
This approach is effective in predicting how probability distributions shift under controlled changes when the causal structure is known. However, this approach depends on having an established causal graph, limiting its use for exploring causality from scratch. Critics suggest an alternative approach that allows causal discovery through experimentation without prior assumptions about mechanisms [@woodward2003]. This is especially useful in complex fields, such as social and biomedical sciences, where causal relationships are less understood. Considering this, @cunningham_causal_2021 argues that sample selection problems have been recognised long before the introduction of DAGs, with early solutions like @heckman1979, and emphasizes that an atheoretical approach to empiricism is inadequate. He asserts that causal inference requires a deep understanding of the behavioural processes behind the phenomenon being studied, and while DAGs are useful, they cannot replace the need for theoretical knowledge in creating credible identification strategies. Thus, causal inference is not solved by simply collecting more data, but by integrating theory with empirical analysis.
9091

91-
## Discussing empirical issues
92+
# Discussing empirical issues
9293

9394
The presented model illustrates how open data might drive cost savings by focusing on a limited set of variables commonly discussed in the literature. However, this approach presents challenges, as the existing literature reveals a significant gap in empirical studies that specifically measure the economic impact of open data on cost savings.
9495

sections/0_causality/social_causality.qmd

Lines changed: 3 additions & 2 deletions
@@ -16,14 +16,15 @@ affiliations:
1616
name: University of Geneva
1717
city: Geneva
1818
country: Switzerland
19+
20+
title: Causality Research on the Social Impact of Open Science
1921
---
2022

21-
# Causality Research on the Social Impact of Open Science {#causality-societal-impact .unnumbered}
2223

2324
::: {.callout collapse="true"}
2425

2526

26-
## History
27+
# History
2728

2829
| Version | Revision date | Revision | Author |
2930
|---------|---------------|-------------|--------------|

sections/1_open_science/APC_costs.qmd

Lines changed: 21 additions & 20 deletions
@@ -22,14 +22,15 @@ affiliations:
2222
name: Athena Research Center
2323
city: Athena
2424
country: Greece
25+
26+
title: APC Costs
2527
---
2628

27-
# APC Costs {#apc-costs .unnumbered}
2829

2930
::: {.callout collapse="true"}
3031

3132

32-
## History
33+
# History
3334

3435
| Version | Revision date | Revision | Author |
3536
|-------------|-------------|---------------|-------------------------------|
@@ -38,17 +39,17 @@ affiliations:
3839

3940
:::
4041

41-
## Description
42+
# Description
4243

4344
Article Processing Charges (APCs) represent the price that publishers charge authors to publish their articles and books under an open access license. They capture the affordability and accessibility of Open Access publishing for different types of stakeholders, such as researchers, institutions, and funding agencies. They are also relevant for policy-makers seeking to optimize Open Science policies.
4445

4546
Tracking and comparing APCs could also be used to encourage publishers to adopt more transparent and equitable pricing policies and support the development of sustainable Open Access publishing models accessible to all researchers regardless of their financial resources.
4647

4748
APCs have both benefits and drawbacks. In a strict sense, they do not remove the economic barriers between the writing and the reading of scientific results but shift these costs from the readers to the authors. In countries where funder reimbursement of APC costs or transformative agreements do not cover these costs, APCs can create a financial barrier that limits access to Open Access journals, often generating asymmetries between richer and poorer countries and academic institutions. On the other hand, APCs also incentivize publishers to offer Open Access publishing, which promotes Open Science.
4849

49-
## Metrics
50+
# Metrics
5051

51-
### Number/Share (%) of publications with an APC cost
52+
## Number/Share (%) of publications with an APC cost
5253

5354
These metrics measure the number or share (in percent) of publications in journals that have incurred an APC. The share provides a more nuanced understanding of the affordability and accessibility of Open Access publishing than the absolute number, and it can be used to compare the affordability and accessibility of Open Access publishing across different journals, publishers, and regions in a more meaningful way.
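As a rough illustration of the two metrics, the sketch below counts publications flagged as having incurred an APC and expresses that count as a percentage. The function name `apc_count_and_share` and the boolean input are hypothetical, chosen only for this example; how the flag is derived in practice depends on the data source.

```python
def apc_count_and_share(has_apc_flags):
    """Return (number, share in %) of publications that incurred an APC.

    `has_apc_flags` is a hypothetical list of booleans, one per publication.
    """
    total = len(has_apc_flags)
    count = sum(1 for flag in has_apc_flags if flag)
    # Guard against an empty set of publications.
    share = 100.0 * count / total if total else 0.0
    return count, share


# Illustrative data only: 2 of 5 publications incurred an APC.
count, share = apc_count_and_share([True, False, True, False, False])
print(count, share)
```

The same count can correspond to very different shares depending on the total number of publications, which is why the share is the more comparable figure across journals, publishers, or regions.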
5455

@@ -57,11 +58,11 @@ Limitation:
5758
- Not knowing who incurred that APC (funder, institution, author, etc.) limits the usefulness of this indicator.
5859
- The share of publications with an APC could be more useful when considered together with the share of Diamond OA publications, rather than on its own.
5960

60-
#### Measurement
61+
### Measurement
6162

62-
##### Methodology
63+
#### Methodology
6364

64-
###### OpenAIRE Graph
65+
##### OpenAIRE Graph
6566

6667
The [OpenAIRE Graph](https://graph.openaire.eu) dump currently does not include OA color classifications, though they are already implemented in the OpenAIRE MONITOR and are expected to be integrated into the graph dump in Q1 2024.
6768

@@ -74,7 +75,7 @@ Limitations:
7475

7576
- This methodology is a workaround, chosen because it provides better coverage than any APC dataset we are aware of; however, it does not directly establish whether a publication has incurred an APC.
7677

77-
### Average APC
78+
## Average APC
7879

7980
Description:
8081

@@ -89,11 +90,11 @@ Limitations:
8990
- The cost of APCs can vary widely depending on the field of research, the region, and the specific publisher or journal, therefore taking an average may be misleading.
9091
- Not knowing who incurred that APC (funder, institution, author, etc.) limits the usefulness of this metric.
9192

92-
#### Measurement
93+
### Measurement
9394

94-
##### Datasources
95+
#### Datasources
9596

96-
###### OpenAIRE Graph
97+
##### OpenAIRE Graph
9798

9899
The OpenAPC APC dataset, which is integrated in the [OpenAIRE Graph](https://graph.openaire.eu).
99100

@@ -102,7 +103,7 @@ Limitations:
102103
- Incomplete data: Publishers do not generally provide data on their APC fees. OpenAPC (openapc.net) has a growing collection, but it is not complete.
103104
- In an (organization, publication, APC cost) triplet of OpenAPC, to the best of our knowledge, it is not possible to distinguish whether the APC cost is the entire cost of the publication or just what the organization paid.
104105

105-
##### Methodology
106+
#### Methodology
106107

107108
Via OpenAIRE MONITOR (monitor.openaire.eu)
108109

@@ -115,7 +116,7 @@ Limitations:
115116

116117
- Averages have the benefit of summarizing and normalizing information, however depending on the underlying distribution of costs, they may be misleading (e.g. via outliers)
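A small sketch of the outlier caveat, using made-up APC amounts (the values and the variable names are assumptions for illustration, not data from OpenAPC or OpenAIRE MONITOR): a single expensive publication pulls the mean well above what most publications paid, while the median is barely affected.

```python
from statistics import mean, median

# Hypothetical APC amounts in EUR; the last value is an outlier.
apc_costs_eur = [1200, 1500, 1800, 2000, 9500]

average_apc = mean(apc_costs_eur)
median_apc = median(apc_costs_eur)

print(f"mean:   {average_apc}")
print(f"median: {median_apc}")
```

Reporting the median or a few quantiles alongside the average is one way to indicate how skewed the underlying distribution of costs is.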
117118

118-
### Total APC
119+
## Total APC
119120

120121
Description:
121122

@@ -130,11 +131,11 @@ Limitations:
130131
- Not knowing who incurred that APC (funder, institution, author, etc.) limits the usefulness of this metric, more than the previous ones.
131132
- It does not contain information on how APCs are distributed across subdomains, e.g. the total cost does not show how spending is spread across scientific domains.
132133

133-
#### Measurement
134+
### Measurement
134135

135-
##### Datasources
136+
#### Datasources
136137

137-
###### OpenAIRE Graph
138+
##### OpenAIRE Graph
138139

139140
The OpenAPC APC dataset, which is integrated in the [OpenAIRE Graph](https://graph.openaire.eu).
140141

@@ -143,7 +144,7 @@ Limitations:
143144
- Incomplete data: Publishers do not generally provide data on their APC fees. OpenAPC (openapc.net) has a growing collection, but it is not complete.
144145
- In an (organization, publication, APC cost) triplet of OpenAPC, to the best of our knowledge, it is not possible to distinguish whether the APC cost is the entire cost of the publication or just what the organization paid.
145146

146-
##### Methodology
147+
#### Methodology
147148

148149
Via OpenAIRE MONITOR
149150

@@ -156,7 +157,7 @@ Limitations:
156157

157158
- Totals have the benefit of giving a bird's eye view, however depending on the underlying distribution of costs, they can have different implications.
158159

159-
## Known correlates
160+
# Known correlates
160161

161162
Via: <https://direct.mit.edu/qss/article/1/1/6/15582/Article-processing-charges-Mirroring-the-citation>
162163

@@ -165,7 +166,7 @@ Via: <https://direct.mit.edu/qss/article/1/1/6/15582/Article-processing-charges-
165166
- Hybrid vs. Gold OA
166167
- SNIP
167168

168-
### Estimating unknown APCs
169+
## Estimating unknown APCs
169170

170171
An APC extrapolation exercise was conducted for the purpose of an EC study [@monitori2021]. The authors defined groupings, and imputed the average APC of the group to each publication in the group for which the APC is unknown. The groupings were based on the following variables, similar to the correlates above:

- quantile of the Source Normalised Impact per Paper (SNIP) score in the [CWTS journal indicators](https://www.journalindicators.com/),
- whether the publication is pure ‘gold’ open access or ‘hybrid’,
- the year of publication.
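The group-average imputation described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the EC study: the field names (`snip_quantile`, `oa_type`, `year`, `apc`), the function `impute_group_mean_apc`, and the sample records are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_key(pub):
    """Grouping variables: SNIP quantile, gold vs. hybrid, publication year."""
    return (pub["snip_quantile"], pub["oa_type"], pub["year"])


def impute_group_mean_apc(publications):
    """Fill each unknown APC (None) with the mean known APC of its group."""
    known_by_group = defaultdict(list)
    for pub in publications:
        if pub["apc"] is not None:
            known_by_group[group_key(pub)].append(pub["apc"])
    group_mean = {key: mean(values) for key, values in known_by_group.items()}

    imputed = []
    for pub in publications:
        apc = pub["apc"]
        if apc is None:
            # Stays None when the group has no publication with a known APC.
            apc = group_mean.get(group_key(pub))
        imputed.append({**pub, "apc": apc})
    return imputed


# Illustrative records only.
publications = [
    {"id": "p1", "snip_quantile": 4, "oa_type": "gold", "year": 2020, "apc": 2000.0},
    {"id": "p2", "snip_quantile": 4, "oa_type": "gold", "year": 2020, "apc": 3000.0},
    {"id": "p3", "snip_quantile": 4, "oa_type": "gold", "year": 2020, "apc": None},
    {"id": "p4", "snip_quantile": 1, "oa_type": "hybrid", "year": 2020, "apc": None},
]
result = impute_group_mean_apc(publications)
for pub in result:
    print(pub["id"], pub["apc"])
```

In this sketch a publication whose group contains no known APC is left unimputed rather than assigned a value from a coarser grouping; a real analysis would need to decide how to handle such cases.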
171172

0 commit comments