index.qmd: 18 additions & 3 deletions
@@ -1,8 +1,23 @@
-# Open Science Impact Indicator Handbook {.unnumbered}
+---
+title: Open Science Impact Indicator Handbook
+
+author:
+  - name: V.A Traag
+    orcid: 0000-0003-3170-3879
+    affiliations:
+      - ref: cwts
+
+affiliations:
+  - id: cwts
+    name: Leiden University
+    department: Centre for Science and Technology Studies
+    city: Leiden
+    country: the Netherlands
+---
This is the Open Science Impact Indicator Handbook by PathOS. In this handbook we cover indicators that measure various aspects of [Open Science](sections/1_open_science/introduction_open_science.qmd) itself, its [academic](sections/2_academic_impact/introduction_academic_impact.qmd), [societal](sections/3_societal_impact/introduction_societal_impact.qmd) and [economic impacts](sections/4_economic_impact/introduction_economic_impact.qmd), and [reproducibility](sections/5_reproducibility/introduction_reproducibility.qmd).

-## Executive summary
+# Executive summary

In the PathOS project we take a causal perspective on studying Open Science. This necessitates making a distinction between impact itself, and the effect of Open Science on impact. For instance, we could very well see an Open Source research tool being used frequently by industry. In that sense, the Open Source research tool can be said to have a type of economic impact. However, it could very well be that the research tool would have been similarly used by industry had it been released as closed software under a commercial licence. We are interested in the difference between its actual impact under the Open Science principles and its counterfactual impact under a closed principle. That is, we are interested in the *causal* effect of Open Science on the impact of the science.
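
As a rough formalisation of the contrast described above, the causal effect can be written as the difference between expected impact under open and under closed release. The notation is our own shorthand rather than the handbook's: $Y$ stands for the impact of interest, $X$ for the release mode, and $do(\cdot)$ is the intervention operator from structural causal models.

$$
\text{effect of Open Science} = E[Y \mid do(X = \text{open})] - E[Y \mid do(X = \text{closed})]
$$

In the research tool example, the first term corresponds to the observed industry use under an Open Source release and the second to the counterfactual use under a commercial licence. A frequently used tool can therefore still have an effect close to zero.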
@@ -14,6 +29,6 @@ Finally, not all indicators are equally well-developed. Some indicators, like [c
We hope this handbook will be a central hub to keep track of Open Science related indicators. We hope to be able to contribute to keeping the impact indicator handbook up-to-date as part of the PathOS project. The handbook is open to community contributions. Together we may create a central resource that is useful to all.

-## Acknowledgements {.unnumbered}
+# Acknowledgements {.unnumbered}

The PathOS project has received funding from the European Union’s Horizon Europe framework programme under grant agreement No. 101058728. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. Neither the European Union nor the European Research Executive Agency can be held responsible for them.
Measuring the economic impact of open science and open data has proven to be challenging. Many theoretical studies highlight the benefits of making research results public, with strong support for Open Science from economic research on technological change [@chataway2018,@yozwiak_data_2015,@mazzucato_entrepreneurial_2011]. However, only a few studies have attempted to measure the impacts of open science compared to closed science, and more robust evidence on how Open Science drives innovation and economic outcomes is needed to strengthen support and counter emerging criticisms [@karasz2024,@ali-khan2018]. The existing literature mainly concentrates on specific sectors, particularly health, medicine, and biosciences, which receive more attention due to early regulation by funders and significant interest in clinical trial outcomes. Another important stream of literature is focused on highlighting the economic value of Open Science through personal industry experiences, though lacking precise quantitative evidence, with contributions from @mcmanamay_openaccess_2014 on fisheries, @harding2017 on medicine, @chan_cost_2015 on the transition to an Open Science model, and @chen2017 on the role of open data in AI and machine learning applications. Although directly linking economic outcomes to open data initiatives can be challenging, with authors combining theoretical arguments and the limited quantitative evidence available at the time of their publication [@ali2022,@tennant2016,@fell2019,@wehn2021,@arzberger_promoting_2004], open access to findings and data is considered to lead to significant savings in access costs. By removing paywalls and subscription fees, open data allows researchers and businesses to access valuable information without incurring additional costs. A major economic benefit of lowering the cost of knowledge is the availability of an extra budget that can be reallocated for other purposes [@tennant2016].
@@ -44,13 +45,13 @@ To fully harness the potential of open data, it is important to develop the nece
Despite the theoretical benefits of open data, several limitations hinder a comprehensive assessment of its economic impact. Implementing open data practices requires significant investments in infrastructure, technology, and training, potentially offsetting some cost savings [@vanvlijmen2020]. Moreover, the European Commission study emphasises that the benefits of open data are contingent on the quality and standardisation of the data provided [@EC-DGRI2018]. Finally, a major limitation is the scarcity of empirical evidence; few studies have attempted to measure the impacts of open science compared to closed science, making it challenging to generalise findings [@karasz2024]. @herala2016 review the benefits and challenges of open data initiatives in the private sector, highlighting advantages like enhanced collaboration and innovation, but caution that these are often based on speculative assumptions rather than empirical evidence, emphasising the need for further research to inform best practices and mitigate risks associated with increased costs and data privacy concerns.

-## Directed Acyclic Graph (DAG)
+# Directed Acyclic Graph (DAG)

As discussed in the general [introduction on causal inference](causal_intro/article/intro-causality.qmd), we use DAGs to represent structural causal models. In the following, a DAG ([@fig-model]) is employed to examine the causal relationship between *Open Data* and *Cost Savings*. The figure illustrates multiple potential pathways, including a direct path from Open Data to Cost Savings, an indirect one involving Time Savings (i.e., a mediator), and additional paths that incorporate factors affecting either Open Data or Time Savings (i.e., confounders). These additional factors, such as technological infrastructure, data quality and availability, standardisation, user skills, innovation, and collaboration, introduce layers of complexity to the model. As we will show in the subsequent sections, they are essential to discuss the causal and non-causal, open and closed, relationships among all these variables.
{#fig-model}

-## The effect of Open data on Cost Saving
+# The effect of Open data on Cost Saving

In this section, we apply the concepts presented in the section Causality in Science Studies to potential research questions. We present a specific perspective on causal inference through the lens of structural causal models [@pearl_causality_2009].
@@ -88,7 +89,7 @@ In the [causality introduction](causal_intro/article/intro-causality.qmd) we emp
This approach is effective in predicting how probability distributions shift under controlled changes when the causal structure is known. However, this approach depends on having an established causal graph, limiting its use for exploring causality from scratch. Critics suggest an alternative approach that allows causal discovery through experimentation without prior assumptions about mechanisms [@woodward2003]. This is especially useful in complex fields, such as social and biomedical sciences, where causal relationships are less understood. Considering this, @cunningham_causal_2021 argues that sample selection problems have been recognised long before the introduction of DAGs, with early solutions like @heckman1979, and emphasizes that an atheoretical approach to empiricism is inadequate. He asserts that causal inference requires a deep understanding of the behavioural processes behind the phenomenon being studied, and while DAGs are useful, they cannot replace the need for theoretical knowledge in creating credible identification strategies. Thus, causal inference is not solved by simply collecting more data, but by integrating theory with empirical analysis.

-## Discussing empirical issues
+# Discussing empirical issues

The presented model illustrates how open data might drive cost savings by focusing on a limited set of variables commonly discussed in the literature. However, this approach presents challenges, as the existing literature reveals a significant gap in empirical studies that specifically measure the economic impact of open data on cost savings.
Article Processing Charges (APCs) represent the price that publishers charge authors to publish their articles and books under an open access license. They capture the affordability and accessibility of Open Access publishing for different types of stakeholders, such as researchers, institutions, and funding agencies. They are also relevant for policy-makers seeking to optimize Open Science policies.
Tracking and comparing APCs could also be used to encourage publishers to adopt more transparent and equitable pricing policies and support the development of sustainable Open Access publishing models accessible to all researchers regardless of their financial resources.
APCs have both benefits and drawbacks. In a strict sense, they do not remove the economic barriers between the writing and the reading of scientific results but shift these costs from the readers to the authors. In countries where funder reimbursement of APC costs or transformative agreements do not cover these costs, APCs can create a financial barrier that limits access to Open Access journals, often generating asymmetries between richer and poorer countries and academic institutions. On the other hand, APCs also incentivize publishers to offer Open Access publishing, which promotes Open Science.

-## Metrics
+# Metrics

-### Number/Share (%) of publications with an APC cost
+## Number/Share (%) of publications with an APC cost

These metrics measure the number or share (as a percentage) of publications in journals that have incurred an APC. The share provides a more nuanced understanding of the affordability and accessibility of Open Access publishing than the absolute number, and it can be used to compare the affordability and accessibility of Open Access publishing across different journals, publishers, and regions in a more meaningful way.
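
As a minimal sketch of how the two metrics relate, the snippet below counts publications flagged as having incurred an APC and expresses that count as a share of all publications. The record layout and the `has_apc` field are assumptions made for illustration, not the schema of any particular data source.

```python
def apc_share(publications):
    """Return (count, share in %) of publications that incurred an APC."""
    total = len(publications)
    if total == 0:
        # Avoid division by zero for an empty set of publications.
        return 0, 0.0
    count = sum(1 for pub in publications if pub["has_apc"])
    return count, 100.0 * count / total


# Hypothetical records; "has_apc" is an illustrative field name.
publications = [
    {"id": "p1", "has_apc": True},
    {"id": "p2", "has_apc": False},
    {"id": "p3", "has_apc": True},
    {"id": "p4", "has_apc": False},
]

count, share = apc_share(publications)
print(f"{count} publications with an APC ({share:.1f}% of all publications)")
```

Computing the share per journal, publisher, or region only requires applying the same function to each subset of records.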
@@ -57,11 +58,11 @@ Limitation:
- Not knowing who incurred that APC (funder, institution, author, etc.) limits the usefulness of this indicator.
- The share of publications with an APC could be more useful when put together with % Diamond OA publications, as opposed to stand alone.

-#### Measurement
+### Measurement

-##### Methodology
+#### Methodology

-###### OpenAIRE Graph
+##### OpenAIRE Graph

The [OpenAIRE Graph](https://graph.openaire.eu) dump currently does not include OA color classifications, though they are already implemented in the OpenAIRE MONITOR and are expected to be integrated into the graph dump in Q1 2024.
@@ -74,7 +75,7 @@ Limitations:
- This methodology is a workaround, chosen because it provides better coverage than any APC dataset we are aware of; however, it is not a direct source of whether a publication has incurred an APC.

-### Average APC
+## Average APC

Description:
@@ -89,11 +90,11 @@ Limitations:
- The cost of APCs can vary widely depending on the field of research, the region, and the specific publisher or journal, therefore taking an average may be misleading.
- Not knowing who incurred that APC (funder, institution, author, etc.) limits the usefulness of this metric.

-#### Measurement
+### Measurement

-##### Datasources
+#### Datasources

-###### OpenAIRE Graph
+##### OpenAIRE Graph

The OpenAPC APC dataset, which is integrated in the [OpenAIRE Graph](https://graph.openaire.eu).
@@ -102,7 +103,7 @@ Limitations:
- Incomplete data: Publishers do not generally provide data on their APC fees, OpenAPC (openapc.net) has a growing collection but it is not complete.
- In an (organization, publication, APC cost) triplet of OpenAPC, to the best of our knowledge, it is not possible to distinguish whether the APC cost is the entire cost of the publication or just what the organization paid.

-##### Methodology
+#### Methodology

Via OpenAIRE MONITOR (monitor.openaire.eu)
@@ -115,7 +116,7 @@ Limitations:
- Averages have the benefit of summarizing and normalizing information, however depending on the underlying distribution of costs, they may be misleading (e.g. via outliers)
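
To illustrate the outlier concern, the sketch below compares the mean and the median of a small set of APC amounts. The figures are invented for illustration and are not taken from OpenAPC or any other dataset.

```python
from statistics import mean, median

# Hypothetical APC amounts in EUR, including one unusually expensive journal.
apcs = [1200, 1500, 1800, 2000, 9500]

average_apc = mean(apcs)   # pulled upwards by the single 9500 EUR charge
median_apc = median(apcs)  # the middle value, unaffected by that charge

print(f"Average APC: {average_apc} EUR")
print(f"Median APC:  {median_apc} EUR")
```

Here the average exceeds four of the five individual charges, so reporting the median or the full distribution alongside the average may give a more faithful picture.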

-### Total APC
+## Total APC

Description:
@@ -130,11 +131,11 @@ Limitations:
- Not knowing who incurred that APC (funder, institution, author, etc.) limits the usefulness of this metric, more than the previous ones.
- It does not contain information on the distribution of APCs across subdomains, e.g. the total cost does not show how it is distributed across scientific domains.

-#### Measurement
+### Measurement

-##### Datasources
+#### Datasources

-###### OpenAIRE Graph
+##### OpenAIRE Graph

The OpenAPC APC dataset, which is integrated in the [OpenAIRE Graph](https://graph.openaire.eu).
@@ -143,7 +144,7 @@ Limitations:
- Incomplete data: Publishers do not generally provide data on their APC fees, OpenAPC (openapc.net) has a growing collection but it is not complete.
- In an (organization, publication, APC cost) triplet of OpenAPC, to the best of our knowledge, it is not possible to distinguish whether the APC cost is the entire cost of the publication or just what the organization paid.

-##### Methodology
+#### Methodology

Via OpenAIRE MONITOR
@@ -156,7 +157,7 @@ Limitations:
- Totals have the benefit of giving a bird's eye view, however depending on the underlying distribution of costs, they can have different implications.
An APC extrapolation exercise was conducted for the purpose of an EC study [@monitori2021]. The authors defined groupings, and imputed the average APC of the group to each publication in the group for which the APC is unknown. The groupings were based on the following variables, similar to the correlates above:

- quantile of the Source Normalised Impact per Paper (SNIP) score in the [CWTS journal indicators](https://www.journalindicators.com/),
- whether the publication is pure ‘gold’ open access or ‘hybrid’,
- the year of publication.
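
The group-average imputation described above can be sketched roughly as follows. This is not the code used in the EC study: the field names (`snip_quantile`, `oa_type`, `year`, `apc`) and the sample records are hypothetical, and the choice to leave a publication unimputed when its group has no known APC is our own assumption.

```python
from collections import defaultdict
from statistics import mean


def group_key(pub):
    """Grouping variables: SNIP quantile, gold/hybrid status, publication year."""
    return (pub["snip_quantile"], pub["oa_type"], pub["year"])


def impute_group_average(publications):
    """Fill unknown APCs (None) with the mean known APC of the same group."""
    known = defaultdict(list)
    for pub in publications:
        if pub["apc"] is not None:
            known[group_key(pub)].append(pub["apc"])
    group_mean = {key: mean(values) for key, values in known.items()}

    imputed = []
    for pub in publications:
        apc = pub["apc"]
        if apc is None:
            # Assumption: stays None when the group has no known APC at all.
            apc = group_mean.get(group_key(pub))
        imputed.append({**pub, "apc": apc})
    return imputed


# Hypothetical records for illustration only.
publications = [
    {"id": "p1", "snip_quantile": 4, "oa_type": "gold", "year": 2019, "apc": 2000},
    {"id": "p2", "snip_quantile": 4, "oa_type": "gold", "year": 2019, "apc": 3000},
    {"id": "p3", "snip_quantile": 4, "oa_type": "gold", "year": 2019, "apc": None},
    {"id": "p4", "snip_quantile": 1, "oa_type": "hybrid", "year": 2020, "apc": None},
]

result = impute_group_average(publications)
for pub in result:
    print(pub["id"], pub["apc"])
```

In this sketch, `p3` receives the average of `p1` and `p2`, while `p4` remains unknown because no publication in its group has a recorded APC.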