---
title: "Best practice examples for primary data publication"
fontsize: 11pt
csl: amnat.csl
bibliography: biblio.bib
nocite: |
@gossner15a, @gossner15
output:
html_document:
fig_caption: yes
keep_md: no
---
<!---->
This tutorial provides best-practice principles when standardising traits with terms of the Ecological Trait-data Standard (ETS, https://terminologies.gfbio.org/terms/ets/pages/).
We focus this document on the first of the three use-cases for the ETS vocabulary: The publication of project-specific primary data.
To comply with the FAIR principles of scientific data management, which aim at findability, accessibility, interoperability and reusability [@wilkinson16], authors of trait data should apply the following principles when preparing data for publication:
- **Label your dataset!** Data should be published with appropriate metadata and provide a descriptive title and keywords to be findable by search engines. Keywords should include the general label "trait-data", and more specific labels such as "functional traits" or "morphometric traits", as well as the taxa covered by the dataset.
- **State terms of use!** To enable re-use, the metadata must contain the name of the author or owner of the data, the suggested citation for the data, as well as the license model or terms of use of the data. Recommended are copyright waivers and licenses that permit re-use without seeking authorization from the rights holder, such as Creative Commons (CC) licenses, which can also require attribution and citation (CC BY) [@kissling18]. Ideally, these statements are provided by the data host in a machine-readable way, by using standardised metadata fields or hyperlink references.
- **Use independent file formats!** For reasons of long-term accessibility, data should not be uploaded in proprietary spreadsheet formats (like '.xlsx') but rather in open, software-independent file formats (like '.ods'), or as comma- or tab-separated text files ('.csv' or '.tsv') that are compatible with all computing platforms and internationalisation settings by applying a unified character encoding (e.g., UTF-8 or ASCII). For internationalisation, use points as decimal delimiters and set date formats to ISO 8601, i.e., "YYYY-MM-DD".
- **Use standard terms!** Draw column or field names from the ETS vocabulary wherever possible. If co-variates or any ancillary data are not covered by the vocabulary, apply terms of the Darwin Core Standard (DwC).
- **Use URI references!** Wherever possible, the concepts used to describe the dataset should be referenced to globally accessible Uniform Resource Identifiers (URIs). These references allow data users and software tools to relate the dataset to well-defined concepts of taxa or traits, or measurement units, etc. The ETS-terms 'traitID' and 'taxonID' are defined to label fields with URIs as values.
- **Provide definitions!** Metadata must reference the data standards, taxonomic concepts and trait ontologies applied in the dataset as well as any user-defined terms and trait concepts. To provide own trait-concepts, use the terms provided within the ETS section "Terms for traitlists".
- **Publish raw data!** When publishing trait data, the priority is to retain the measured values while minimising the risk of introducing errors through data conversion or aggregation. Thus, data should be published as they have been recorded, i.e., in original units and raw values, along with standardised values that match a unified and compatible scheme of a data standard. This is true for the core terms, i.e., the trait value and the item of observation and its taxon assignment, but also for any co-variates reported along with it.
- **Keep original values!** For the core terms, data providers should provide standardised values conforming to consensus terms of their own research field. The ETS provides terms for maintaining user-defined entries (such as project-specific taxon abbreviations) along with the standard terms (using the 'verbatim\*' duplicates of the terms).
For co-variates, the extensions of the vocabulary provide column names for standardised values only, e.g., elevation above sea-level. These should adhere to accepted standards wherever possible. If any conversion was applied to obtain these data, users should keep the original values in user-defined columns, e.g., by appending 'verbatim\*' to the column name. Also, additional co-variates, e.g., on environmental growth conditions of the specimen, should be provided in user-defined columns. Any user-defined columns must be adequately defined in the accompanying meta-information.
- **Report uncertainty!** If publishing aggregate data (e.g., averages of multiple measurements), report the sample size $n$ and the dispersion of the aggregate value. This allows data users to weight data from multiple sources when producing compilations. Any sources of bias or noise should be reported along with the data (e.g., observer ID, conditions of measurement etc.). Use terms of the ETS extension "Measurement or Fact".
- **Use Identifiers!** Well-defined identifiers ('IDs') are key elements to structure the datasets and relate them to complementing datasets, if necessary (Fig. 2 *c* & *d*).
For instance, for occurrence level data where multiple trait measurements are reported for each individual specimen, the same user-defined entry for 'occurrenceID' would link several measurements across the rows of the dataset.
Similarly, multivariate measurements, for instance gas chromatography data or x-y-z data of morphometric coordinates could be linked via a 'measurementID'.
In literature data, summarised traits are usually given at the taxon level instead of the individual organism (e.g., reported as means or factorials) and a 'taxonID' is the key identifier.
In larger compilations, a 'datasetID' allows tracing the origin of the data to the primary source and keeping track of authorship and original reference.
Beyond being just of structural use for the dataset, identifiers are capable of linking own data to consensus taxonomy and trait terminology via URIs, which point to external terminology services.
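
Several of the principles above (text-based file format, unified encoding, decimal points, ISO 8601 dates) can be illustrated in a few lines of base R. The following is only a minimal sketch under stated assumptions: the species labels, trait values and the temporary file name are invented for illustration, and `eventDate` is borrowed from Darwin Core as an example of a date column.

```r
# Toy data; all labels and values are invented for illustration.
traits <- data.frame(
  verbatimScientificName = c("Sp. a", "Sp. b"),
  traitName  = c("body_length", "body_length"),
  traitValue = c(4.25, 10.5),
  traitUnit  = c("mm", "mm"),
  eventDate  = as.Date(c("2015-06-01", "2015-06-14")),
  stringsAsFactors = FALSE
)

# Store dates as ISO 8601 text ("YYYY-MM-DD").
traits$eventDate <- format(traits$eventDate, "%Y-%m-%d")

# write.csv() always uses "." as decimal mark and "," as separator;
# fileEncoding sets the character encoding of the output file.
outfile <- file.path(tempdir(), "traits.csv")
write.csv(traits, outfile, row.names = FALSE, fileEncoding = "UTF-8")

# Reading the file back recovers the same values.
check <- read.csv(outfile, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
```

Writing plain text with an explicit encoding keeps the file readable regardless of the spreadsheet software or locale of the data user.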
<!--
When applying the vocabulary in a primary data publication, most authors will choose a two-dimensional spreadsheet rather than relational databases for reasons of simplicity.
For spreadsheets, it is recommended to use an observation long-table format for the data (Fig. 2 *b*), rather than a species$\times$traits matrix (Fig. 2 *a*).
As the long-table format draws from a defined set of columns, merging datasets is easier. Long-table datasets also offer multiple advantages for data manipulation [e.g., filtering, sub-setting and aggregating data, @wickham14].
-->
\newpage
# Example 1 - Apply standard terms for matrices
If trait data have been collated at the species level from different literature sources or from expert knowledge, or as aggregated measurements collated from raw data, they are usually reported in a species-by-traits wide-table (matrix) format, with a column of values for each trait recorded and a row for each species (or taxon) for which data were available.
The wide-table format is widely used for the production of lookup-tables at the species level, which may either report qualitative facts (from literature or expert knowledge) or aggregated quantitative traits (averages or ranges of trait values for this taxon).
This wide-table format is straightforward for maintaining and managing new data for the data author.
In terms of analysis, it makes it easy to correlate traits with each other and to analyse trade-offs between traits.
However, wide-tables need to clearly differentiate missing data and traits not applicable to a given taxon (often filled in as 'NA' or dummy numbers, e.g., '999'). Additional information (e.g., on variation of means or literature source) may be stored in secondary columns or accompanying datasheets.
When applying the data standard to wide-table data, the columns for taxa or individuals may be labelled using the terms of the ETS.
Taxon concepts should be referenced to globally available concepts using URIs, e.g., by referring to the GBIF Taxonomy. Taxon authorship and fine-grained higher taxonomy must then be consistent with the linked concept, or may be omitted as redundant to avoid confusion.
Remarks should be provided in a human-readable form within the table and provided with further detail in the metadata, if necessary.
For wide tables, it is essential that the multiple trait columns are labelled unambiguously and linked to trait concepts provided in the metadata or a secondary spreadsheet.
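
As a minimal sketch of these steps in base R, the toy table below replaces a dummy code by an explicit missing value and maps column names to standard terms. The original column names (`SpeciesID`, `Body_Size`, `Remark`) and all values are invented for illustration and are not taken from the datasets shown in this document.

```r
# Toy species-by-traits table; column names and values are invented.
wide <- data.frame(
  SpeciesID = c("Species a", "Species b", "Species c"),
  Body_Size = c(3.2, 999, 7.5),   # 999 used as dummy code for "no data"
  Remark    = c("", "non-grassland species", ""),
  stringsAsFactors = FALSE
)

# Replace the dummy code by an explicit missing value.
wide$Body_Size[wide$Body_Size == 999] <- NA

# Map column names to standard terms; the trait column keeps its own name
# and must be defined (including its unit) in the metadata.
names(wide)[names(wide) == "SpeciesID"] <- "scientificName"
names(wide)[names(wide) == "Remark"]    <- "measurementRemarks"
```

Whether a missing entry means "not measured" or "not applicable" cannot be encoded by `NA` alone and would still need to be stated in a remarks column or in the metadata.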
\newpage
```{r, echo = FALSE, warning = FALSE, message=FALSE}
options(knitr.table.format = "latex")
library(kableExtra)
library(traitdataform)
pulldata('arthropodtraits')
caption = "Example 1 before standardisation; excerpt of original data from Gossner et al. (2015) \\textit{Scientific Data} 2:150013; data DOI: \\href{https://doi.org/10.5061/dryad.53ds2}{10.5061/dryad.53ds2} containing eight traits of arthropods from a European grassland; details provided in data publication \\href{https://www.nature.com/articles/sdata201513/tables/2}{Table 2}."
kable(arthropodtraits[1:45,c(1:8,17)], longtable=TRUE, escape = TRUE, row.names = FALSE,
booktabs = T, caption = caption) %>%
kable_styling(latex_options = c("scale_down", "repeat_header", "striped"), font_size = 10)
```
\newpage
```{r, echo = FALSE, warning = FALSE, message=FALSE}
arthropodtraits_head <- head(arthropodtraits, 45)
names(arthropodtraits_head)[4] <- "scientificName"
names(arthropodtraits_head)[1:3] <- tolower(names(arthropodtraits_head)[1:3] )
arthropodtraits_head$taxonID <- get_gbif_taxonomy(levels(droplevels(arthropodtraits_head$scientificName)))$taxonID
names(arthropodtraits_head)[17] <- "measurementRemarks"
levels(arthropodtraits_head$measurementRemarks) <- c("","non-grassland species")
select_cols <- c(4,1,3,18,6:8,17)
caption = "Example 1 after standardisation. Column names mapped to ETS terms; taxa are mapped to taxon concepts in published terminologies via URIs. Trait columns are not standardised and respective definitions and units must be provided in the accompanying metadata."
kable(arthropodtraits_head[,select_cols], longtable=TRUE, escape = TRUE, row.names = FALSE,
booktabs = T, caption = caption) %>%
kable_styling(latex_options = c("scale_down", "repeat_header", "striped"), font_size = 10)
```
\newpage
# Example 2 - provide structure in long-table
In observation wide-tables, each row is centred on a single physical occurrence of a species. For example, phenotypic variation of traits within species would usually be recorded per observation: each replicate taken on a distinct individual specimen is recorded in a row, with the respective trait values stored in columns (see Table B1).
Those data are common in investigations of evolutionary trade-offs and trait correlation, or of intra-specific variation along environmental gradients. Occurrence data would also capture phenotypic variation arising from morphotypes and sexual dimorphism.
This format is the most intuitive for recording own empirical measurements and therefore is common for measured quantitative data but rarely found for reported qualitative facts obtained from the literature.
A more effective method that allows for high flexibility and easier data merging is the storage of trait data in long-table formats [@wickham14; @parr16].
This data format has a row-based structure: each row contains a single measurement or fact of a specific trait, referenced to a single specimen (or one occurrence of an individual organism at a particular place and time) which is assigned to a specific taxon (see Table B2).
Identifiers can provide structure and link measurements that are correlated by assigning a unique ID for each single individual.
Also, multivariate trait measurements can be recorded in this format by linking multiple rows via a unique measurement identifier.
Long-table datasets offer multiple advantages for data handling (e.g., filtering, sub-setting and aggregating data), visualization (e.g., plotting measured values by factor variable or taxon) and statistical modelling (e.g., ANOVA for testing differences in trait value by sex) [@wickham14].
As long-table data share a larger set of columns, merging of datasets is in most cases easier for the end-user.
However, a disadvantage of the long-table data structure is the mixing of units or even factor levels in the 'traitValue' column. The mixed content of this column prevents filtering data by value across traits. Data-splitting techniques or a database system can be used to disentangle the data.
Note also how the datasets become bulkier, as much information and many labels are now repeated for all measurements taken on the same individual organism (i.e., one occurrence).
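
The restructuring from an observation wide-table into a measurement long-table can be sketched in base R as follows. This is an illustrative sketch only, not the routine used to produce the tables in this document; the specimen labels, species names and trait values are invented.

```r
# Toy observation wide-table; all labels and values are invented.
wide <- data.frame(
  occurrenceID   = c("m1", "m2"),
  scientificName = c("Species a", "Species b"),
  weight         = c(52.1, 120.4),
  wing_length    = c(17.2, 24.9),
  stringsAsFactors = FALSE
)
units <- c(weight = "mg", wing_length = "mm")

# Stack one block of rows per trait column.
long <- do.call(rbind, lapply(names(units), function(trait) {
  data.frame(
    occurrenceID   = wide$occurrenceID,
    scientificName = wide$scientificName,
    traitName      = trait,
    traitValue     = wide[[trait]],
    traitUnit      = units[[trait]],
    stringsAsFactors = FALSE
  )
}))

# Rows of the same specimen stay linked through occurrenceID.
long <- long[order(long$occurrenceID), ]
rownames(long) <- NULL

# Filtering by value is only meaningful within one trait (one unit).
heavy <- subset(long, traitName == "weight" & traitValue > 100)
```

The last line shows the data-splitting idea in its simplest form: the table is first restricted to one trait, and only then filtered by value.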
\newpage
```{r, echo = FALSE, warning = FALSE, message=FALSE}
library(XML)
library(RCurl)
library(getPass)
read.service <- function(datasetid,
user = NULL, pswd = NULL,
dec=".",
na.strings="NA",
fill=FALSE, sep="\t",
quote=if(identical(sep, "\n")) "" else "'\"",
fileEncoding = "UTF-8")
{
if(is.null(user)) user <- readline("user name: ")
if(is.null(pswd)) pswd <- getPass::getPass("password: ", noblank = FALSE)
opts = curlOptions(encoding="CE_UTF8" ,ssl.verifypeer = FALSE, httpheader=c(Accept = "text/plain"))
params = c("datasetId"=datasetid, "username"=user, "password"=pswd)
data <- postForm(uri="https://www.bexis.uni-jena.de/WebServices/DataService.asmx/DownloadData", .params=params, .opts=opts, style="post")
x <- xmlTreeParse(data, asText = T, trim=FALSE)
txt <- xmlValue(xmlRoot(x)[[1]])
txt <-gsub("#","?", txt)
f <- textConnection(toString(txt))
d <- utils::read.table(file=f, sep=sep, header=T, dec=dec, na.strings=na.strings, fill=fill, quote=quote, fileEncoding = fileEncoding)
close(f)
d
}
# 21228 life history traits of moths
# 21247 morphological traits of moths
#https://www.bexis.uni-jena.de/Data/ShowXml.aspx?DatasetId=21247
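# Credentials are read from environment variables instead of being hardcoded;
# the variable names BEXIS_USER and BEXIS_PSWD are placeholders chosen for this example.
moths <- read.service(21247, dec = ",", user = Sys.getenv("BEXIS_USER"), pswd = Sys.getenv("BEXIS_PSWD"))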
attr(moths, 'thesaurus') <- traitdataform:::as.thesaurus(
weight = traitdataform:::as.trait("body_weight",
expectedUnit = "mg", valueType = "numeric",
identifier = "http://t-sita.cesab.org/BETSI_vizInfo.jsp?trait=Body_weight"),
wing_length = traitdataform:::as.trait("wing_length",
expectedUnit = "mm", valueType = "numeric",
identifier = ""),
wing_width = traitdataform:::as.trait("wing_width",
expectedUnit = "mm", valueType = "numeric",
identifier = ""),
wing_area = traitdataform:::as.trait("wing_area",
expectedUnit = "mm2", valueType = "numeric",
identifier = "http://t-sita.cesab.org/BETSI_vizInfo.jsp?trait=Wing_surface"),
wing_loading = traitdataform:::as.trait("wing_loading",
expectedUnit = "mg mm-2", valueType = "numeric",
identifier = "")
)
attr(moths, 'taxa') <- "species"
attr(moths, 'keep') <- c(locationID = "plot")
attr(moths, 'occurrences') <- "label"
attr(moths, 'units') <- c(weight = "mg", wing_length = "mm", wing_width = "mm", wing_area = "mm2", wing_loading = "mg mm-2")
select_rows <- 1:12
caption <- "Example 2 before standardisation. Data are stored in observation wide-table format, i.e., one row contains a single specimen with multiple trait measurements (in columns). Units and specific information on measurements must be reported in the metadata of the dataset. Data extract from Mangels et al. (released within the data management system of the Biodiversity Exploratories, BExIS:21247)."
select_cols <- c("label","species","plot","weight","wing_length","wing_width","wing_area","wing_loading")
kable(rbind(moths[select_rows, select_cols]), longtable=TRUE, escape = TRUE, row.names = FALSE,
booktabs = T, caption = caption, digits = c(NA,NA,NA,3, 2, 2,4,4)) %>%
kable_styling(latex_options = c("scale_down", "repeat_header", "striped"), font_size = 10)
```
```{r, echo = FALSE, warning = FALSE, message=FALSE}
mothsStd <- standardize(moths)
mothsStd <- mothsStd[order(as.numeric(substr(mothsStd$occurrenceID,2,10))),]
names(mothsStd)[c(1,2,6,7)] <- c("verbatimScientificName", "verbatimTraitName", "scientificName", "traitName")
caption <- "Example 2 after standardisation into measurement long-table format. Each row contains a single measurement linked to an observation of an individual specimen, i.e., an occurrence, via identifiers. Only three traits have been selected from the original data, and rows have been cut off to fit the page."
select_cols <- c("scientificName","traitName","traitValue","traitUnit","traitID","verbatimTraitName","taxonID","occurrenceID", "measurementID", "locationID")
select_rows <- which(mothsStd$traitName %in% c("body_weight", "wing_area", "wing_length") )
kable(mothsStd[select_rows[1:28],select_cols], longtable=TRUE, escape = TRUE, row.names = FALSE,
booktabs = T, caption = caption, digits = c(4)) %>%
kable_styling(latex_options = c("scale_down", "repeat_header", "striped"), font_size = 10)
```
# Example 3 - apply ETS extensions in trait-data table
If data contain covariate information, the two-dimensional data table can cover these by adding more columns.
The three extensions of the ETS complement the core terms according to different layers of metadata information: 1) the "taxon" extension includes terms to describe more details on the taxon concept applied, 2) the "Measurement or Fact" extension adds information on the methodology or context of the measurement of the trait value, and 3) the "occurrence" extension provides terms to describe the individual specimen and its observation context.
A data table should, for instance, report the taxonomic resolution of the trait observation, especially if it is reported at a level other than the species (by setting `taxonRank`: genus). The additional terms for taxon hierarchies (`kingdom` to `genus`) would allow the data user to extract information on genera or families, but this would be redundant if an unambiguous taxon concept is provided within `taxonID`.
Further columns would provide detail on the measurement accuracy (`measurementResolution`) or the number of individuals on which an aggregated measure has been determined (by reporting `individualCount`: *n* and `aggregateMeasure`: 'TRUE').
Data providers can easily include information on sex or life stage of the specimen (using terms `sex` or `lifeStage`), or on the location of sampling (e.g., `verbatimLocality`, all 'occurrence' terms).
If data are provided in a single two-dimensional data table that applies standard terms of the ETS, this may increase readability and facilitate re-use by synthesis researchers as they do not need to process information stored elsewhere. Data tables conforming to the same data standard can be merged with ease.
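
To illustrate how aggregated values could be reported together with their sample size and dispersion, the following base R sketch summarises toy measurements per taxon. The values are invented, and using a column named `dispersion` for the standard deviation is an assumption made here; consult the "Measurement or Fact" extension for the exact term definitions.

```r
# Toy individual-level measurements; all values are invented.
raw <- data.frame(
  scientificName = rep(c("Species a", "Species b"), each = 3),
  traitName      = "body_length",
  traitValue     = c(4.0, 4.4, 4.2, 9.0, 10.0, 11.0),
  traitUnit      = "mm",
  stringsAsFactors = FALSE
)

# Group the measured values by taxon.
groups <- split(raw$traitValue, raw$scientificName)

# One row per taxon: mean, number of individuals and standard deviation.
aggregated <- data.frame(
  scientificName   = names(groups),
  traitName        = "body_length",
  traitValue       = unname(vapply(groups, mean, numeric(1))),
  traitUnit        = "mm",
  individualCount  = unname(vapply(groups, length, integer(1))),
  dispersion       = unname(vapply(groups, sd, numeric(1))),  # assumed column name
  aggregateMeasure = TRUE,
  stringsAsFactors = FALSE
)
```

Keeping `individualCount` and the dispersion next to each mean lets a data user weight values from different sources without going back to the raw data.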
```{r, echo = FALSE, warning = FALSE, message=FALSE}
pulldata('heteroptera_raw')
heteroptera_raw$Center_Sampling_region <- gsub("°", " ", heteroptera_raw$Center_Sampling_region)
select_cols <- c(1:7,9,11,12:15)
# https://figshare.com/collections/Morphometric_measures_of_Heteroptera_sampled_in_grasslands_across_three_regions_of_Germany/3307611/1
caption = "Example 3 before standardisation; Excerpt of data from Gossner et al. (2015b) Ecology, 96, 1154–1154. doi: \\href{https://doi.org/10.1890/14-2159.1}{10.1890/14-2159.1} on measurements of traits in Heteroptera."
select_rows <- which(heteroptera_raw$ID %in% 8:15)
kable(heteroptera_raw[select_rows,select_cols], longtable=TRUE, escape = TRUE, row.names = FALSE,
booktabs = T, caption = caption) %>%
kable_styling(latex_options = c("scale_down", "repeat_header", "striped"), font_size = 10)
```
\newpage
```{r, echo = FALSE, warning = FALSE, message=FALSE}
pulldata('heteroptera_raw')
heteroptera_long <- as.traitdata(heteroptera_raw)
heteroptera_long <- standardize.taxonomy(heteroptera_long)
heteroptera_long <- heteroptera_long[order(as.numeric(heteroptera_long$occurrenceID)),]
heteroptera_long$basisOfRecordDescription <- as.factor(heteroptera_long$basisOfRecordDescription)
heteroptera_long$decimalLatitude <- gsub("°", " ", substr(heteroptera_long$verbatimLocality,0,5))
heteroptera_long$decimalLongitude <- gsub("°", " ", substr(heteroptera_long$verbatimLocality,10,14))
heteroptera_long$decimalLatitude <- round(as.numeric(measurements::conv_unit(heteroptera_long$decimalLatitude, from = 'deg_dec_min', to = 'dec_deg')),5)
heteroptera_long$decimalLongitude <- round(as.numeric(measurements::conv_unit(heteroptera_long$decimalLongitude, from = 'deg_dec_min', to = 'dec_deg')), 5)
heteroptera_long$basisOfRecord <- as.factor("PreservedSpecimen")
caption = "Example 3 after standardisation; application of terms of the ETS extensions structure the two-dimensional data table. Excerpt of data from Gossner et al. (2015b) Ecology, 96, 1154–1154. doi: \\href{https://doi.org/10.1890/14-2159.1}{10.1890/14-2159.1}."
select_rows <- which(heteroptera_raw$ID %in% 8:15)
select_cols <- c(1,2,3,4,8,6,15,18,22,16,20,21)
kable(heteroptera_long[heteroptera_long$traitName %in% c("Body_length", "Body_width", "Body_height", "Thorax_width"),select_cols][30:58,], longtable=TRUE, escape = TRUE, row.names = FALSE,
booktabs = T, caption = caption) %>%
kable_styling(latex_options = c("scale_down", "repeat_header", "striped"), font_size = 10)
```
# Example 4 - apply ETS extensions in relational database
Two-dimensional spreadsheets are however limited in the number and complexity of co-variates they can contain. As such, for datasets containing multi-layered information on observations, traits, taxa and environmental context, the use of relational database structures may be indicated.
Relational databases help to avoid content duplication, as the core table only maintains identifiers that relate to secondary data tables containing the detailed information for each identifier.
For instance, the core table may specify 'occurrenceID' for which specific information can be found in a lookup table.
The ETS may be applied to describe columns within the individual data tables of relational databases.
Compatibility with other database structures is warranted, as ETS terms map to terms of the Darwin Core Standard used in other suggested database structures for trait data, like the generic structure for plant trait databases [@kattge11a] or the TraitBank structure [@parr16].
For publishing relational databases, it is however important to provide sufficient information on how the individual data tables are related, e.g., in a machine-readable metadata specification (such as the Ecological Metadata Language, EML).
Further, it must be ensured that data tables are always contained in a single data publication and are not distributed independently of each other, as this would cause issues of broken links if data are updated or lost.
Both requirements are met if data are released as a packaged archive, for instance following the Darwin Core Archive specification [@robertson09].
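
The relation between a core table and a lookup table can be sketched in base R as follows. This is a simplified illustration with invented values and is not the code used to produce the tables below; it only shows that splitting on an identifier loses no information.

```r
# Toy flat table; all values are invented.
flat <- data.frame(
  measurementID = 1:4,
  occurrenceID  = c("o1", "o1", "o2", "o2"),
  traitName     = c("body_length", "body_width", "body_length", "body_width"),
  traitValue    = c(4.2, 1.9, 10.1, 4.8),
  sex           = c("female", "female", "male", "male"),
  lifeStage     = "adult",
  stringsAsFactors = FALSE
)

# Core table: measurements plus the linking identifier only.
core <- flat[, c("measurementID", "occurrenceID", "traitName", "traitValue")]

# Lookup table: one row per occurrence, holding the occurrence-level details.
occurrences <- unique(flat[, c("occurrenceID", "sex", "lifeStage")])

# Joining on the identifier restores the flat table.
rejoined <- merge(core, occurrences, by = "occurrenceID")
rejoined <- rejoined[order(rejoined$measurementID), names(flat)]
rownames(rejoined) <- NULL
```

The join only works as long as both tables are distributed together and share the same identifiers, which is why a packaged archive is recommended above.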
\newpage
```{r, echo = FALSE, warning = FALSE, message=FALSE}
pulldata('heteroptera_raw')
heteroptera_long <- as.traitdata(heteroptera_raw)
heteroptera_long <- standardize.taxonomy(heteroptera_long)
heteroptera_long <- heteroptera_long[order(as.numeric(heteroptera_long$occurrenceID)),]
heteroptera_long$basisOfRecordDescription <- as.factor(heteroptera_long$basisOfRecordDescription)
heteroptera_long$decimalLatitude <- gsub("°", " ", substr(heteroptera_long$verbatimLocality,0,5))
heteroptera_long$decimalLongitude <- gsub("°", " ", substr(heteroptera_long$verbatimLocality,10,14))
heteroptera_long$decimalLatitude <- round(as.numeric(measurements::conv_unit(heteroptera_long$decimalLatitude, from = 'deg_dec_min', to = 'dec_deg')),5)
heteroptera_long$decimalLongitude <- round(as.numeric(measurements::conv_unit(heteroptera_long$decimalLongitude, from = 'deg_dec_min', to = 'dec_deg')), 5)
heteroptera_long$basisOfRecord <- as.factor("PreservedSpecimen")
caption = "Example 4: original data of Example 3 after standardisation into relational database structure, providing taxon and occurrence level information in secondary data tables (Fig. 8 and 9, respectively)."
select_rows <- which(heteroptera_raw$ID %in% 8:15)
select_cols <- c(1,2,3,4,6,8)
kable(heteroptera_long[heteroptera_long$traitName %in% c("Body_length", "Body_width", "Body_height", "Thorax_width"),select_cols][30:60,], longtable=TRUE, escape = TRUE, row.names = FALSE,
booktabs = T, caption = caption) %>%
kable_styling(latex_options = c("scale_down", "repeat_header", "striped"), font_size = 10)
```
```{r, echo = FALSE, warning = FALSE, message=FALSE}
taxa <- attributes(heteroptera_long)$taxonomy
names(taxa)[c(1,3,4)]<- c("verbatimScientificName", "scientificName", "scientificNameAuthorship")
select_cols <- c(14,3,4,5,7,8,9,10,11,12,13)
select_rows <- taxa$taxonID %in% unique(heteroptera_long$taxonID[heteroptera_long$occurrenceID %in% 8:15])
caption = "Example 4, data table on taxon level information after standardisation. Data extracted from GBIF backbone taxonomy."
kable(taxa[select_rows,select_cols], longtable=TRUE, escape = TRUE, row.names = FALSE,
booktabs = T, caption = caption) %>%
kable_styling(latex_options = c("scale_down", "repeat_header", "striped"), font_size = 10)
```
```{r, echo = FALSE, warning = FALSE, message=FALSE}
caption = "Example 4, data table on occurrence level information after standardisation, accompanying Table 7."
select_rows <- c(155, 177, 199, 221, 243, 265, 287, 309)
select_cols <- c(8,18,22,16,17,20,21)
kable(heteroptera_long[select_rows,select_cols], longtable=TRUE, escape = TRUE, row.names = FALSE,
booktabs = T, caption = caption) %>%
kable_styling(latex_options = c("scale_down", "repeat_header", "striped"), font_size = 10)
```
# References