Contributors #317

Merged
merged 23 commits into from
Jul 30, 2023
3 changes: 1 addition & 2 deletions 01_intro.Rmd
@@ -7,8 +7,7 @@ library(rebook)
chapterPreamble()
```

This work - [**Orchestrating Microbiome Analysis with R and
Bioconductor**](https://microbiome.github.io/OMA/) [@OMA] - contributes novel
This work - [**Orchestrating Microbiome Analysis with Bioconductor**](https://microbiome.github.io/OMA/) [@OMA] - contributes novel
methods and educational resources for microbiome data science. It
aims to teach the grammar of Bioconductor workflows in the context of
microbiome data science. We show through concrete examples how to use
24 changes: 12 additions & 12 deletions 04_containers.Rmd
@@ -33,16 +33,16 @@ sequencing.
information (such as phylogenetic trees and sample hierarchies) and
reference sequences.

[`MultiAssayExperiment`] (`MAE`) [@Ramos2017] provides an organized way to bind several different data
structures together in a single object. For example, we can bind
microbiome data (in `TreeSE` format) with metabolomic profiling data
(in `SE` format), with shared sample metadata. This is convenient and
robust for instance in subsetting and other data manipulation
tasks. Microbiome data can be part of multiomics experiments and
analysis strategies and we want to outline the understanding in which
we think the packages explained and used in this book relate to these
experiment layouts using the `TreeSummarizedExperiment` and classes
beyond.
[`MultiAssayExperiment`] (`MAE`) [@Ramos2017] provides an organized
way to bind several different data containers together in a single
object. For example, we can bind microbiome data (in a `TreeSE`
container) with metabolomic profiling data (in an `SE` container), with
(partially) shared sample metadata. This is convenient and robust, for
instance, in subsetting and other data manipulation tasks. Microbiome
data can be part of multiomics experiments and analysis strategies. We
highlight how the methods used throughout this book relate to this
data framework by using the `TreeSummarizedExperiment`,
`MultiAssayExperiment`, and classes beyond.
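
The binding described above can be sketched with toy data. This is a minimal illustration, not code from the book: the sample names, taxa, metabolites and the `diet` column are hypothetical, and plain `SummarizedExperiment` objects stand in for a `TreeSE`, which extends the same class.

```r
library(SummarizedExperiment)
library(MultiAssayExperiment)

# Hypothetical toy data: three samples measured on two platforms
samples <- c("S1", "S2", "S3")

microbiome_counts <- matrix(
    c(10, 0, 5, 3, 8, 1), nrow = 2,
    dimnames = list(c("taxonA", "taxonB"), samples))
metabolite_levels <- matrix(
    c(0.5, 1.2, 0.7, 2.1, 0.9, 1.8), nrow = 2,
    dimnames = list(c("acetate", "butyrate"), samples))

# Each data type goes into its own container
microbiome <- SummarizedExperiment(assays = list(counts = microbiome_counts))
metabolites <- SummarizedExperiment(assays = list(nmr = metabolite_levels))

# Shared sample metadata; row names must match the sample names
sample_data <- DataFrame(
    diet = c("high_fat", "low_fat", "high_fat"),
    row.names = samples)

# Bind both experiments and the sample metadata into one object
mae <- MultiAssayExperiment(
    experiments = ExperimentList(microbiome = microbiome,
                                 metabolites = metabolites),
    colData = sample_data)

# Subsetting by sample is applied to every experiment at once
mae_sub <- mae[, c("S1", "S3"), ]
```

Because no explicit sample map is given here, the experiments are linked to the metadata by matching column names to the row names of `colData`; partially shared samples would need the same naming discipline or an explicit `sampleMap`.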

This section provides an introduction to these data containers. In
microbiome data science, these containers link taxonomic abundance
@@ -282,7 +282,7 @@ GlobalPatterns
[HintikkaXOData](https://microbiome.github.io/microbiomeDataSets/reference/HintikkaXOData.html)
is derived from a study about the effects of fat diet and prebiotics on the
microbiome of rat models [@Hintikka2021]. It is available in the MAE data
container for R. The dataset is briefly presented in
container for R. The dataset is briefly summarized in
[these slides](https://microbiome.github.io/outreach/hintikkaxo_presentation.html).


@@ -654,7 +654,7 @@ abundance table is named as "counts". Let us inspect only the first
cols and rows.

```{r}
assays(se)$counts[1:3, 1:3]
assay(se, "counts")[1:3, 1:3]
```
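
The two accessor styles touched by this change can be compared on a toy object. This is a sketch under stated assumptions: `se_toy` and its contents are hypothetical stand-ins for `se`. Fetching by name with `assay()` is usually the safer habit, since it should fail with an error when the name does not exist, while `assays(se)$name` quietly returns `NULL`.

```r
library(SummarizedExperiment)

# Hypothetical toy object standing in for `se`
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("taxon", 1:4),
                                 paste0("sample", 1:3)))
se_toy <- SummarizedExperiment(assays = list(counts = counts))

# Fetch one assay by name
by_name <- assay(se_toy, "counts")

# Fetch the same assay through the list of all assays
by_list <- assays(se_toy)$counts

# List the assay names that are available
assayNames(se_toy)

# Inspect the first rows and columns, as in the chunk above
by_name[1:3, 1:3]
```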

The `rowData` includes taxonomic information from the biom file. The `head()` command
50 changes: 23 additions & 27 deletions 06_packages.Rmd
@@ -11,12 +11,11 @@ chapterPreamble()
The Bioconductor microbiome data science framework consists of:

- **data containers**, designed to organize multi-assay microbiome data
- **R packages** that provide dedicated methods for analysing such data
- **R/Bioconductor packages** that provide dedicated methods
- **community** of users and developers

<img src="general/figures/ecosystem.png" width="100" alt="mia logo" align="right" style="margin: 0 1em 0 1em" />


This section provides an overview of the package ecosystem. Section
\@ref(example-data) links to various open microbiome data resources
that support this framework.
@@ -63,50 +62,47 @@ devtools::install_github("microbiome/mia")

## Package ecosystem {#ecosystem}

Methods for the analysis and manipulation of
`(Tree)SummarizedExperiment` and `MultiAssayExperiment` data
containers are available through a number of R packages. Some of these
are listed below. If you know more tips on such packages, data
sources, or other resources, kindly [let us
know](https://microbiome.github.io) through the issues, pull requests,
or online channels.
Methods for `(Tree)SummarizedExperiment` and `MultiAssayExperiment`
data containers are provided by multiple independent developers
through R/Bioconductor packages. Some of these are listed below (tips
on new packages are [welcome](https://microbiome.github.io)).


### mia package family

### mia family of methods
The mia package family provides general methods for microbiome data wrangling, analysis and visualization.

- [mia](https://microbiome.github.io/mia/): Microbiome analysis tools [@R_mia]
- [miaViz](https://microbiome.github.io/miaViz/): Microbiome analysis specific visualization [@Ernst2022]
- [miaSim](https://microbiome.github.io/miaSim/): Microbiome data simulations [@Simsek2021]
- [miaTime](https://microbiome.github.io/miaTime/): Microbiome time series analysis [@Lahti2021]


### Tree-based methods {#sub-tree-methods}

- [philr](http://bioconductor.org/packages/devel/bioc/html/philr.html) (@Silverman2017)


### Differential abundance {#sub-diff-abund}

The following DA methods support `(Tree)SummarizedExperiment`.

- [ANCOMBC](https://bioconductor.org/packages/devel/bioc/html/ANCOMBC.html) for differential abundance analysis
- [benchdamic](https://bioconductor.org/packages/release/bioc/vignettes/benchdamic/inst/doc/intro.html) for benchmarking differential abundance methods
- [LinDA](https://cran.r-project.org/web/packages/MicrobiomeStat/) for differential abundance analysis
- [ZicoSeq](https://cran.r-project.org/web/packages/GUniFrac/) for differential abundance analysis
- [ALDEx2](https://www.bioconductor.org/packages/release/bioc/html/ALDEx2.html) for differential abundance analysis
- [phyloseq](https://www.bioconductor.org/packages/release/bioc/html/phyloseq.html) for converting data into phyloseq format, which some differential abundance methods (such as ANCOMBC) require as input



### Manipulation {#sub-manipulation}

- [MicrobiotaProcess](https://bioconductor.org/packages/release/bioc/html/MicrobiotaProcess.html) for analyzing microbiome and other ecological data within the tidy framework


### Further options
### Other packages

- [philr](http://bioconductor.org/packages/devel/bioc/html/philr.html) (@Silverman2017) phylogeny-aware phILR transformation
- [MicrobiotaProcess](https://bioconductor.org/packages/release/bioc/html/MicrobiotaProcess.html) for "tidy" analysis of microbiome and other ecological data
- [Tools for Microbiome
Analysis](https://microsud.github.io/Tools-Microbiome-Analysis/)
site listed over 130 R packages for microbiome data science in
2023. Many of these are not in Bioconductor, or do not directly
support the data containers used in this book but can be used with
minor modifications.
support the data containers used in this book but can often be used
with minor modifications. The phyloseq-based tools can be used by
converting the TreeSE data into phyloseq with
`makePhyloseqFromTreeSummarizedExperiment`.


### Open microbiome data

Hundreds of published microbiome data sets are readily available in
these data containers (see \@ref(example-data)).

40 changes: 10 additions & 30 deletions 11_taxonomic_information.Rmd
@@ -199,6 +199,7 @@ Here is an example that does a CLR transformation followed by the hierarchical
clustering algorithm.

First, we import the library `bluster` that simplifies the clustering.

```{r bluster_dependence}
library(bluster)
```
@@ -218,21 +219,23 @@ tse <- transformAssay(tse, assay.type = "clr", method = "z",
MARGIN = "features")

# Cluster (with euclidean distance) on the features of the z assay
tse <- cluster(tse, assay.type = "z",
clust.col = "hclustEuclidean", MARGIN = "features",
HclustParam(dist.fun = stats::dist, metric = "euclidean",
method = "ward.D2"))
tse <- cluster(tse,
assay.type = "z",
clust.col = "hclustEuclidean",
MARGIN = "features",
HclustParam(dist.fun = stats::dist, method = "ward.D2"))

# Declare the Kendall dissimilarity computation function
kendall_dissimilarity <- function(x) {
as.dist(1 - cor(t(x), method = "kendall"))
}

# Cluster (with Kendall dissimilarity) on the features of the z assay
tse <- cluster(tse, assay.type = "z", MARGIN = "features",
tse <- cluster(tse,
assay.type = "z",
clust.col = "hclustKendall",
HclustParam(method = "ward.D2",
dist.fun = kendall_dissimilarity))
MARGIN = "features",
HclustParam(dist.fun = kendall_dissimilarity, method = "ward.D2"))
```
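
To see what the Kendall-based variant computes, the same dissimilarity function can be run with base R alone. This is a minimal sketch on a hypothetical random matrix; it does not use `bluster`, and the fixed choice of two clusters is an assumption made for illustration, since `HclustParam` may cut the tree differently.

```r
# Hypothetical toy matrix: 6 features (rows) x 5 samples (columns)
set.seed(42)
toy <- matrix(rnorm(30), nrow = 6,
              dimnames = list(paste0("feature", 1:6),
                              paste0("sample", 1:5)))

# Same dissimilarity as in the chunk above:
# 1 - Kendall correlation between rows (features)
kendall_dissimilarity <- function(x) {
    as.dist(1 - cor(t(x), method = "kendall"))
}

d <- kendall_dissimilarity(toy)

# Ward clustering on the dissimilarities
tree <- hclust(d, method = "ward.D2")

# Cut the tree into two clusters (arbitrary choice for this sketch)
clusters <- cutree(tree, k = 2)
table(clusters)
```

The transpose in `cor(t(x))` matters: `cor()` correlates columns, so the matrix is flipped to compare features rather than samples.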

Let us store the resulting cluster indices in the `rowData` column specified
@@ -312,26 +315,3 @@ head(assay(tse, "pa"))
assays(tse)
```

## Pick specific {#pick-specific}

This section shows how to retrieve specific elements required for a given
analysis, for instance the abundances of a specific taxon in all samples or
of all taxa in one sample.

### Abundances of all taxa in specific sample
```{r}
taxa.abund.cc1 <- getAbundanceSample(tse,
sample_id = "CC1",
assay.type = "counts")
taxa.abund.cc1[1:10]
```

### Abundances of specific taxa in all samples

```{r}
taxa.abundances <- getAbundanceFeature(tse,
feature_id = "Phylum:Bacteroidetes",
assay.type = "counts")
taxa.abundances[1:10]
```

2 changes: 2 additions & 0 deletions 30_differential_abundance.Rmd
@@ -339,13 +339,15 @@ that we specify.


```{r ancombc2, warning = FALSE, eval=TRUE}

# Agglomerate data to genus level and add this new abundance table to the altExp slot
altExp(tse, "genus") <- agglomerateByRank(tse, "genus")

# Identify prevalent genera
prevalent.genera <- getPrevalentFeatures(altExp(tse, "genus"), detection = 0, prevalence = 30/100)

# Run ANCOM-BC at the genus level, including only the prevalent genera

out <- ancombc2(
data = altExp(tse, "genus")[prevalent.genera, ],
assay_name = "counts",
2 changes: 1 addition & 1 deletion 80_training.Rmd
@@ -51,7 +51,7 @@ We encourage you to familiarize yourself with the material and test the examples in advance:

* [Other outreach material](https://github.com/microbiome/outreach)

* [Orchestrating Microbiome Analysis with R/Bioconductor (OMA)](https://microbiome.github.io/OMA/) (this book)
* [Orchestrating Microbiome Analysis with Bioconductor (OMA)](https://microbiome.github.io/OMA/) (this book)

* [Exercises](#exercises) for self-study

66 changes: 42 additions & 24 deletions 90_acknowledgments.Rmd
@@ -1,38 +1,56 @@
# Authors and contributors {-}
# Developers {-}

```{r setup, echo=FALSE, results="asis"}
library(rebook)
chapterPreamble()
```


### *Leo Lahti, DSc* {-}

Leo Lahti is a professor in Data Science at the [Department of Computing, University of Turku, Finland](https://datascience.utu.fi/), with a focus on computational microbiome analysis. Lahti obtained his doctoral degree (DSc) from Aalto University in Finland (2010), developing probabilistic machine learning and data integration methods for high-throughput life science data. Since 2011 he has carried out microbiome research and developed, among other things, the _phyloseq_-based [microbiome R package](https://bioconductor.org/packages/release/bioc/html/microbiome.html) before starting to develop the mia libraries and the _TreeSummarizedExperiment_ / _MultiAssayExperiment_ framework for microbiome data science introduced in this gitbook. In addition to carrying out computational microbiome research, Lahti is on the editorial boards of the _ISME_ and _Microbiome_ journals, a work group leader in the European COST action network [ML4microbiome](https://ml4microbiome.eu/), a national delegate in the International Science Council Committee on Data ([CODATA](https://codata.org/)), and has led the development of the [national policy on open access to research methods in Finland](https://avointiede.fi/en/policies-materials/policies-open-science-and-research-finland/policy-open-research-data-and-methods). He is a current member of the [Bioconductor Community Advisory Board](https://bioconductor.org/about/community-advisory-board/) and runs regular training workshops in microbiome data science.


### *Tuomas Borman* {-}

Tuomas Borman is a PhD researcher at the Department of Computing, University of Turku, and one of the key developers of the microbiome data science framework presented in this gitbook. He has helped to set up the base ecosystem of R/Bioconductor packages and other online resources.


### *Giulio Benedetti* {-}

Giulio Benedetti is a scientific programmer at the Department of Computing, University of Turku. His research interest is mostly related to Data Science. He has also helped to expand the SummarizedExperiment-based microbiome analysis framework to the Julia language, implementing [MicrobiomeAnalysis.jl](https://github.com/JuliaTurkuDataScience/MicrobiomeAnalysis.jl).


### *Felix Ernst, PhD* {-}

Felix Ernst is among the first developers of R/Bioc methods for microbiome research based on the _SummarizedExperiment_ class and its derivatives.

### Core team {-}

Contributions to this Gitbook from the various developers are
coordinated by:

- *Leo Lahti, DSc*, professor in Data Science at the [Department of
Computing, University of Turku,
Finland](https://datascience.utu.fi/), with a focus on
computational microbiome analysis. Lahti obtained his doctoral degree
(DSc) from Aalto University in Finland (2010), developing
probabilistic machine learning with applications to high-throughput
life science data integration. Since then he has focused on
microbiome research and developed, for instance, the
_phyloseq_-based [microbiome R
package](https://bioconductor.org/packages/release/bioc/html/microbiome.html)
before starting to develop the _TreeSummarizedExperiment_ /
_MultiAssayExperiment_ framework and the mia family of Bioconductor
packages for microbiome data science introduced in this
gitbook. Lahti led the development of [national policy on open
access to research methods in
Finland](https://avointiede.fi/en/policies-materials/policies-open-science-and-research-finland/policy-open-research-data-and-methods).
He is a current member of the [Bioconductor Community Advisory
Board](https://bioconductor.org/about/community-advisory-board/)
and runs regular training workshops in microbiome data science.

- *Tuomas Borman*, PhD researcher and the lead developer of OMA/mia at
the Department of Computing, University of Turku.


### Contributors {-}

This work is a remarkably collaborative effort over the years. The
full list of contributors is available via
This work is a remarkably collaborative effort. The full list of
contributors is available via
[GitHub](https://github.com/microbiome/OMA/graphs/contributors). Some
of the key contributors include:
key authors/contributors include:

- *Felix Ernst, PhD*, among the first developers of R/Bioc methods for
microbiome research based on the _SummarizedExperiment_ class and
its derivatives.

- *Giulio Benedetti*, scientific programmer at the Department of
Computing, University of Turku. His research interest is mostly
related to Data Science. He has also helped to expand the
SummarizedExperiment-based microbiome analysis framework to the
Julia language, implementing
[MicrobiomeAnalysis.jl](https://github.com/JuliaTurkuDataScience/MicrobiomeAnalysis.jl).

- *Sudarshan Shetty, PhD* has supported the establishment of the
framework and associated tools. He also maintains a list of
5 changes: 3 additions & 2 deletions 98_exercises.Rmd
@@ -251,8 +251,9 @@ got stuck, you can refer to chapter \@ref(assay-slot) of this book.
6. **Extra**: Create a taxonomy tree based on the taxonomy mappings with
`addTaxonomyTree` and display its content with `taxonomyTree` and `ggtree`.

If you got stuck, you can look up chapters \@ref(pick-specific) and \@ref(fly-tree)
on how to pick specific abundances and generate row trees, respectively.
If you got stuck, you can look up chapters \@ref(datamanipulation)
and \@ref(fly-tree) on how to pick specific abundances and generate
row trees, respectively.


### Other elements
Expand Down
19 changes: 8 additions & 11 deletions DESCRIPTION
@@ -1,25 +1,22 @@
Package: OMA
Title: Orchestrating Microbiome Analysis
Version: 0.98.15
Date: 2023-07-13
Title: Orchestrating Microbiome Analysis with Bioconductor
Version: 0.98.16
Date: 2023-07-29
Authors@R:
c(person("Leo", "Lahti", role = c("aut"),
comment = c(ORCID = "0000-0001-5537-637X")),
person(given = "Tuomas", family = "Borman", role = c("aut", "cre"),
email = "tuomas.v.borman@utu.fi",
comment = c(ORCID = "0000-0002-8563-8884")),
person(given = "Henrik", family = "Eckermann", role = c("ctb"),
comment = c(ORCID = "0000-0001-8725-7770")),
person("Sudarshan", "Shetty", email = "sudarshanshetty9@gmail.com",
role = c("aut"),
comment = c(ORCID = "0000-0001-7280-9915")),
person("Felix GM", "Ernst", email = "felix.gm.ernst@outlook.com",
role = c("aut"),
comment = c(ORCID = "0000-0001-5064-0928"))
comment = c(ORCID = "0000-0001-5064-0928")),
person("and others", "(see the full list of contributors)",
role = c("ctb"))
)
Description:
This is a reference cookbook for performing **Microbiome Analysis** with
Bioconductor in R.
This is a reference cookbook for **Microbiome Data Science** with
R and Bioconductor.
License: CC BY-NC-SA 3.0 US
Encoding: UTF-8
URL: https://github.com/microbiome/OMA