docs: review feedback, finish todos
ejseqera committed May 14, 2024
1 parent 0788ef6 commit 52e4344
Showing 18 changed files with 182 additions and 47 deletions.
Binary file added demo/docs/assets/create-a-data-link.png
Binary file added demo/docs/assets/data-explorer-add-bucket.gif
Binary file added demo/docs/assets/data-explorer-preview-files.gif
Binary file added demo/docs/assets/data-explorer-view-details.gif
Binary file added demo/docs/assets/data-studio-create-jupyter.gif
17 changes: 17 additions & 0 deletions demo/docs/assets/logo.svg
Binary file added demo/docs/assets/seqera-biotech-stack.png
Binary file added demo/docs/assets/seqera-one-platform.png
Binary file modified demo/docs/assets/user-settings.png
45 changes: 27 additions & 18 deletions demo/docs/data_explorer.md
@@ -2,32 +2,41 @@

With Data Explorer, you can browse and interact with remote data repositories from organization workspaces in Seqera Platform. It supports AWS S3, Azure Blob Storage, and Google Cloud Storage repositories.

## 1. Data Explorer features
## 1. View pipeline outputs in Data Explorer

- View bucket details
To view bucket details such as the cloud provider, bucket address, and credentials, select the information icon next to a bucket in the Data Explorer list.
In Data Explorer, you are able to:

- Search and filter buckets
Search for buckets by name and region (e.g., `region:eu-west-2`) in the search field, and filter by provider.
- **View bucket details**:
View the cloud provider, bucket address, and credentials by selecting the information icon next to a bucket in the Data Explorer list.

- Hide buckets from list view
Workspace maintainers can hide buckets from the Data Explorer list view. Select multiple buckets, then select Hide in the Data Explorer toolbar. To hide buckets individually, select Hide from the options menu of a bucket in the list.
![Bucket details](assets/data-explorer-view-details.gif)

The Data Explorer list filter defaults to Only visible. Select Only hidden or All from the filtering menu to view hidden buckets in the list. You can Unhide a bucket from its options menu in the list view.
- **View bucket contents**:
Select a bucket name from the Data Explorer list to view the contents of that bucket.

The file type, size, and path of objects are displayed in columns to the right of the object name. For example, we can take a look at the outputs of our nf-core/rnaseq run.

- View bucket contents
Select a bucket name from the Data Explorer list to view the contents of that bucket. From the View cloud bucket page, you can browse directories and search for objects by name in a particular directory. The file type, size, and path of objects are displayed in columns to the right of the object name. To view bucket details such as the cloud provider, bucket address, and credentials, select the information icon.
![Data Explorer bucket](assets/sp-cloud-data-explorer.gif)

- Preview and download files
From the View cloud bucket page, you can preview and download files. Select the download icon in the Actions column to download a file directly from the list view. Select a file to open a preview window that includes a Download button.
- **Preview files**:
Select a file to open a preview window that includes a Download button. For example, we can use Data Explorer to view the results of the nf-core/rnaseq pipeline that we executed. Specifically, we can take a look at the resulting gene counts from the Salmon quantification step:

## 2. View Run outputs in Data Explorer
![Preview pipeline results](assets/data-explorer-preview-files.gif)

Data Explorer can be used to view the outputs of your pipelines.
## 2. Configure a bucket to browse in Data Explorer
Data Explorer also enables you to add public cloud storage buckets to view and use data from resources such as:

From the View cloud bucket page, you can:
- [The Cancer Genome Atlas (TCGA)](https://registry.opendata.aws/tcga/)
- [1000 Genomes Project](https://registry.opendata.aws/1000-genomes/)
- [NCBI SRA](https://registry.opendata.aws/ncbi-sra/)
- [Genome in a Bottle Consortium](https://docs.opendata.aws/giab/readme.html)
- [MSSNG Database](https://cloud.google.com/life-sciences/docs/resources/public-datasets/mssng)
- [Genome Aggregation Database (gnomAD)](https://cloud.google.com/life-sciences/docs/resources/public-datasets/gnomad)

1. Preview and download files: Select the download icon in the 'Actions' column to download a file directly from the list view. Select a file to open a preview window that includes a Download button.
2. Copy bucket/object paths: Select the Path of an object on the cloud bucket page to copy its absolute path to the clipboard. Use these object paths to specify input data locations during pipeline launch, or add them to a dataset for pipeline input.
Select 'Add cloud bucket' from the Data Explorer tab to add individual buckets (or directory paths within buckets).

![Data Explorer bucket](assets/sp-cloud-data-explorer.gif)
Specify the Provider, Bucket path, Name, Credentials, and Description, then select Add. For public cloud buckets, select Public from the Credentials drop-down menu.

![Add public bucket](assets/data-explorer-add-bucket.gif)

You are now able to use this data in your analysis without having to interact with Cloud consoles or CLI tools.
72 changes: 57 additions & 15 deletions demo/docs/data_studios.md
@@ -1,18 +1,18 @@
Data Studios is a unified platform where you can perform analysis of your pipeline results after successful execution. It allows you to host a combination of images and compute environments for interactive analysis using your preferred tools, like Jupyter notebooks, RStudio, and Visual Studio Code IDEs. Each data studio session is an individual interactive environment that encapsulates the live environment for dynamic data analysis.
Data Studios is a unified platform where you can perform analysis of your pipeline results after successful execution.

<!--
TODO: update gifs with showcase data studios eventually
TODO: add custom datalink for outdir from nf-core/rnaseq results to mount here
TODO: show example of using jupyter or rstudio with nf-core/rnaseq results
-->
It allows you to host a combination of images and compute environments for interactive analysis using your preferred tools, like Jupyter notebooks, RStudio, and Visual Studio Code IDEs.

Each data studio session is an individual interactive environment that encapsulates the live environment for dynamic data analysis.

## Data Studio Setup

### Create a Data Studio

#### 1. Create a Data Studio
#### 1. Add a Data Studio

To create a Data Studio, click on the 'Add data studio' button and select one of the three currently available templates.

![Add a data studio](./assets/create-data-studio.gif)
![Add a data studio](assets/create-data-studio.gif)

#### 2. Select a compute environment

@@ -24,7 +24,7 @@ Select data to mount into your data studios environment using the Fusion file sy

For example, to take a look at the results of your nf-core/rnaseq pipeline run, you can mount the value of the `outdir` parameter specified in the [earlier step when launching the pipeline](./launch_pipeline.md).

![Mount data into studio](./assets/mount-data-into-studio.gif)
![Mount data into studio](assets/mount-data-into-studio.gif)

#### 4. Resources for environment

@@ -34,21 +34,63 @@ Then, click Add!

The data studio environment will be available in the Data Studios landing page with the status 'stopped'. Click on the three dots and **Start** to begin running the studio.

![Start a studio](./assets/start-studio.gif)
![Start a studio](assets/start-studio.gif)

![Connect to a studio](./assets/connect-to-studio.png){ .right .image}
![Connect to a studio](assets/connect-to-studio.png){ .right .image}

### Connect to a Data Studio

To connect to a running data studio session, select the three dots next to the status message and choose **Connect**. A new browser tab will open, displaying the status of the data studio session. Select **Connect**.
<br>
<div style="clear: both;"></div>

### Collaborate in Data Studio

Collaborators can also join a data studios session in your workspace. For example, to share the results of the nf-core/rnaseq pipeline, you can share a link by selecting the three dots next to the status message for the data studio you want to share, then select **Copy data studio URL**. Using this link other authenticated users can access the session directly.

![Stop a studio session](./assets/stop-a-studio.png){ .right .image}
Collaborators can also join a data studio session in your workspace. For example, to share the results of the nf-core/rnaseq pipeline, you can share a link by selecting the three dots next to the status message for the data studio you want to share, then selecting **Copy data studio URL**. Using this link, other authenticated users with at least the "Connect" role can access the session directly.
<div style="clear: both;"></div>

![Stop a studio session](assets/stop-a-studio.png){ .right .image}
### Stop a Data Studio

To stop a running session, click on the three dots next to the status and select **Stop**. Any unsaved analyses or results will be lost.
To stop a running session, click on the three dots next to the status and select **Stop**. Any unsaved analyses or results will be lost.<br>
<div style="clear: both;"></div>

<br>
## Analyse RNAseq data in a Data Studio

Data Studios can be used to perform tertiary analysis of data generated by Nextflow pipeline executions on Seqera Platform. For example, we can take a look at our nf-core/rnaseq pipeline results in a Jupyter notebook to perform additional interactive analyses.

### 1. Create a Data Link
To enable access to our RNAseq analysis data in a Studio, we can create a custom data link pointing to the directory in our AWS S3 bucket where the results are saved.

This can be achieved by using the 'Add cloud bucket' button in Data Explorer and specifying the path to our output directory:

![Create a data link](assets/create-a-data-link.png){ .center }


### 2. Create a Jupyter notebook session
When creating our Data Studio, we can mount our newly created Data Link to isolate read/write access to this directory within the studio session.

![Jupyter notebook studio](assets/data-studio-create-jupyter.gif)

### 3. Data exploration in Jupyter
Once created, we can Connect to our Data Studio to open a Jupyter notebook session where we can take a look at the results of our RNAseq analysis.

For example, in the notebook, you may first want to import Python libraries:

```python
import pandas as pd
```

We can load our data from the analysis. For example, as a start, let's take a look at our gene counts across samples by loading them into a Pandas dataframe:

```python
# The path below reflects the data link mounted into the studio session
data = pd.read_csv(
    'data/seqeralabs-showcase-rnaseq-results/star_salmon/salmon.merged.gene_counts.tsv',
    sep='\t',
    index_col=0,
)
print(data.head())
```

![Jupyter notebook](assets/data-studio-jupyter-notebook-example.png)
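As a quick follow-up check, per-sample totals (library sizes) can be summed from the count columns. The following is a minimal, self-contained sketch; the gene and sample names here are illustrative placeholders, not actual nf-core/rnaseq output:

```python
import pandas as pd

# Illustrative gene-count table: a 'gene_name' annotation column plus one
# numeric column per sample (names and values here are made up).
data = pd.DataFrame(
    {
        'gene_name': ['GENE_A', 'GENE_B'],
        'sample_1': [10, 5],
        'sample_2': [3, 7],
    },
    index=pd.Index(['g1', 'g2'], name='gene_id'),
)

# Drop the non-numeric annotation column, then sum counts per sample.
counts = data.drop(columns='gene_name')
library_sizes = counts.sum(axis=0)
print(library_sizes)
```

The same pattern applies to the real `salmon.merged.gene_counts.tsv` once it is loaded as above.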


Through Data Studios, you are now able to continue into the next steps of your tertiary analyses, using data generated from pipelines executed on Seqera Platform but stored in the cloud, without ever having to leave the Platform.

4 changes: 2 additions & 2 deletions demo/docs/demo_overview.md
@@ -2,13 +2,13 @@

Log into Seqera Platform, either through a GitHub account, Google account, or an email address.

If an email address is provided, Seqera Cloud will send an authentication link to the email address to login with.
Upon providing an email address, Seqera Cloud will send an authentication link to that address for login.

![Seqera Platform Cloud login](assets/sp-cloud-signin.gif)

### 2. Navigate into the seqeralabs/showcase Workspace

All resources in Seqera Platform live inside a Workspace, which in turn belong to an Organisation. Typically, teams of colleagues or collaborators will share one or more workspaces. All resources in a Workspace (i.e. pipelines, compute environments, datasets) are shared by members of that workspace.
All resources in Seqera Platform live inside a Workspace, which in turn belongs to an Organization. Typically, teams of colleagues or collaborators will share one or more workspaces. All resources in a Workspace (e.g. pipelines, compute environments, datasets) are shared by members of that workspace.

Navigate into the `seqeralabs/showcase` Workspace.

40 changes: 31 additions & 9 deletions demo/docs/index.md
@@ -1,28 +1,48 @@
# Seqera Platform: Demonstration Walkthrough

## Walkthrough of [Seqera Platform](https://seqera.io/)

![](assets/landing_page.png){ .right .image}
<div style="display: flex; align-items: center; margin-bottom: 20px;">
<div style="margin-right: 10px;">
<a href="https://cloud.seqera.io/login" class="md-button" style="display: block; margin-bottom: 10px;">
<i class="fas fa-user"></i> Login to Seqera Platform
</a>
<a href="https://seqera.io" class="md-button" style="display: block;">
Visit Seqera Main Site
</a>
</div>
<div style="flex: 1; margin-left: 200px;">
<img src="assets/seqera-one-platform.png" alt="Seqera one platform" style="width: 100%; max-width: 750px;">
</div>
</div>


## [:fontawesome-solid-user: Login to Seqera Platform](https://tower.nf/login){ .md-button }

---

---
## Overview

<!-- ![Seqera biotech stack](assets/seqera-biotech-stack.png){ .right .image} -->
<img src="assets/seqera-biotech-stack.png" alt="Seqera biotech stack" style="float: right; width: 50%; margin-left: 30px; margin-bottom: 20px;">

This guide provides a walkthrough of a standard Seqera Platform demonstration. The demonstration describes how to add a pipeline to the Launchpad, launch a workflow with pipeline parameters, monitor a run, and examine the run details. It also highlights key features such as pipeline optimization, Data Explorer, and compute environment creation.

More specifically, this demonstration will focus on using the [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline as an example and executing the workflow on AWS Batch.

<div style="clear: both;"></div>

---

## Requirements

- A [Seqera Platform Cloud](https://seqera.io/login) account
- Access to a Workspace in Seqera Platform
- :fontawesome-brands-aws: An [AWS Batch Compute Environment created in that Workspace](https://docs.seqera.io/platform/23.3.0/compute-envs/aws-batch)
- The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline repository
- Samplesheet to create a Dataset on the Platform used to run minimal test RNAseq data (see [samplesheet_test.csv](./samplesheet_test.csv) file in this repository)
:octicons-checkbox-16: A [Seqera Platform Cloud](https://cloud.seqera.io/login) account

:octicons-checkbox-16: Access to a Workspace in Seqera Platform

:octicons-checkbox-16: :fontawesome-brands-aws: An [AWS Batch Compute Environment created in that Workspace](https://docs.seqera.io/platform/23.3.0/compute-envs/aws-batch)

:octicons-checkbox-16: The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline repository

:octicons-checkbox-16: A samplesheet to create a Dataset on the Platform, used to run minimal RNAseq test data (see the [samplesheet_test.csv](./samplesheet_test.csv) file in this repository)
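For reference, an nf-core/rnaseq samplesheet is a small CSV with one row per sample. The sketch below builds one programmatically; the S3 paths and sample name are placeholders for illustration, not the showcase data:

```python
import csv

# Minimal nf-core/rnaseq-style samplesheet. The bucket paths and sample
# name below are hypothetical placeholders.
rows = [
    {
        'sample': 'WT_REP1',
        'fastq_1': 's3://my-bucket/WT_REP1_R1.fastq.gz',
        'fastq_2': 's3://my-bucket/WT_REP1_R2.fastq.gz',
        'strandedness': 'auto',
    },
]

with open('samplesheet.csv', 'w', newline='') as fh:
    writer = csv.DictWriter(
        fh, fieldnames=['sample', 'fastq_1', 'fastq_2', 'strandedness']
    )
    writer.writeheader()
    writer.writerows(rows)
```

The resulting CSV can be uploaded as a Dataset in the workspace and selected as the pipeline's `input` at launch.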

---

@@ -40,3 +60,5 @@ More specifically, this demonstration will focus on using the [nf-core/rnaseq](h
[:material-check-circle:]() [Data Studios](./data_studios.md) <br/>
[:material-check-circle:]() [Optimize your Pipeline](./pipeline_optimization.md) <br/>
[:material-check-circle:]() [Automation](./automation.md) <br/>
[:material-check-circle:]() [Scaling Science on Seqera Platform](./summary.md) <br/>

4 changes: 2 additions & 2 deletions demo/docs/launch_pipeline.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

## 1. Go to Launchpad

Navigate back to the Launchpad to being executing the newly added nf-core/rnaseq pipeline.
Navigate back to the Launchpad to begin executing the newly added nf-core/rnaseq pipeline.

Select 'Launch' next to the pipeline of your choice to open the pipeline launch form.

Expand All @@ -18,7 +18,7 @@ All pipelines contain at least these parameters:

**2. Labels:** Assign new or existing labels to the run.

**3. Input/output options:** Specify paths to pipeline input datasets, output directories, and other pipeline-specific I/O options. input and outdir are required fields common to all pipelines:
**3. Input/output options:** Specify paths to pipeline input datasets, output directories, and other pipeline-specific I/O options. `input` and `outdir` are required fields common to all pipelines:

For the 'input' parameter, click on the text box and select the name of the dataset added in the previous step.

9 changes: 8 additions & 1 deletion demo/docs/pipeline_optimization.md
@@ -2,7 +2,14 @@

Seqera's pipeline optimization feature uses resource usage information from previous runs to minimize the resources used in your pipeline runs.

When a run completes successfully, an optimized profile is created. This profile consists of Nextflow configuration settings for each process and each resource directive (where applicable): cpus, memory, and time. The optimized setting for a given process and resource directive is based on the maximum use of that resource across all tasks in that process.
Optimization is available for a pipeline once at least one successful run has completed. This is indicated by the grey lightbulb icon turning into a black hashed lightbulb, which allows you to view the optimized profile.

This profile consists of Nextflow configuration settings for each process and each resource directive (where applicable): cpus, memory, and time. The optimized setting for a given process and resource directive is based on the maximum use of that resource across all tasks in that process.
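As a rough illustration of that idea (and not the Platform's actual implementation), the optimized setting for a resource is the peak usage across all tasks of each process. The task records and field names below are hypothetical:

```python
from collections import defaultdict

# Hypothetical task-level usage records; the optimization idea is simply
# a per-process maximum over observed usage.
tasks = [
    {'process': 'SALMON_QUANT', 'peak_memory_gb': 3.2},
    {'process': 'SALMON_QUANT', 'peak_memory_gb': 4.8},
    {'process': 'FASTQC', 'peak_memory_gb': 0.9},
]

optimized = defaultdict(float)
for task in tasks:
    name = task['process']
    optimized[name] = max(optimized[name], task['peak_memory_gb'])

# An optimized profile would then pin each process's memory directive to
# (roughly) its observed peak.
for name, mem in sorted(optimized.items()):
    print(f'{name}: memory = {mem} GB')
```

The same per-process maximum applies to the other directives (cpus and time) where usage data is available.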

Once optimization is selected, any subsequent runs of that pipeline on the Launchpad will inherit the optimized configuration profile, indicated by the black lightbulb icon with a checkmark.

> **NOTE:** Optimized profiles are generated from one run at a time, defaulting to the most recent run, and are _not_ an aggregation of previous runs.

Navigate back to the Launchpad, click on the newly added nf-core/rnaseq pipeline, and click on the lightbulb icon to view the optimized profile.

17 changes: 17 additions & 0 deletions demo/docs/resources.md
@@ -0,0 +1,17 @@
# Resources

## Seqera

:material-office-building-outline: [About us](https://seqera.io/about/)

## Seqera Platform
:material-file-document-edit: [Seqera Platform](https://docs.seqera.io/platform/)

:material-file-document-multiple: [Nextflow documentation](https://www.nextflow.io/docs/latest/)

## Blog Posts
:material-transmission-tower: [Best Practices for Deploying Pipelines with the Seqera Platform (formerly Nextflow Tower)](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-seqera-platform/)

:material-folder-star-multiple: [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/)

:material-auto-mode: [Workflow Automation for Nextflow Pipelines](https://seqera.io/blog/workflow-automation/)
21 changes: 21 additions & 0 deletions demo/docs/summary.md
@@ -0,0 +1,21 @@
# Seqera Platform
## One platform for the scientific data analysis life cycle

Throughout this guide, you have experienced how the Seqera Platform streamlines the management, execution, monitoring, and analysis of Nextflow pipelines in the cloud. This centralized and intuitive interface offers numerous advantages:

:material-check: **Ease of Access**: Enables all users to execute Nextflow pipelines with ease.

:material-check: **Simplified Cloud Deployment**: Allows for the deployment of pipelines on the cloud without needing to understand the underlying infrastructure.

:material-check: **Real-Time Monitoring**: Provides the ability to view the progress and outcomes of pipeline runs directly, bypassing the need for direct access to the execution environment.

:material-check: **Enhanced Provenance Tracking**: Facilitates the logging and tracking of pipeline provenance, enhancing reproducibility.

:material-check: **Cloud Data Interaction**: Supports seamless interaction with cloud-stored data, eliminating the need for direct cloud console or CLI interactions.

:material-check: **Automated Resource Management**: Reduces manual resource tuning, preventing allocation errors and optimizing task execution.

:material-check: **Collaborative Efficiency**: Boosts productivity by enabling researchers to share, collaborate, and interpret results effortlessly, without additional infrastructure overhead.

Seqera Platform empowers scientists to conduct high-throughput computing on a large scale, utilizing modern software engineering practices, all from a single, unified location. This guide has outlined how leveraging these capabilities can transform your research productivity and computational efficiency.
