docs: review feedback, finish todos
ejseqera committed May 14, 2024
1 parent 0788ef6 commit 52e4344
Showing 18 changed files with 182 additions and 47 deletions.
Binary file added demo/docs/assets/create-a-data-link.png
Binary file added demo/docs/assets/data-explorer-add-bucket.gif
Binary file added demo/docs/assets/data-explorer-preview-files.gif
Binary file added demo/docs/assets/data-explorer-view-details.gif
Binary file added demo/docs/assets/data-studio-create-jupyter.gif
17 changes: 17 additions & 0 deletions demo/docs/assets/logo.svg
Binary file added demo/docs/assets/seqera-biotech-stack.png
Binary file added demo/docs/assets/seqera-one-platform.png
Binary file modified demo/docs/assets/user-settings.png
45 changes: 27 additions & 18 deletions demo/docs/data_explorer.md
@@ -2,32 +2,41 @@

With Data Explorer, you can browse and interact with remote data repositories from organization workspaces in Seqera Platform. It supports AWS S3, Azure Blob Storage, and Google Cloud Storage repositories.

## 1. Data Explorer features
## 1. View pipeline outputs in Data Explorer

- View bucket details
To view bucket details such as the cloud provider, bucket address, and credentials, select the information icon next to a bucket in the Data Explorer list.
In Data Explorer, you are able to:

- Search and filter buckets
Search for buckets by name and region (e.g., `region:eu-west-2`) in the search field, and filter by provider.
- **View bucket details**:
View the cloud provider, bucket address, and credentials by selecting the information icon next to a bucket in the Data Explorer list.

- Hide buckets from list view
Workspace maintainers can hide buckets from the Data Explorer list view. Select multiple buckets, then select Hide in the Data Explorer toolbar. To hide buckets individually, select Hide from the options menu of a bucket in the list.
![Bucket details](assets/data-explorer-view-details.gif)

The Data Explorer list filter defaults to Only visible. Select Only hidden or All from the filtering menu to view hidden buckets in the list. You can Unhide a bucket from its options menu in the list view.
- **View bucket contents**:
Select a bucket name from the Data Explorer list to view the contents of that bucket.

The file type, size, and path of objects are displayed in columns to the right of the object name. For example, we can take a look at the outputs of our nf-core/rnaseq run.

- View bucket contents
Select a bucket name from the Data Explorer list to view the contents of that bucket. From the View cloud bucket page, you can browse directories and search for objects by name in a particular directory. The file type, size, and path of objects are displayed in columns to the right of the object name. To view bucket details such as the cloud provider, bucket address, and credentials, select the information icon.
![Data Explorer bucket](assets/sp-cloud-data-explorer.gif)

- Preview and download files
From the View cloud bucket page, you can preview and download files. Select the download icon in the Actions column to download a file directly from the list view. Select a file to open a preview window that includes a Download button.
- **Preview files**:
Select a file to open a preview window that includes a Download button. For example, we can use Data Explorer to view the results of the nf-core/rnaseq pipeline that we executed. Specifically, we can take a look at the resulting gene counts from the Salmon quantification step:

## 2. View Run outputs in Data Explorer
![Preview pipeline results](assets/data-explorer-preview-files.gif)

Data Explorer can be used to view the outputs of your pipelines.
## 2. Configure a bucket to browse in Data Explorer
Data Explorer also enables you to add public cloud storage buckets to view and use data from resources such as:

From the View cloud bucket page, you can:
- [The Cancer Genome Atlas (TCGA)](https://registry.opendata.aws/tcga/)
- [1000 Genomes Project](https://registry.opendata.aws/1000-genomes/)
- [NCBI SRA](https://registry.opendata.aws/ncbi-sra/)
- [Genome in a Bottle Consortium](https://docs.opendata.aws/giab/readme.html)
- [MSSNG Database](https://cloud.google.com/life-sciences/docs/resources/public-datasets/mssng)
- [Genome Aggregation Database (gnomAD)](https://cloud.google.com/life-sciences/docs/resources/public-datasets/gnomad)

1. Preview and download files: Select the download icon in the 'Actions' column to download a file directly from the list view. Select a file to open a preview window that includes a Download button.
2. Copy bucket/object paths: Select the Path of an object on the cloud bucket page to copy its absolute path to the clipboard. Use these object paths to specify input data locations during pipeline launch, or add them to a dataset for pipeline input.
Select 'Add cloud bucket' from the Data Explorer tab to add individual buckets (or directory paths within buckets).

![Data Explorer bucket](assets/sp-cloud-data-explorer.gif)
Specify the Provider, Bucket path, Name, Credentials, and Description, then select Add. For public cloud buckets, select Public from the Credentials drop-down menu.

![Add public bucket](assets/data-explorer-add-bucket.gif)

You are now able to use this data in your analysis without having to interact with Cloud consoles or CLI tools.
72 changes: 57 additions & 15 deletions demo/docs/data_studios.md
@@ -1,18 +1,18 @@
Data Studios is a unified platform where you can perform analysis of your pipeline results after successful execution. It allows you to host a combination of images and compute environments for interactive analysis using your preferred tools, like Jupyter notebooks, RStudio, and Visual Studio Code IDEs. Each data studio session is an individual interactive environment that encapsulates the live environment for dynamic data analysis.
Data Studios is a unified platform where you can perform analysis of your pipeline results after successful execution.

<!--
TODO: update gifs with showcase data studios eventually
TODO: add custom datalink for outdir from nf-core/rnaseq results to mount here
TODO: show example of using jupyter or rstudio with nf-core/rnaseq results
-->
It allows you to host a combination of images and compute environments for interactive analysis using your preferred tools, like Jupyter notebooks, RStudio, and Visual Studio Code IDEs.

Each data studio session is an individual interactive environment that encapsulates the live environment for dynamic data analysis.

## Data Studio Setup

### Create a Data Studio

#### 1. Create a Data Studio
#### 1. Add a Data Studio

To create a Data Studio, click on the 'Add data studio' button and select one of the three currently available templates.

![Add a data studio](./assets/create-data-studio.gif)
![Add a data studio](assets/create-data-studio.gif)

#### 2. Select a compute environment

@@ -24,7 +24,7 @@ Select data to mount into your data studios environment using the Fusion file sy

For example, to take a look at the results of your nf-core/rnaseq pipeline run, you can mount the value of the `outdir` parameter specified in the [earlier step when launching the pipeline](./launch_pipeline.md).

![Mount data into studio](./assets/mount-data-into-studio.gif)
![Mount data into studio](assets/mount-data-into-studio.gif)

#### 4. Resources for environment

@@ -34,21 +34,63 @@ Then, click Add!

The data studio environment will be available in the Data Studios landing page with the status 'stopped'. Click on the three dots and **Start** to begin running the studio.

![Start a studio](./assets/start-studio.gif)
![Start a studio](assets/start-studio.gif)

![Connect to a studio](./assets/connect-to-studio.png){ .right .image}
![Connect to a studio](assets/connect-to-studio.png){ .right .image}

### Connect to a Data Studio

To connect to a running data studio session, select the three dots next to the status message and choose **Connect**. A new browser tab will open, displaying the status of the data studio session. Select **Connect**.
<br>
<div style="clear: both;"></div>

### Collaborate in Data Studio

Collaborators can also join a data studios session in your workspace. For example, to share the results of the nf-core/rnaseq pipeline, you can share a link by selecting the three dots next to the status message for the data studio you want to share, then select **Copy data studio URL**. Using this link other authenticated users can access the session directly.

![Stop a studio session](./assets/stop-a-studio.png){ .right .image}
Collaborators can also join a data studio session in your workspace. For example, to share the results of the nf-core/rnaseq pipeline, you can share a link by selecting the three dots next to the status message for the data studio you want to share, then selecting **Copy data studio URL**. Using this link, other authenticated users with at least the "Connect" role can access the session directly.
<div style="clear: both;"></div>

![Stop a studio session](assets/stop-a-studio.png){ .right .image}
### Stop a Data Studio

To stop a running session, click on the three dots next to the status and select **Stop**. Any unsaved analyses or results will be lost.
To stop a running session, click on the three dots next to the status and select **Stop**. Any unsaved analyses or results will be lost.<br>
<div style="clear: both;"></div>

<br>
## Analyse RNAseq data in a Data Studio

Data Studios can be used to perform tertiary analysis of data generated by Nextflow pipeline executions on Seqera Platform. For example, we can take a look at our nf-core/rnaseq pipeline results in a Jupyter notebook to perform additional interactive analyses.

### 1. Create a Data Link
To enable access to our RNAseq analysis data in a Studio, we can create a custom data link pointing to the directory in our AWS S3 bucket where the results are saved.

This can be achieved by using the 'Add cloud bucket' button in Data Explorer and specifying the path to our output directory:

![Create a data link](assets/create-a-data-link.png){ .center }


### 2. Create a Jupyter notebook session
When creating our Data Studio, we can mount our newly created Data Link to isolate read/write access to this directory within the studio session.

![Jupyter notebook studio](assets/data-studio-create-jupyter.gif)

### 3. Data exploration in Jupyter
Once created, we can Connect to our Data Studio to open a Jupyter notebook session where we can take a look at the results of our RNAseq analysis.

For example, in the notebook, you may first want to import Python libraries:

```python
import pandas as pd
```

We can load our data from the analysis. For example, as a start, let's take a look at our gene counts across samples by loading them into a Pandas dataframe:

```python
# The path below reflects the data link mounted into the studio session
data = pd.read_csv(
    'data/seqeralabs-showcase-rnaseq-results/star_salmon/salmon.merged.gene_counts.tsv',
    sep='\t',
    index_col=0,
)
print(data.head())
```

![Jupyter notebook](assets/data-studio-jupyter-notebook-example.png)
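As a quick follow-up check, per-sample totals (library sizes) can be summed from the count columns. The following is a minimal, self-contained sketch; the gene and sample names here are illustrative placeholders, not actual nf-core/rnaseq output:

```python
import pandas as pd

# Illustrative gene-count table: a 'gene_name' annotation column plus one
# numeric column per sample (names and values here are made up).
data = pd.DataFrame(
    {
        'gene_name': ['GENE_A', 'GENE_B'],
        'sample_1': [10, 5],
        'sample_2': [3, 7],
    },
    index=pd.Index(['g1', 'g2'], name='gene_id'),
)

# Drop the non-numeric annotation column, then sum counts per sample.
counts = data.drop(columns='gene_name')
library_sizes = counts.sum(axis=0)
print(library_sizes)
```

The same pattern applies to the real `salmon.merged.gene_counts.tsv` once it is loaded as above.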


Through Data Studios, you are now able to continue into the next steps of your tertiary analyses, using data generated from pipelines executed on Seqera Platform but stored in the cloud, without ever having to leave the Platform.

4 changes: 2 additions & 2 deletions demo/docs/demo_overview.md
@@ -2,13 +2,13 @@

Log into Seqera Platform, either through a GitHub account, Google account, or an email address.

If an email address is provided, Seqera Cloud will send an authentication link to the email address to login with.
Upon providing an email address, Seqera Cloud will send an authentication link to that address for login.

![Seqera Platform Cloud login](assets/sp-cloud-signin.gif)

### 2. Navigate into the seqeralabs/showcase Workspace

All resources in Seqera Platform live inside a Workspace, which in turn belong to an Organisation. Typically, teams of colleagues or collaborators will share one or more workspaces. All resources in a Workspace (i.e. pipelines, compute environments, datasets) are shared by members of that workspace.
All resources in Seqera Platform live inside a Workspace, which in turn belongs to an Organization. Typically, teams of colleagues or collaborators will share one or more workspaces. All resources in a Workspace (e.g. pipelines, compute environments, datasets) are shared by members of that workspace.

Navigate into the `seqeralabs/showcase` Workspace.

40 changes: 31 additions & 9 deletions demo/docs/index.md
@@ -1,28 +1,48 @@
# Seqera Platform: Demonstration Walkthrough

## Walkthrough of [Seqera Platform](https://seqera.io/)

![](assets/landing_page.png){ .right .image}
<div style="display: flex; align-items: center; margin-bottom: 20px;">
<div style="margin-right: 10px;">
<a href="https://cloud.seqera.io/login" class="md-button" style="display: block; margin-bottom: 10px;">
<i class="fas fa-user"></i> Login to Seqera Platform
</a>
<a href="https://seqera.io" class="md-button" style="display: block;">
Visit Seqera Main Site
</a>
</div>
<div style="flex: 1; margin-left: 200px;">
<img src="assets/seqera-one-platform.png" alt="Seqera one platform" style="width: 100%; max-width: 750px;">
</div>
</div>


## [:fontawesome-solid-user: Login to Seqera Platform](https://tower.nf/login){ .md-button }

---

---
## Overview

<!-- ![Seqera biotech stack](assets/seqera-biotech-stack.png){ .right .image} -->
<img src="assets/seqera-biotech-stack.png" alt="Seqera biotech stack" style="float: right; width: 50%; margin-left: 30px; margin-bottom: 20px;">

This guide provides a walkthrough of a standard Seqera Platform demonstration. The demonstration describes how to add a pipeline to the Launchpad, launch a workflow with pipeline parameters, monitor a run, and examine the run details. It also highlights key features such as pipeline optimization, Data Explorer, and compute environment creation.

More specifically, this demonstration will focus on using the [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline as an example and executing the workflow on AWS Batch.

<div style="clear: both;"></div>

---

## Requirements

- A [Seqera Platform Cloud](https://seqera.io/login) account
- Access to a Workspace in Seqera Platform
- :fontawesome-brands-aws: An [AWS Batch Compute Environment created in that Workspace](https://docs.seqera.io/platform/23.3.0/compute-envs/aws-batch)
- The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline repository
- Samplesheet to create a Dataset on the Platform used to run minimal test RNAseq data (see [samplesheet_test.csv](./samplesheet_test.csv) file in this repository)
:octicons-checkbox-16: A [Seqera Platform Cloud](https://cloud.seqera.io/login) account

:octicons-checkbox-16: Access to a Workspace in Seqera Platform

:octicons-checkbox-16: :fontawesome-brands-aws: An [AWS Batch Compute Environment created in that Workspace](https://docs.seqera.io/platform/23.3.0/compute-envs/aws-batch)

:octicons-checkbox-16: The [nf-core/rnaseq](https://github.com/nf-core/rnaseq) pipeline repository

:octicons-checkbox-16: A samplesheet to create a Dataset on the Platform, used to run minimal RNAseq test data (see the [samplesheet_test.csv](./samplesheet_test.csv) file in this repository)
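For reference, an nf-core/rnaseq samplesheet is a small CSV with one row per sample. The sketch below builds one programmatically; the S3 paths and sample name are placeholders for illustration, not the showcase data:

```python
import csv

# Minimal nf-core/rnaseq-style samplesheet. The bucket paths and sample
# name below are hypothetical placeholders.
rows = [
    {
        'sample': 'WT_REP1',
        'fastq_1': 's3://my-bucket/WT_REP1_R1.fastq.gz',
        'fastq_2': 's3://my-bucket/WT_REP1_R2.fastq.gz',
        'strandedness': 'auto',
    },
]

with open('samplesheet.csv', 'w', newline='') as fh:
    writer = csv.DictWriter(
        fh, fieldnames=['sample', 'fastq_1', 'fastq_2', 'strandedness']
    )
    writer.writeheader()
    writer.writerows(rows)
```

The resulting CSV can be uploaded as a Dataset in the workspace and selected as the pipeline's `input` at launch.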

---

@@ -40,3 +60,5 @@ More specifically, this demonstration will focus on using the [nf-core/rnaseq](h
[:material-check-circle:]() [Data Studios](./data_studios.md) <br/>
[:material-check-circle:]() [Optimize your Pipeline](./pipeline_optimization.md) <br/>
[:material-check-circle:]() [Automation](./automation.md) <br/>
[:material-check-circle:]() [Scaling Science on Seqera Platform](./summary.md) <br/>

4 changes: 2 additions & 2 deletions demo/docs/launch_pipeline.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

## 1. Go to Launchpad

Navigate back to the Launchpad to being executing the newly added nf-core/rnaseq pipeline.
Navigate back to the Launchpad to begin executing the newly added nf-core/rnaseq pipeline.

Select 'Launch' next to the pipeline of your choice to open the pipeline launch form.

Expand All @@ -18,7 +18,7 @@ All pipelines contain at least these parameters:

**2. Labels:** Assign new or existing labels to the run.

**3. Input/output options:** Specify paths to pipeline input datasets, output directories, and other pipeline-specific I/O options. input and outdir are required fields common to all pipelines:
**3. Input/output options:** Specify paths to pipeline input datasets, output directories, and other pipeline-specific I/O options. `input` and `outdir` are required fields common to all pipelines:

For the 'input' parameter, click on the text box and select the name of the dataset added in the previous step.

9 changes: 8 additions & 1 deletion demo/docs/pipeline_optimization.md
@@ -2,7 +2,14 @@

Seqera's pipeline optimization feature uses resource usage information from previous runs to minimize the resources used in your pipeline runs.

When a run completes successfully, an optimized profile is created. This profile consists of Nextflow configuration settings for each process and each resource directive (where applicable): cpus, memory, and time. The optimized setting for a given process and resource directive is based on the maximum use of that resource across all tasks in that process.
Optimization is available for a pipeline once at least one successful run has completed. This is indicated by the grey lightbulb icon turning into a black hashed lightbulb, which allows you to view the optimized profile.

This profile consists of Nextflow configuration settings for each process and each resource directive (where applicable): cpus, memory, and time. The optimized setting for a given process and resource directive is based on the maximum use of that resource across all tasks in that process.
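As a rough illustration of that idea (and not the Platform's actual implementation), the optimized setting for a resource is the peak usage across all tasks of each process. The task records and field names below are hypothetical:

```python
from collections import defaultdict

# Hypothetical task-level usage records; the optimization idea is simply
# a per-process maximum over observed usage.
tasks = [
    {'process': 'SALMON_QUANT', 'peak_memory_gb': 3.2},
    {'process': 'SALMON_QUANT', 'peak_memory_gb': 4.8},
    {'process': 'FASTQC', 'peak_memory_gb': 0.9},
]

optimized = defaultdict(float)
for task in tasks:
    name = task['process']
    optimized[name] = max(optimized[name], task['peak_memory_gb'])

# An optimized profile would then pin each process's memory directive to
# (roughly) its observed peak.
for name, mem in sorted(optimized.items()):
    print(f'{name}: memory = {mem} GB')
```

The same per-process maximum applies to the other directives (cpus and time) where usage data is available.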

Once optimization is selected, any subsequent runs of that pipeline on the Launchpad will inherit the optimized configuration profile, indicated by the black lightbulb icon with a checkmark.

> **NOTE:** Optimized profiles are generated from one run at a time, defaulting to the most recent run, and are _not_ an aggregation of previous runs.

Navigate back to the Launchpad, click on the newly added nf-core/rnaseq pipeline, and click on the lightbulb icon to view the optimized profile.

17 changes: 17 additions & 0 deletions demo/docs/resources.md
@@ -0,0 +1,17 @@
# Resources

## Seqera

:material-office-building-outline: [About us](https://seqera.io/about/)

## Seqera Platform
:material-file-document-edit: [Seqera Platform](https://docs.seqera.io/platform/)

:material-file-document-multiple: [Nextflow documentation](https://www.nextflow.io/docs/latest/)

## Blog Posts
:material-transmission-tower: [Best Practices for Deploying Pipelines with the Seqera Platform (formerly Nextflow Tower)](https://seqera.io/blog/best-practices-for-deploying-pipelines-with-seqera-platform/)

:material-folder-star-multiple: [Breakthrough performance and cost-efficiency with the new Fusion file system](https://seqera.io/blog/breakthrough-performance-and-cost-efficiency-with-the-new-fusion-file-system/)

:material-auto-mode: [Workflow Automation for Nextflow Pipelines](https://seqera.io/blog/workflow-automation/)
21 changes: 21 additions & 0 deletions demo/docs/summary.md
@@ -0,0 +1,21 @@
# Seqera Platform
## One platform for the scientific data analysis life cycle

Throughout this guide, you have experienced how the Seqera Platform streamlines the management, execution, monitoring, and analysis of Nextflow pipelines in the cloud. This centralized and intuitive interface offers numerous advantages:

:material-check: **Ease of Access**: Enables all users to execute Nextflow pipelines with ease.

:material-check: **Simplified Cloud Deployment**: Allows for the deployment of pipelines on the cloud without needing to understand the underlying infrastructure.

:material-check: **Real-Time Monitoring**: Provides the ability to view the progress and outcomes of pipeline runs directly, bypassing the need for direct access to the execution environment.

:material-check: **Enhanced Provenance Tracking**: Facilitates the logging and tracking of pipeline provenance, enhancing reproducibility.

:material-check: **Cloud Data Interaction**: Supports seamless interaction with cloud-stored data, eliminating the need for direct cloud console or CLI interactions.

:material-check: **Automated Resource Management**: Reduces manual resource tuning, preventing allocation errors and optimizing task execution.

:material-check: **Collaborative Efficiency**: Boosts productivity by enabling researchers to share, collaborate, and interpret results effortlessly, without additional infrastructure overhead.

Seqera Platform empowers scientists to conduct high-throughput computing on a large scale, utilizing modern software engineering practices, all from a single, unified location. This guide has outlined how leveraging these capabilities can transform your research productivity and computational efficiency.
