Serverless scRNA Pipeline

Description

This pipeline processes single-cell RNA sequencing (scRNA-seq) data using AWS services for scalable computation. It leverages Piscem for efficient read mapping to a reference genome and Alevin-Fry for generating quantification matrices.

Usage

Create a m6id.16xlarge Ubuntu 22.04 EC2 Instance with a root volume of 500GB.

SSH into the EC2 instance

Clone this repository

# Navigate to main folder
cd refactored-serverless-pipeline

# install necessary dependencies
cd install_scripts
./install.sh
./install_scripts.sh
./install_alevin_fry.sh
./install_radtk.sh

# configure AWS CLI
aws configure

# copy credentials file to scrna-pipeline folder
cp ~/.aws/credentials ~/refactored-serverless-pipeline/scrna-pipeline/

#copy reference indexed transcriptome to scrna-pipleine folder
cd ~/refactored-serverless-pipeline/scrna-pipeline
aws s3 cp s3://nnasam-reference-index . --recursive


aws s3 cp s3://nnasam-pipeline/reference ./recursive --recursive

Configuration Parameters

Below parameters need to be changed in the run_scripts.sh file before triggering the pipeline

AWS_REGION: your aws account region
AWS_ACCOUNT_ID: your aws account ID
S3_INPUT_BUCKET_NAME: S3 bucket containing input fastq.gz files
S3_INPUT_TXT_BUCKET_NAME: Temp S3 bucket to hold input.txt files. This bucket will be created during pipeline execution.
S3_OUTPUT_BUCKET_NAME: Temp S3 bucket to hold individual output folders. This bucket will be created during pipeline execution.
SCRNA_OUTPUT_S3_BUCKET: Final output bucket that should contain count matrix
OUTPUT_BASE_DIR: Output folder path in EC2 instance to download individual output files from S3 bucket. Select this to be some path inside /mnt/nvme as we are mounting the instance store to /mnt/nvme

# start the pipeline
cd ~/refactored-serverless-pipeline
./run_scripts.sh

The final quant matrix will be present in the S3 bucket mentioned in SCRNA_OUTPUT_S3_BUCKET parameter of the run_scripts.sh file

# delete AWS Resources created
cd ~/refactored-serverless-pipeline
./clean_up_resources.sh

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
compare_scripts		compare_scripts
install_scripts		install_scripts
scrna-pipeline		scrna-pipeline
License		License
README.md		README.md
alevin_process.sh		alevin_process.sh
clean_up_resources.sh		clean_up_resources.sh
combine_map_rad.sh		combine_map_rad.sh
combine_unmapped_bc_count_bin.sh		combine_unmapped_bc_count_bin.sh
delete_aws_resources.py		delete_aws_resources.py
delete_generated_files_s3.py		delete_generated_files_s3.py
process_fastq.py		process_fastq.py
refactored-serverless-pipeline.iml		refactored-serverless-pipeline.iml
run_scripts.sh		run_scripts.sh
set-up-resources.py		set-up-resources.py
split_and_upload.sh		split_and_upload.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Serverless scRNA Pipeline

Description

Usage

Configuration Parameters

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

BioDepot/scRNA-serverless

Folders and files

Latest commit

History

Repository files navigation

Serverless scRNA Pipeline

Description

Usage

Configuration Parameters

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages