Open
Description
For lack of a better place for this, our collaboration with Be The Match will require:
- Download BAM files from s3, transform to ADAM Avro+Parquet, and upload to s3 (transform_alignments)
- Download ADAM Avro+Parquet alignments for multiple samples from s3, update record groups to prevent collision, merge into a single multi-sample ADAM Avro+Parquet alignments data set, and upload to s3 (merge_alignments)
- Report BAM file sizes, single sample ADAM Avro+Parquet alignments file sizes, and merged ADAM Avro+Parquet alignments file size
- Download VCF files from s3, transform to ADAM Avro+Parquet variants and genotypes, and upload to s3 (transform_variants, transform_genotypes)
- Download ADAM Avro+Parquet variants for multiple samples from s3, merge into a single sites-only ADAM Avro+Parquet variants data set, and upload to s3 (merge_variants)
- Download ADAM Avro+Parquet genotypes for multiple samples from s3, merge into a single multi-sample ADAM Avro+Parquet genotypes data set, and upload to s3 (merge_genotypes)
- Report VCF file sizes, single sample ADAM Avro+Parquet variants and genotypes file sizes, and merged ADAM Avro+Parquet variants and genotypes file sizes
- Notebook with queries to compare access performance of native files via s3 vs. transformed files via s3
- Documentation on how to run this stuff
- Short manuscript on transformation process, storage requirements, and access performance
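For the two file size reporting items above, a minimal sketch might look like the following. It assumes the native file and the transformed data set have already been downloaded to local disk, and that an ADAM Avro+Parquet data set is a directory of part files, so its size is the sum of everything under that directory. The function names `dataset_size` and `size_report` are hypothetical, not part of ADAM or any existing script.

```python
import os


def dataset_size(path):
    """Return the size in bytes of a file, or of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def size_report(native_path, transformed_path):
    """Compare a native file (e.g. BAM or VCF) to its transformed data set.

    Assumption: both paths are local; sizing objects directly on s3 would
    need an s3 client instead of os.walk and is not shown here.
    """
    native = dataset_size(native_path)
    transformed = dataset_size(transformed_path)
    # ratio < 1.0 means the transformed data set is smaller than the native file
    ratio = transformed / native if native else None
    return {
        "native_bytes": native,
        "transformed_bytes": transformed,
        "ratio": ratio,
    }
```

The same function could be called once per sample and once for the merged data set, which would cover all three sizes each report item asks for.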
There hasn't been an ask for realigning reads, recalling variants, annotating variants with SnpEff, or joint genotyping yet, but there could be in the near future.
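The "update record groups to prevent collision" part of merge_alignments could be sketched roughly as below. This is only an illustration of the idea, assuming record group ids are plain strings and that prefixing each id with its sample id is an acceptable renaming scheme; the actual scheme and the ADAM calls used to apply it are not specified here, and `disambiguate_record_groups` is a hypothetical helper.

```python
def disambiguate_record_groups(samples):
    """Map (sample_id, record_group_id) to a name unique across samples.

    samples: dict of sample id -> list of record group ids for that sample.
    Assumption: "<sample_id>.<record_group_id>" is the renaming scheme.
    """
    mapping = {}
    seen = set()
    for sample_id, groups in samples.items():
        for rg in groups:
            new_id = f"{sample_id}.{rg}"
            if new_id in seen:
                # e.g. sample "a.b" + group "c" vs. sample "a" + group "b.c"
                raise ValueError(f"record group id collision: {new_id}")
            seen.add(new_id)
            mapping[(sample_id, rg)] = new_id
    return mapping


# Two samples that both use the record group id "rg1"
renamed = disambiguate_record_groups({"sampleA": ["rg1"], "sampleB": ["rg1", "rg2"]})
```

After renaming, `("sampleA", "rg1")` and `("sampleB", "rg1")` map to distinct ids, so reads from the two samples remain distinguishable in the merged data set.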