Implement genomics pipeline

## Overview
- The TERRA Reference team will coordinate implementation of a basic genome sequencing pipeline (e.g. Process Reads → Map & Assemble Reads → Call Variants → Annotate Variants), as described in terraref/reference-data#19 and summarized in the figure below.
- Most of our 400 lines will be resequenced, but ~40 lines will be sequenced for de novo assembly, so the pipeline(s) will need to accommodate both paths.
- Mike Gore will lead the system architecture

![image](https://cloud.githubusercontent.com/assets/464871/11194261/b1f51906-8c70-11e5-997b-e603e2322901.png)
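
To make the two pipeline paths concrete, here is a minimal Python sketch of how each line could be routed through the stages named in the overview. Everything in it is an assumption for illustration: the stage names, the `Strategy` enum, the `plan` helper, and in particular the idea that the two paths differ only at the map/assemble step and share the downstream stages. It is not the design in terraref/reference-data#19.

```python
from enum import Enum


class Strategy(Enum):
    """How a line is sequenced (hypothetical labels)."""
    RESEQUENCING = "resequencing"
    DE_NOVO = "de_novo"


# Assumption: both paths share every stage except the second one.
STAGES = {
    Strategy.RESEQUENCING: [
        "process_reads", "map_reads", "call_variants", "annotate_variants",
    ],
    Strategy.DE_NOVO: [
        "process_reads", "assemble_reads", "call_variants", "annotate_variants",
    ],
}


def plan(line_id, strategy):
    """Return the ordered (line_id, stage) steps for one line."""
    return [(line_id, stage) for stage in STAGES[strategy]]


if __name__ == "__main__":
    # "line_001" is a made-up identifier.
    for step in plan("line_001", Strategy.DE_NOVO):
        print(step)
```

Keeping the stage lists in one table means a real workflow engine could later replace `plan` without changing how lines are classified.
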
## Questions to address:
### Resources?
- What computing resources are required, and how much of each?
- What are the data sizes?
  - sequencing coverage
  - number of samples, libraries, lanes
  - expected rate of data production over time
- Do we have sample datasets so that we can set up the pipeline before receiving data? (Perhaps by re-creating the maize pipeline.)
- What software needs to be installed?
  - Biocluster stack is listed in [biocluster_modules.txt](https://github.com/terraref/computing-pipeline/files/52343/biocluster_modules.txt)
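
Once coverage, sample count, and genome size are known, the data-size questions above reduce to simple arithmetic. The sketch below is a rough, hedged estimate only: the function name, the `bytes_per_base` default (about one byte for the base call plus one for its quality score, ignoring read headers and compression), and the genome size and coverage used in the example call are all placeholder assumptions, not project figures. Only the count of 400 lines comes from this issue.

```python
def estimate_raw_bytes(n_samples, genome_size_bp, coverage, bytes_per_base=2.0):
    """Rough uncompressed read-data size in bytes.

    total bases = samples x genome size x coverage;
    bytes_per_base is an assumed storage cost per sequenced base.
    """
    total_bases = n_samples * genome_size_bp * coverage
    return total_bases * bytes_per_base


if __name__ == "__main__":
    # Placeholder inputs: replace genome size and coverage with real values.
    size = estimate_raw_bytes(n_samples=400,
                              genome_size_bp=1_000_000_000,
                              coverage=10)
    print(f"{size / 1e12:.1f} TB (uncompressed, rough)")
```

Dividing the result by the expected sequencing period gives a first guess at the rate of data production over time.
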
### Division of Labor
- Who does what among [HPCBio](http://hpcbio.illinois.edu/), NCSA, Danforth, Cornell, and other teams?
- What will be done where?
- How will the data move from one location to another, and at what stage of processing?
- To what extent do the workflows need to be automated?
- A pipeline has already been implemented on several systems at NCSA; the code is available on GitHub: https://github.com/HPCBio/BW_VariantCalling ([Documentation](https://raw.githubusercontent.com/HPCBio/BW_VariantCalling/master/UserDocumentation.txt))
  - Can TERRA use this workflow? What modifications will be necessary? Or would it be worthwhile to start from scratch?
