Description
It is currently not trivial to estimate the resources needed to run the pipeline (particularly the amount of disk space). This is especially important when processing large (10TB+) datasets, where cost becomes a significant factor.
The idea is to write a tool that preprocesses the data (similar to the header merger pipeline) and outputs the minimum resource requirements for running the pipeline.
Note that estimating the required disk space is not as simple as summing the sizes of all files being processed, since each record carries an overhead that depends on the number of samples, INFO fields, and FILTER fields.
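A minimal sketch of what such a preprocessing pass could look like, assuming bgzipped VCF inputs; the function name and the per-record overhead constants are illustrative placeholders that would need calibration, not part of the existing pipeline:

```python
# Hypothetical sketch: estimate a minimum disk requirement for a set of VCF shards.
# The overhead constants below are placeholders, not measured values.

import gzip
import os

BASE_RECORD_OVERHEAD = 64        # assumed fixed cost per record (bytes)
PER_SAMPLE_OVERHEAD = 8          # assumed cost per sample genotype column
PER_INFO_FIELD_OVERHEAD = 16     # assumed cost per INFO field defined in the header
PER_FILTER_FIELD_OVERHEAD = 4    # assumed cost per FILTER field defined in the header


def estimate_vcf_disk_bytes(path: str) -> int:
    """Rough lower bound on the disk needed to process one bgzipped VCF."""
    n_samples = n_info = n_filter = n_records = 0
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("##INFO"):
                n_info += 1
            elif line.startswith("##FILTER"):
                n_filter += 1
            elif line.startswith("#CHROM"):
                # Columns after FORMAT in the header line are sample names.
                n_samples = max(len(line.rstrip("\n").split("\t")) - 9, 0)
            elif not line.startswith("#"):
                n_records += 1

    per_record = (BASE_RECORD_OVERHEAD
                  + n_samples * PER_SAMPLE_OVERHEAD
                  + n_info * PER_INFO_FIELD_OVERHEAD
                  + n_filter * PER_FILTER_FIELD_OVERHEAD)
    # Input size plus the modeled per-record overhead for intermediate outputs.
    return os.path.getsize(path) + n_records * per_record


if __name__ == "__main__":
    import sys
    total = sum(estimate_vcf_disk_bytes(p) for p in sys.argv[1:])
    print(f"Estimated minimum disk: {total / 1e9:.1f} GB")
```

The same single pass over the headers and records could also report sample counts and record counts, which would feed into memory and runtime estimates.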