Implement a resource estimator tool #67

Open
@arostamianfar

Description

It is currently not trivial to estimate the resources needed to run the pipeline (particularly the amount of disk space). This is especially important when processing large (10TB+) datasets, where cost becomes a significant factor.

The idea is to write a tool that preprocesses the data (similar to the header merger pipeline) and outputs the minimum resource requirements for running the pipeline.

Note that estimating the disk size is not as simple as summing the sizes of all input files, since each record incurs an overhead that depends on its number of samples, INFO fields, and FILTER values.
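As a rough illustration of the idea, a minimal sketch is shown below. It assumes (this is not specified in the issue) that the per-record overhead grows roughly linearly with the number of sample, INFO, and FILTER fields; the constants, function names, and sampling strategy are hypothetical placeholders that would need to be calibrated against real pipeline output.

```python
# Minimal sketch of a sampling-based disk estimator (illustrative only).
# Assumptions: per-record output overhead is roughly linear in the number of
# sample columns, INFO fields, and FILTER values; the BYTES_PER_* constants
# below are placeholders, not measured values from the pipeline.
import gzip
import os

BYTES_PER_SAMPLE = 8      # hypothetical overhead per sample column per record
BYTES_PER_INFO = 16       # hypothetical overhead per INFO field per record
BYTES_PER_FILTER = 4      # hypothetical overhead per FILTER value per record
SAMPLE_RECORDS = 1000     # records to inspect per file when extrapolating


def _open_vcf(path):
  return gzip.open(path, 'rt') if path.endswith('.gz') else open(path)


def estimate_disk_bytes(vcf_path):
  """Extrapolates the processed size of one VCF file from a record sample."""
  file_bytes = os.path.getsize(vcf_path)
  sampled_raw = 0        # raw bytes of the sampled records
  sampled_overhead = 0   # estimated extra bytes added by the pipeline
  records = 0
  with _open_vcf(vcf_path) as f:
    for line in f:
      if line.startswith('#'):
        continue  # skip header lines
      fields = line.rstrip('\n').split('\t')
      # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT samples...
      num_samples = max(len(fields) - 9, 0)
      num_filters = len(fields[6].split(';')) if fields[6] != '.' else 0
      num_info = len(fields[7].split(';')) if fields[7] != '.' else 0
      sampled_raw += len(line)
      sampled_overhead += (num_samples * BYTES_PER_SAMPLE +
                           num_info * BYTES_PER_INFO +
                           num_filters * BYTES_PER_FILTER)
      records += 1
      if records >= SAMPLE_RECORDS:
        break
  if sampled_raw == 0:
    return file_bytes
  # Scale the sampled overhead ratio up to the whole file.
  expansion = 1.0 + float(sampled_overhead) / sampled_raw
  return int(file_bytes * expansion)
```

Note that for gzipped inputs this sketch mixes the compressed file size with uncompressed record lengths, so a real estimator would also need to estimate each file's compression ratio before extrapolating, and then sum the per-file estimates across the dataset.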
