Revisit integration tests (reduce #, combine, use DirectRunner more) #255

Closed
@arostamianfar

Description


As mentioned in issue #250, there is a hard limit of 25 concurrent Dataflow jobs per project, which cannot be increased. Our integration tests were written on the assumption that we don't have such a low limit. The tests currently take about an hour to run (~40 min for vcf_to_bq and ~15 min for preprocessor), and we should aim to keep them under an hour. Some ideas:

  • Prioritize tests: start large tests before smaller ones. This is not currently an issue because the majority of tests finish in ~5 min, so large tests won't be starved for more than ~5 min. However, it can get worse as we add more tests.
  • Remove redundant test cases (e.g. we don't expect "valid_4_1_gz.json" to catch anything new that is not already caught by "valid_4_0_gz.json" and "valid_4_1.json").
  • Consider using DirectRunner for some of the small tests.
  • Combine tests (e.g. we added --num_bigquery_write_shards to an existing platinum test instead of adding a new one just to test that flag).
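
To make the prioritization and DirectRunner ideas concrete, here is a minimal sketch, not the project's actual test runner. The `IntegrationTest` class, the `estimated_mins` field, the 2-minute cutoff, and the `run_one` callback are all hypothetical names introduced for illustration. Only the 25-job limit and the two runner names come from the discussion above.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Per-project Dataflow job limit mentioned in this issue.
MAX_CONCURRENT_JOBS = 25
# Hypothetical cutoff: tests expected to finish faster than this run locally.
SMALL_TEST_THRESHOLD_MINS = 2.0


@dataclass
class IntegrationTest:
    """Hypothetical test descriptor; estimated_mins is a rough guess."""
    name: str
    estimated_mins: float


def choose_runner(test: IntegrationTest) -> str:
    """Use DirectRunner for small tests, DataflowRunner otherwise."""
    if test.estimated_mins <= SMALL_TEST_THRESHOLD_MINS:
        return "DirectRunner"
    return "DataflowRunner"


def order_largest_first(tests: List[IntegrationTest]) -> List[IntegrationTest]:
    """Sort so that the longest tests are submitted first."""
    return sorted(tests, key=lambda t: t.estimated_mins, reverse=True)


def run_all(
    tests: List[IntegrationTest],
    run_one: Callable[[IntegrationTest, str], Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Run tests with at most MAX_CONCURRENT_JOBS in flight at once.

    The executor picks up work in submission order, so submitting the
    largest tests first keeps them from waiting behind small ones.
    For simplicity, DirectRunner tests share the same pool here even
    though they would not count against the Dataflow limit.
    """
    ordered = order_largest_first(tests)
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
        futures = [pool.submit(run_one, t, choose_runner(t)) for t in ordered]
        return [f.result() for f in futures]
```

In practice `run_one` would launch the pipeline with the chosen runner and validate its output. A separate, larger pool for DirectRunner tests would be a natural refinement, since only Dataflow jobs are subject to the limit.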
