Closed
As mentioned in issue #250, there is a hard limit of 25 concurrent Dataflow jobs per project, which cannot be increased. Our integration test cases were developed under the assumption that we don't have such a low limit. The tests currently take about an hour to run (~40 mins for vcf_to_bq and ~15 mins for preprocessor), and we should aim to keep them under an hour. Some ideas:
- Prioritize tests: we should start large tests before smaller ones. This is not currently an issue since the majority of tests finish in ~5 mins, so large tests won't be starved for more than 5 mins. However, this can get worse as we add more tests.
- Remove redundant test cases (e.g. we don't expect "valid_4_1_gz.json" to catch anything new that is not already caught by "valid_4_0_gz.json" and "valid_4_1.json").
- Consider using DirectRunner for some of the small tests.
- Combine tests together (e.g. we added `--num_bigquery_write_shards` to an existing platinum test instead of adding a new one just to test that flag).
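To illustrate why the prioritization idea helps, here is a minimal sketch that simulates launching tests under a fixed concurrent-job cap. The `makespan` helper and all durations except the 40/15-minute figures are hypothetical, purely for illustration; the real tests run as separate Dataflow jobs, not via this function.

```python
import heapq

def makespan(durations, max_jobs):
    """Simulate total wall-clock time when jobs are launched in the
    given order with at most `max_jobs` running concurrently."""
    running = []  # min-heap of finish times of in-flight jobs
    clock = 0.0   # time at which the next job can start
    for d in durations:
        if len(running) == max_jobs:
            # All slots busy: wait until the earliest job finishes.
            clock = heapq.heappop(running)
        heapq.heappush(running, clock + d)
    return max(running) if running else 0.0

# Hypothetical workload in minutes: one ~40 min vcf_to_bq test, one
# ~15 min preprocessor test (figures from this issue), plus thirty
# ~5 min small tests (illustrative).
tests = [40, 15] + [5] * 30
cap = 25  # Dataflow's per-project concurrent job limit

small_first = makespan(sorted(tests), cap)
large_first = makespan(sorted(tests, reverse=True), cap)
print(large_first, small_first)
```

Launching the large tests first lets them overlap with all the small ones, so total wall-clock time is bounded by the longest single test rather than by a large test that was starved waiting for a slot.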