Add helper functions for querying `Terra Data Repository` #998

bpblanken · 2024-12-11T16:06:53Z

Adds helper functions for querying TDR and the underlying BigQuery stores. Does not manage the persistence of the results into the pipeline!

jklugherz · 2024-12-12T16:56:45Z

v03_pipeline/lib/misc/allele_registry.py

-            status_forcelist=[500, 502, 503, 504],
-        )
-        s.mount('https://', HTTPAdapter(max_retries=retries))
+        s = requests_retry_session()


jklugherz · 2024-12-12T17:23:29Z

v03_pipeline/lib/misc/terra_data_repository.py

+
+
+def gen_bq_table_names() -> Generator[str]:
+    with ThreadPoolExecutor(max_workers=5) as executor:


How did you choose 5 max_workers?

I believe AI was responsible for this choice 🍭.

I'll make it a constant on the downstream PR.
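
For reference, a minimal sketch of what hoisting the worker count into a constant could look like. `MAX_WORKERS`, `fetch_table_name`, `gen_table_names`, and the placeholder table-name format are hypothetical names chosen for illustration; the function in this PR takes no arguments and talks to TDR, which is not reproduced here.

```python
from collections.abc import Iterable, Iterator
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical module-level constant replacing the inline literal 5.
MAX_WORKERS = 5


def fetch_table_name(dataset_id: str) -> str:
    # Stand-in for the real per-dataset TDR request (assumption).
    return f'datarepo_{dataset_id}'


def gen_table_names(dataset_ids: Iterable[str]) -> Iterator[str]:
    # Fan the per-dataset lookups out across a bounded thread pool and
    # yield each result as soon as its future completes.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = [executor.submit(fetch_table_name, d) for d in dataset_ids]
        for future in as_completed(futures):
            yield future.result()
```

Results come back in completion order rather than submission order, so callers that need a stable order would have to sort or key the output themselves.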

jklugherz · 2024-12-12T17:31:10Z

v03_pipeline/lib/misc/terra_data_repository.py

+    )
+
+
+def _get_dataset_ids() -> list[str]:


Are we calling this on every pipeline run? It fetches all of the datasets in the TDR; is there a way to filter it first, perhaps by dataset name?

Yes! This should be called every pipeline run, but it's only a single API request. The plan was to use the result of this request + a persisted list of dataset ids that we've seen before to create a "new dataset ids" list that would be passed into gen_bq_sample_metrics.
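
The plan described in this reply amounts to filtering the fetched ids against the ones already seen. A minimal sketch, assuming both lists are plain strings; `new_dataset_ids` and its parameter names are hypothetical, and how the seen list is persisted is out of scope for this PR.

```python
from collections.abc import Iterable


def new_dataset_ids(fetched: Iterable[str], seen: Iterable[str]) -> list[str]:
    # Build a set once so each membership check is O(1).
    seen_set = set(seen)
    # Preserve the order returned by the API and drop ids already processed.
    return [dataset_id for dataset_id in fetched if dataset_id not in seen_set]
```

The resulting list is what would be handed to `gen_bq_sample_metrics`, per the comment above.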

* Add service account credentialing (#997)
* Add service account credentialing
* ruff
* feat: Handle parsing empty predicted sex into Unknown (#1000)
* Add helper functions for querying `Terra Data Repository` (#998)
* Add service account credentialing
* ruff
* First pass
* tests passing
* add coverage of bigquery test
* change function names
* use generators everywhere
* bq requirement
* resolver
* Update sample id name
* Build Sex Check Table from TDR Metrics (#999)
* refactor: Move feature flags to FeatureFlag enum. (#1002)
* refactor: Move feature flags out of environment to their own dataclass
* lint: ruff
* ruff
* bugfix: exclude samples from relationship checking that are not present in the expected loadable samples (#1003)
* bugfix: exclude samples from relationship checking that are not present in the expected loadable samples
* cleanup
* feat: add remap and family loading failures as validation exceptions … (#1005)
* feat: add remap and family loading failures as validation exceptions rather than runtime errors
* move on
* Update write_remapped_and_subsetted_callset_test.py
* ruff
* feat: Add ability to run tasks dataproc. (#948)
* Support gcs dirs in rsync
* ws
* Add create dataproc cluster task
* add dataproc
* ruff
* requirements
* still struggling
* Gencode refactor to remove gcs
* bump reqs
* Run dataproc job
* lib
* running
* merge requirements
* Flip'em
* Better exception handling
* Cleaner approach if less generalizable
* write a test
* Fix tests
* lint
* Add test for success
* refactor to use a base class... better for adding support for multiple jobs
* cleanup
* ruff
* Fix missing mock
* Fix flapping test
* pr comments

bpblanken added 8 commits December 10, 2024 12:04

Add service account credentialing

66ac408

ruff

6ad41aa

First pass

e304369

tests passing

e929280

add coverage of bigquery test

27174aa

change function names

4978d74

use generators everywhere

e7d96bd

bq requirement

f992a51

bpblanken changed the title ~~Benb/fetch predicted sex from tdr~~ Add helper function for querying Terra Data Repository Dec 11, 2024

resolver

a767805

bpblanken changed the title ~~Add helper function for querying Terra Data Repository~~ Add helper functions for querying Terra Data Repository Dec 11, 2024

bpblanken marked this pull request as ready for review December 11, 2024 19:01

bpblanken requested a review from a team as a code owner December 11, 2024 19:01

jklugherz reviewed Dec 12, 2024

Update sample id name

1a272d6

Base automatically changed from benb/add_service_account_credentialing to dev December 13, 2024 14:41

bpblanken and others added 2 commits December 13, 2024 09:49

Merge branch 'dev' of github.com:broadinstitute/seqr-loading-pipeline…

5c6edcb

…s into benb/fetch_predicted_sex_from_tdr

Build Sex Check Table from TDR Metrics (#999)

9e25973

bpblanken merged commit 8b58c01 into dev Dec 13, 2024
2 checks passed

bpblanken deleted the benb/fetch_predicted_sex_from_tdr branch December 13, 2024 16:48

3 participants


