Skip to content

Conversation

@bpblanken
Copy link
Collaborator

No description provided.

@bpblanken bpblanken changed the title clinvar feat: move clinvar to new pattern Oct 27, 2025
@bpblanken bpblanken marked this pull request as ready for review October 28, 2025 19:39
@bpblanken bpblanken requested a review from hanars October 28, 2025 19:39
if existing_version_obj and ClinvarAllVariantsSnvIndel.objects.filter(version=existing_version_obj.version).exists():
clinvar_run_sql(
Template(f"ALTER TABLE `$reference_genome/$dataset_type/clinvar_all_variants` DROP PARTITION '{new_version}';")
Template(f"ALTER TABLE `$reference_genome/$dataset_type/reference_data/clinvar/all_variants` DROP PARTITION '{new_version}';")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can actually get the table name off of the model's meta field which would make this code more resilient - you just use ClinvarAllVariantsSnvIndel._meta.db_table. It would also get rid of the need to template out variables

Copy link
Collaborator

@hanars hanars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I just want to confirm that after the migration runs the new materialized views will automatically populate with the correct clinvar data without needing to run the reload clinvar command or a 'SYSTEM REFRESH VIEW

('GRCh38', 'SNV_INDEL'),
('GRCh38', 'MITO'),
]:
cursor.execute(sql.substitute(reference_genome=reference_genome, dataset_type=dataset_type))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not particularly related to this PR, but this file should really be in the clickhouse_search/management/commands folder not the seqr commands folder

@bpblanken
Copy link
Collaborator Author

clinvar/all_variants and the clinvar Join table are just renames, so will have all of the data, leaving seqr search untouched. The seqr_variants table will be empty though, until either the job runs or a manual intervention.

I think this is fine, but there is a potential edge case where the two materialized views we create here run concurrently upon creation; the second one (the seqr_variants -> clinvar Join) will have an empty source table until the first completes, resulting in an empty Join table.

One hacky resolution would be to add a RunPython that runs 'SYSTEM REFRESH VIEW in the correct order at the end of this migration.

@hanars
Copy link
Collaborator

hanars commented Nov 10, 2025

Having an empty clinvar join table at the end of this is not really an acceptable situation so we will need this to deterministically result with all the views populated. I feel like the best way to do this would be adding RunSql after all the CLINVAR_ALL_TO_SEQR_MV commands that waits until those are populated before running the CLINVAR_SEQR_TO_SEARCH_MV commands

@bpblanken bpblanken requested a review from hanars November 10, 2025 17:45
FROM `$reference_genome/$dataset_type/reference_data/clinvar/seqr_variants`
""")

def build_materialized_view(reference_genome: str, dataset_type: str, materialized_view: str):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a mechanism for refreshing and waiting for each view.

@bpblanken bpblanken merged commit ee311c0 into dev Nov 10, 2025
8 checks passed
bpblanken added a commit that referenced this pull request Nov 10, 2025
@bpblanken bpblanken deleted the benb/clinvar branch November 10, 2025 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants