
DAG for running and storing results of long queries #4595

Open

Description

Problem

While Grafana (and other tools like it) provide good interfaces for visualising data, they are not well suited to long-running queries. Those tools rely on reasonable timeouts configured for their data sources to prevent inexpert (or unaware) users from taking down a database or Elasticsearch cluster (for example) with a big query.

Description

Instead, let's create a DAG for running these long queries. The DAG should start by being able to run queries against the catalog database, and it should be restricted to non-destructive queries. To that end, we can execute the query against the database as a separate user with read-only permissions.

The DAG should accept a string for the query and a note explaining the purpose of the query, and under what circumstances it can or cannot be terminated.

Basically, this would just be the `select_query` part of the batched update DAG.
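To illustrate, the "non-destructive only" restriction could be enforced with a small validation step before the DAG ever touches the database. This is only a rough sketch with hypothetical names (`validate_query` and the keyword list are not part of the catalog codebase); the real safety guarantee would still come from the read-only database user, with this check acting as an early, friendlier failure:

```python
import re

# Illustrative keyword list: statements we never want this DAG to run.
# The read-only DB user is the actual enforcement mechanism; this check
# just fails fast with a clear error message.
DESTRUCTIVE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)


def validate_query(query: str) -> str:
    """Raise if the query is not a plain SELECT; return it stripped."""
    stripped = query.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        raise ValueError("Only SELECT queries are allowed by this DAG")
    if DESTRUCTIVE_KEYWORDS.search(stripped):
        raise ValueError("Query contains a potentially destructive keyword")
    return stripped
```

A naive keyword scan like this will reject some legitimate queries (e.g. a column literally named `update`), which is acceptable for a conservative first pass since the read-only user is the backstop.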

Alternatives

Maybe there are better approaches for long-running queries? I see some mention of using materialized views for them, but I can't find anything definitive on the matter.

@AetherUnbound @stacimc y'all's advice here would be helpful. The main thing I want to solve is not needing to open direct connections to the databases for long queries, with the understanding that some long queries are necessary and unavoidable.


Metadata


    Labels

    🌟 goal: addition (Addition of new feature)
    🟨 priority: medium (Not blocking but should be addressed soon)
    🤖 aspect: dx (Concerns developers' experience with the codebase)
    🧱 stack: catalog (Related to the catalog and Airflow DAGs)


    Projects

    • Status

      📋 Backlog
