Description
Problem
While Grafana (and other tools like it) provide good interfaces for visualising data, they are not good for long queries. Those tools rely on reasonable time outs configured for their data sources to prevent inexpert (or unaware) users of the tools from taking down a database or Elasticsearch cluster (for example) with a big query.
Description
Instead, let's create a DAG for running these long queries. The DAG should start by being able to run queries against the catalog database, and it should be restricted to non-destructive queries. To that end, we can use a separate user to execute the query against the database (a user with readonly permissions).
The DAG should accept a string for the query and a note explaining the purpose of the query, and under what circumstances it can or cannot be terminated.
Basically, this would just be the "select_query" part of the batched update DAG.
Alternatives
Maybe there are better ways for long running queries? I see some mention of using materialized views for them, but I can't find anything super clear on the matter.
@AetherUnbound @stacimc y'all's advice here would be helpful. The main thing I want to solve is not needing to open direct connections to the databases for long queries, with the understanding that some long queries are necessary and unavoidable.
Metadata
Assignees
Labels
Type
Projects
Status
📋 Backlog