Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/sqlquery] Support for Executing Queries Across Multiple Database Instances #33119

Open
Talbalash-legit opened this issue May 19, 2024 · 8 comments
Labels
enhancement New feature or request receiver/sqlquery SQL query receiver Stale

Comments

@Talbalash-legit
Copy link

Component(s)

receiver/sqlquery

Is your feature request related to a problem? Please describe.

Currently, executing the same query across multiple database (e.g., several DBs on an AWS RDS cluster) requires duplicating the query for each instance. This leads to redundancy and increased maintenance overhead, as any change to the query must be replicated across all instances manually.

If i have 17 DBs in a cluster i will need to replicate the query configuration 17 times with the current solution.
How can it be achieved today? maybe i'm missing something

Describe the solution you'd like

I would like the SQLQuery Receiver to support executing a single query across multiple database instances. This could be achieved by allowing users to specify a list of database connections within a single configuration, so that the query is executed against each instance automatically. This feature would streamline the process and reduce the need for query duplication.

The current situation is:

    receivers:
      sqlquery/DB_1:
        driver: postgres
        datasource: "host=${HOST} port=5432 dbname=DB_1 user=${USER} password=${PASS}"
        collection_interval: 1h
        queries:
          - sql: "SELECT idstat.relname AS TABLE_NAME, indexrelname AS index_name, idstat.idx_scan AS index_scans_count, pg_relation_size(indexrelid) AS index_size_bytes FROM pg_stat_user_indexes AS idstat JOIN pg_indexes ON indexrelname = indexname AND idstat.schemaname = pg_indexes.schemaname WHERE indexdef !~* 'unique' ORDER BY idstat.idx_scan DESC"
            metrics:
              - metric_name: postgres_index_scans_count
                value_column: "index_scans_count"
                attribute_columns: ["table_name", "index_name"]
                static_attributes:
                  db_name: DB_1
              - metric_name: postgres_index_size_bytes
                value_column: "index_size_bytes"
                attribute_columns: ["table_name", "index_name"]
                static_attributes:
                  db_name: DB_1

      sqlquery/DB_2:
        driver: postgres
        datasource: "host=${HOST} port=5432 dbname=DB_2 user=${USER} password=${PASS}"
        collection_interval: 1h
        queries:
          - sql: "SELECT idstat.relname AS TABLE_NAME, indexrelname AS index_name, idstat.idx_scan AS index_scans_count, pg_relation_size(indexrelid) AS index_size_bytes FROM pg_stat_user_indexes AS idstat JOIN pg_indexes ON indexrelname = indexname AND idstat.schemaname = pg_indexes.schemaname WHERE indexdef !~* 'unique' ORDER BY idstat.idx_scan DESC"
            metrics:
              - metric_name: postgres_index_scans_count
                value_column: "index_scans_count"
                attribute_columns: ["table_name", "index_name"]
                static_attributes:
                  db_name: DB_2
              - metric_name: postgres_index_size_bytes
                value_column: "index_size_bytes"
                attribute_columns: ["table_name", "index_name"]
                static_attributes:
                  db_name: DB_2

I would propose the following:

receivers:
      sqlquery
        driver: postgres
        datasource: "host=${HOST} port=5432 dbnames=DB_1,DB_2,DB_3 user=${USER} password=${PASS}"
        collection_interval: 1h
        queries:
          - sql: "SELECT idstat.relname AS TABLE_NAME, indexrelname AS index_name, idstat.idx_scan AS index_scans_count, pg_relation_size(indexrelid) AS index_size_bytes FROM pg_stat_user_indexes AS idstat JOIN pg_indexes ON indexrelname = indexname AND idstat.schemaname = pg_indexes.schemaname WHERE indexdef !~* 'unique' ORDER BY idstat.idx_scan DESC"
            metrics:
              - metric_name: postgres_index_scans_count
                value_column: "index_scans_count"
                attribute_columns: ["table_name", "index_name"]
                static_attributes:
                  db_name: $$DB_NAME
              - metric_name: postgres_index_size_bytes
                value_column: "index_size_bytes"
                attribute_columns: ["table_name", "index_name"]
                static_attributes:
                  db_name: $$DB_NAME

In this configuration, $$DB_NAME would be dynamically replaced with the actual database name for each datasource during query execution.

Describe alternatives you've considered

  1. Manually duplicating the queries for each database instance, which is not efficient and prone to errors.
  2. Using external scripts or tools to handle the execution of queries across multiple instances, which adds complexity and is outside the scope of the SQLQuery Receiver

Additional context

No response

@Talbalash-legit Talbalash-legit added enhancement New feature or request needs triage New item requiring triage labels May 19, 2024
@github-actions github-actions bot added the receiver/sqlquery SQL query receiver label May 19, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Copy link
Member

The datasource string is passed directly to the sql.Open function call without parsing any information. This means if we want a datasource string to be handled in a specific way, the issue should really be filed against the SQL driver itself. I don't think this receiver should try to parse datasource strings in some meaningful way and then modify functionality depending on the result. I worry that would end up being too hard to maintain and would in itself be hard for users to understand.

I agree this is painful, but I'm not seeing a good way to handle this. A PR would be welcome though if you're interested in contributing, for us to be able to see the end complexity of this.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label May 20, 2024
@Talbalash-legit
Copy link
Author

The datasource string is passed directly to the sql.Open function call without parsing any information. This means if we want a datasource string to be handled in a specific way, the issue should really be filed against the SQL driver itself. I don't think this receiver should try to parse datasource strings in some meaningful way and then modify functionality depending on the result. I worry that would end up being too hard to maintain and would in itself be hard for users to understand.

I agree this is painful, but I'm not seeing a good way to handle this. A PR would be welcome though if you're interested in contributing, for us to be able to see the end complexity of this.

Thanks for the answer! maybe can you consider allowing an array of datasource strings?

@crobert-1
Copy link
Member

I thought a bit about this. It's not entirely off the table, it's just going to be a breaking change if we do that. I'm not aware if a more simple way to handle this though.

@benjie23
Copy link

benjie23 commented May 22, 2024

I thought a bit about this. It's not entirely off the table, it's just going to be a breaking change if we do that. I'm not aware if a more simple way to handle this though.

It would be great to at least set an array of databases per host with the same query.
Otherwise, I'll need to duplicate the entire logic and queries for each host and database combination, which can be a lot - specially in a production environment.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 22, 2024
@Talbalash-legit
Copy link
Author

Hey @dmitryax .
Have you had any further thoughts on how we might approach this, or have you come across any simpler alternatives since we last discussed it?

@github-actions github-actions bot removed the Stale label Aug 27, 2024
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request receiver/sqlquery SQL query receiver Stale
Projects
None yet
Development

No branches or pull requests

3 participants