feat: add build_table_metrics_query method for enhanced column metrics #340
Conversation
**Walkthrough**

Adds a new method to `PostgresDataSource` that builds and executes a single aggregated SQL query to compute per-column metrics for a table, then normalizes the single-row result into JSON-serializable Python values and returns a list of per-column dictionaries.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    actor Caller
    participant DS as PostgresDataSource
    participant PG as PostgreSQL
    Caller->>DS: build_table_metrics_query(table_name, column_info, additional_queries?)
    DS->>DS: Assemble per-column SQL fragments<br/>(distinct, duplicate, is_null,<br/>+ type-specific expressions)
    DS->>PG: Execute single aggregated SELECT
    PG-->>DS: Single-row aggregated result
    DS->>DS: Split row into per-column dicts<br/>Normalize Decimal/date/datetime/UUID/nested
    DS-->>Caller: return List[dict] of column metrics
    rect rgb(235,245,255)
        note right of DS: New: dynamic SQL assembly<br/>and JSON-safe normalization
    end
```
**Estimated code review effort:** 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 5
🧹 Nitpick comments (5)
dcs_core/integrations/databases/postgres.py (5)
**353-355: Avoid materializing the result; fetch one mapping row.**

Prefer `result.mappings().one()` (or `.first()` if an empty result is possible). Also aligns with the Ruff hint.

```diff
-        result = self.connection.execute(text(query))
-        row = dict(list(result)[0]._mapping)
+        result = self.connection.execute(text(query))
+        row = result.mappings().one()
```

Based on static analysis hints.
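The pattern can be shown with a small stdlib sketch (sqlite3 stands in for the PR's SQLAlchemy connection; the table and column names are made up): an aggregate `SELECT` with no `GROUP BY` returns exactly one row, so fetch it directly instead of listing the whole result.

```python
import sqlite3

# Illustrative only: sqlite3 in place of SQLAlchemy. An aggregate query
# returns a single row, so fetch it directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,)])
conn.row_factory = sqlite3.Row  # name-addressable rows, akin to Row._mapping

cur = conn.execute(
    'SELECT COUNT(DISTINCT x) AS "x_distinct", '
    'COUNT(*) - COUNT(DISTINCT x) AS "x_duplicate" FROM t'
)
row = dict(cur.fetchone())  # grab the single aggregate row
print(row)
```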
**332-351: Postgres type checks are incomplete/misaligned.**

- Numeric: include `real`, `double precision`. `int`, `float`, `double` aren't `information_schema.data_type` values.
- Text: include `character`, drop `string`.

```diff
-            if dtype in (
-                "int",
-                "integer",
-                "bigint",
-                "smallint",
-                "decimal",
-                "numeric",
-                "float",
-                "double",
-            ):
+            if dtype in (
+                "integer",
+                "bigint",
+                "smallint",
+                "decimal",
+                "numeric",
+                "real",
+                "double precision",
+            ):
                 query_parts.append(f'MIN("{name}") AS "{name}_min"')
                 query_parts.append(f'MAX("{name}") AS "{name}_max"')
                 query_parts.append(f'AVG("{name}") AS "{name}_average"')
-            elif dtype in ("varchar", "text", "char", "string", "character varying"):
+            elif dtype in ("text", "character", "character varying", "varchar", "char"):
                 query_parts.append(
                     f'MAX(CHAR_LENGTH("{name}")) AS "{name}_max_character_length"'
                 )
```
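A compact sketch of the corrected type dispatch, under the assumption that `dtype` comes from `information_schema.columns.data_type` as suggested above (the `metric_exprs` helper name is illustrative, not the PR's code):

```python
# Types as the review suggests information_schema reports them.
NUMERIC_TYPES = {
    "smallint", "integer", "bigint",
    "decimal", "numeric", "real", "double precision",
}
TEXT_TYPES = {"text", "character", "character varying", "varchar", "char"}

def metric_exprs(name, dtype):
    # Every column gets a distinct count; type-specific metrics follow.
    exprs = [f'COUNT(DISTINCT "{name}") AS "{name}_distinct"']
    if dtype in NUMERIC_TYPES:
        exprs += [
            f'MIN("{name}") AS "{name}_min"',
            f'MAX("{name}") AS "{name}_max"',
            f'AVG("{name}") AS "{name}_average"',
        ]
    elif dtype in TEXT_TYPES:
        exprs.append(f'MAX(CHAR_LENGTH("{name}")) AS "{name}_max_character_length"')
    return exprs

print(metric_exprs("price", "double precision"))
```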
**320-351: Use the `quote_column` helper consistently.**

You're manually quoting identifiers; prefer `self.quote_column(name)` to keep quoting logic centralized.

Example:

```diff
-            query_parts.append(f'COUNT(DISTINCT "{name}") AS "{name}_distinct"')
+            qcol = self.quote_column(name)
+            query_parts.append(f'COUNT(DISTINCT {qcol}) AS "{name}_distinct"')
```

Apply similarly for all occurrences.
**315-318: Method name vs. behavior mismatch.**

It both builds and executes the query. Consider renaming to `fetch_table_metrics` or updating the docstring to reflect execution.
**315-351: Operational note: heavy aggregates on wide/large tables.**

Multiple `COUNT(DISTINCT ...)` + `MIN`/`MAX`/`AVG` across many columns can be expensive. Consider:

- An optional sampling ratio (e.g., `TABLESAMPLE`) for exploratory metrics.
- A per-column feature flag to limit aggregates.
- Running in off-peak hours or with a statement timeout.
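The sampling idea could be sketched as follows; `TABLESAMPLE SYSTEM` is real Postgres syntax, but the `metrics_source` helper and its `sample_percent` parameter are hypothetical, not part of the PR:

```python
# Hypothetical sketch: wrap the table reference in TABLESAMPLE SYSTEM
# (Postgres syntax) when a sample percentage is given, so the aggregate
# query scans only a fraction of the table's pages.
def metrics_source(table, sample_percent=None):
    if sample_percent is not None:
        return f"{table} TABLESAMPLE SYSTEM ({sample_percent})"
    return table

print(metrics_source('"public"."orders"', 1.0))
print(metrics_source('"public"."orders"'))
```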
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`dcs_core/integrations/databases/postgres.py` (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.0)
dcs_core/integrations/databases/postgres.py
354-354: Prefer next(iter(result)) over single element slice
Replace with next(iter(result))
(RUF015)
```python
if isinstance(value, (datetime, datetime.date)):
    return value.isoformat()

if isinstance(value, UUID):
    return str(value)
```
**Fix isinstance checks for datetime and UUID (runtime error + wrong type).**

- `datetime` in `isinstance(..., (datetime, datetime.date))` is a module, causing a `TypeError`.
- `UUID` from SQLAlchemy isn't the runtime type returned by drivers; use `uuid.UUID`.

Apply:

```diff
-        if isinstance(value, (datetime, datetime.date)):
+        if isinstance(value, (datetime.datetime, datetime.date)):
             return value.isoformat()
-        if isinstance(value, UUID):
+        if isinstance(value, PyUUID):
             return str(value)
```

And adjust imports per the separate comment.

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In dcs_core/integrations/databases/postgres.py around lines 374 to 378, the
isinstance checks are wrong: `datetime` is a module (causing TypeError) and
`UUID` is the SQLAlchemy type (not the runtime uuid type). Replace the tuple
check with (datetime.datetime, datetime.date) and use uuid.UUID for the UUID
check; update imports to ensure datetime (module) and uuid (module) are
available or import datetime.datetime and uuid.UUID as needed per the other
import comment.
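Both failure modes are easy to demonstrate with the standard library; the `normalize` helper below is an illustrative stand-in for the PR's `_normalize_metrics`, not its actual code:

```python
import datetime
import uuid
from decimal import Decimal

# Passing the datetime *module* to isinstance raises TypeError at runtime.
try:
    isinstance(1, datetime)
except TypeError as exc:
    print("bad check:", exc)

def normalize(value):
    # Corrected checks: the datetime.datetime/datetime.date classes, and
    # stdlib uuid.UUID (what drivers actually return), not SQLAlchemy's UUID.
    if isinstance(value, Decimal):
        return float(value)
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    if isinstance(value, uuid.UUID):
        return str(value)
    return value

print(normalize(datetime.date(2024, 1, 2)))  # '2024-01-02'
print(normalize(uuid.UUID(int=0)))
```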
… and improve column quoting
…ional_queries in build_table_metrics_query
Actionable comments posted: 0
♻️ Duplicate comments (4)
dcs_core/integrations/databases/postgres.py (4)
**19-19: Fix the UUID import: use stdlib `uuid.UUID`, not SQLAlchemy's `UUID` type.**

This is the same critical issue flagged in a previous review. The SQLAlchemy `UUID` is a type definition for columns, not the runtime value class returned by database drivers. Runtime UUID values from psycopg are `uuid.UUID` from the standard library.

Apply this diff:

```diff
-from sqlalchemy import UUID, create_engine, text
+from sqlalchemy import create_engine, text
+from uuid import UUID as PyUUID
```

Then update line 382 to use `PyUUID` instead of `UUID`.
**318-321: Guard against empty `column_info` to prevent invalid SQL generation.**

This critical issue was flagged in a previous review and remains unresolved. If `column_info` is empty, the generated SQL becomes `SELECT FROM ...`, which is syntactically invalid and will cause a runtime error.

Apply this diff:

```diff
     ) -> list[dict]:
         query_parts = []
+        if not column_info:
+            return []
         for col in column_info:
```
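A minimal sketch of the guard; the `build_query_parts` name and the shape of `column_info` (a list of dicts with a `"name"` key) are assumptions for illustration:

```python
# With no columns there is nothing to SELECT, so return early instead of
# emitting invalid "SELECT FROM ..." SQL.
def build_query_parts(column_info):
    if not column_info:
        return []
    return [
        f'COUNT(DISTINCT "{c["name"]}") AS "{c["name"]}_distinct"'
        for c in column_info
    ]

print(build_query_parts([]))
print(build_query_parts([{"name": "id"}]))
```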
**325-327: Fix the duplicate count formula: NULLs should not be counted as duplicates.**

This critical issue was flagged in a previous review and remains unresolved. The formula `COUNT(*) - COUNT(DISTINCT "{name}")` incorrectly treats NULLs as duplicates because `COUNT(*)` includes NULL rows while `COUNT(DISTINCT "{name}")` excludes them.

The correct formula excludes NULLs from both counts. Apply this diff:

```diff
-            query_parts.append(
-                f'COUNT(*) - COUNT(DISTINCT {self.quote_column(name)}) AS "{name}_duplicate"'
-            )
+            query_parts.append(
+                f'COUNT({self.quote_column(name)}) - COUNT(DISTINCT {self.quote_column(name)}) AS "{name}_duplicate"'
+            )
```
**379-383: Fix isinstance checks: `datetime` is a module, and `UUID` must be from stdlib.**

This critical issue was flagged in a previous review and remains unresolved. Two problems:

- Line 379: `datetime` is the module, not the class. Using it in `isinstance` will raise `TypeError: isinstance() arg 2 must be a type or tuple of types`.
- Line 382: `UUID` from SQLAlchemy is a column type class, not the runtime value type. Database drivers return `uuid.UUID` from the standard library.

Apply this diff (after fixing the import per the first comment):

```diff
-        if isinstance(value, (datetime, datetime.date)):
+        if isinstance(value, (datetime.datetime, datetime.date)):
             return value.isoformat()
-        if isinstance(value, UUID):
+        if isinstance(value, PyUUID):
             return str(value)
```
🧹 Nitpick comments (1)

`dcs_core/integrations/databases/postgres.py` (1)

**359-359: Prefer `next(iter(result))` for efficiency.**

Creating a full list from the result just to access the first element is inefficient. Use `next(iter(result))` instead.

Apply this diff:

```diff
         result = self.connection.execute(text(query))
-        row = dict(list(result)[0]._mapping)
+        row = dict(next(iter(result))._mapping)
```

Based on static analysis hints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`dcs_core/integrations/databases/postgres.py` (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
`dcs_core/integrations/databases/postgres.py` (5)

- `dcs_core/integrations/databases/mysql.py` (2): `quote_column` (86-92), `qualified_table_name` (76-84)
- `dcs_core/integrations/databases/oracle.py` (2): `quote_column` (80-86), `qualified_table_name` (70-78)
- `dcs_core/integrations/databases/mssql.py` (2): `quote_column` (120-126), `qualified_table_name` (110-118)
- `dcs_core/integrations/databases/sybase.py` (2): `quote_column` (210-216), `qualified_table_name` (200-208)
- `dcs_core/core/datasource/sql_datasource.py` (2): `quote_column` (148-154), `qualified_table_name` (130-138)
🪛 GitHub Actions: CI

`dcs_core/integrations/databases/postgres.py`

- [error] 1-1: Trailing whitespace detected and removed by the pre-commit hook; fix in file.
- [error] 356-356: Black formatting failed: cannot parse source due to an f-string backslash issue. Possible Python version compatibility problem.
🪛 Ruff (0.14.0)
dcs_core/integrations/databases/postgres.py
359-359: Prefer next(iter(result)) over single element slice
Replace with next(iter(result))
(RUF015)
Actionable comments posted: 0
♻️ Duplicate comments (3)
dcs_core/integrations/databases/postgres.py (3)
**316-322: Guard against empty `column_info` to prevent invalid SQL.**

If `column_info` is empty, the generated SQL becomes `SELECT FROM ...`, which is invalid and will cause a runtime error.

Apply this diff to add an early guard:

```diff
     ) -> list[dict]:
         query_parts = []
+        if not column_info:
+            return []
         for col in column_info:
```
**328-336: Duplicate count formula treats NULLs as duplicates.**

Line 332 uses `COUNT(*) - COUNT(DISTINCT ...)`, which incorrectly treats NULL values as duplicates. The correct formula excludes NULLs.

Apply this diff:

```diff
             query_parts.append(
-                f'COUNT(*) - COUNT(DISTINCT {self.quote_column(name)}) AS "{name}_duplicate"'
+                f'COUNT({self.quote_column(name)}) - COUNT(DISTINCT {self.quote_column(name)}) AS "{name}_duplicate"'
             )
```
**387-388: Fix isinstance check: `datetime` is a module, not a type.**

Line 387 uses `isinstance(value, (datetime, datetime.date))`, but `datetime` is imported as a module (line 15), not a class. This will raise `TypeError: isinstance() arg 2 must be a type or tuple of types` at runtime.

Apply this diff:

```diff
-        if isinstance(value, (datetime, datetime.date)):
+        if isinstance(value, (datetime.datetime, datetime.date)):
             return value.isoformat()
```
🧹 Nitpick comments (2)

`dcs_core/integrations/databases/postgres.py` (2)

**359-361: Rename the loop variable and document the SQL injection risk.**

The loop variable `queries` (plural) iterates over individual query strings, which is confusing. Additionally, `additional_queries` strings are concatenated directly without validation, creating a potential SQL injection risk if input is not sanitized by the caller.

Apply this diff to improve clarity:

```diff
         if additional_queries:
-            for queries in additional_queries:
-                query_parts.append(queries)
+            for query in additional_queries:
+                query_parts.append(query)
```

Consider adding a docstring warning that `additional_queries` must be pre-validated to prevent SQL injection.
**363-367: Use `next(iter(...))` instead of list slicing for the single-row result.**

Line 367 materializes the entire result set into a list before taking the first element, which is inefficient. Since the aggregate query returns exactly one row, use `next(iter(result))` instead. As per static analysis hints.

Apply this diff:

```diff
         result = self.connection.execute(text(query))
-        row = dict(list(result)[0]._mapping)
+        row = dict(next(iter(result))._mapping)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`dcs_core/integrations/databases/postgres.py` (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
`dcs_core/integrations/databases/postgres.py` (5)

- `dcs_core/integrations/databases/mysql.py` (2): `quote_column` (86-92), `qualified_table_name` (76-84)
- `dcs_core/integrations/databases/oracle.py` (2): `quote_column` (80-86), `qualified_table_name` (70-78)
- `dcs_core/integrations/databases/mssql.py` (2): `quote_column` (120-126), `qualified_table_name` (110-118)
- `dcs_core/integrations/databases/sybase.py` (2): `quote_column` (210-216), `qualified_table_name` (200-208)
- `dcs_core/core/datasource/sql_datasource.py` (2): `quote_column` (148-154), `qualified_table_name` (130-138)
🪛 Ruff (0.14.0)
dcs_core/integrations/databases/postgres.py
367-367: Prefer next(iter(result)) over single element slice
Replace with next(iter(result))
(RUF015)
🔇 Additional comments (5)
dcs_core/integrations/databases/postgres.py (5)
**15-18: LGTM!** The imports are correctly added to support the new functionality.

**338-352: LGTM!** The numeric type detection and metric calculations (min, max, average) are correctly implemented.

**354-357: LGTM!** The text type detection and max character length calculation are correctly implemented.

**369-398: Well-structured normalization logic with good defensive programming.** The `_normalize_metrics` helper correctly handles various data types and includes proper None checks, recursive handling of nested structures, and a safe fallback to string conversion. The only issue is the datetime isinstance check on line 387 (flagged separately).

**400-411: LGTM!** The result transformation logic correctly maps the flat row dictionary to a structured per-column format, applying normalization to each metric value.
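A sketch of the kind of recursive normalization described above, under stated assumptions (the `normalize_metric` name and exact branch order are illustrative; the PR's `_normalize_metrics` may differ in details):

```python
import datetime
from decimal import Decimal

def normalize_metric(value):
    # None passes through; Decimal becomes float; dates become ISO strings;
    # dicts/lists are walked recursively; unknown types fall back to str().
    if value is None:
        return None
    if isinstance(value, Decimal):
        return float(value)
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    if isinstance(value, dict):
        return {k: normalize_metric(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_metric(v) for v in value]
    if isinstance(value, (int, float, str, bool)):
        return value
    return str(value)

print(normalize_metric({"avg": Decimal("3.5"), "seen": [datetime.date(2024, 1, 1)]}))
```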
…t datetime import
Actionable comments posted: 0
♻️ Duplicate comments (1)
dcs_core/integrations/databases/postgres.py (1)
**333-335: Duplicate count is wrong when NULLs exist.**

`COUNT(*) - COUNT(DISTINCT ...)` treats every NULL as a duplicate. Use `COUNT(column)` so NULLs are excluded.

Apply:

```diff
-            query_parts.append(
-                f'COUNT(*) - COUNT(DISTINCT {self.quote_column(name)}) AS "{name}_duplicate"'
-            )
+            query_parts.append(
+                f'COUNT({self.quote_column(name)}) - COUNT(DISTINCT {self.quote_column(name)}) AS "{name}_duplicate"'
+            )
```
🧹 Nitpick comments (1)
`dcs_core/integrations/databases/postgres.py` (1)

**368-369: Fetch the single row without materializing the whole result.**

`list(result)[0]` pulls the entire cursor into memory. Use `result.fetchone()` (or `result.mappings().first()`) to grab the only row more efficiently.

Consider:

```diff
-        result = self.connection.execute(text(query))
-        row = dict(list(result)[0]._mapping)
+        result = self.connection.execute(text(query))
+        fetched = result.fetchone()
+        row = dict(fetched._mapping) if fetched else {}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
`dcs_core/integrations/databases/postgres.py` (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
`dcs_core/integrations/databases/postgres.py` (2)

- `dcs_core/integrations/databases/mysql.py` (2): `quote_column` (86-92), `qualified_table_name` (76-84)
- `dcs_core/core/datasource/sql_datasource.py` (2): `quote_column` (148-154), `qualified_table_name` (130-138)
🪛 Ruff (0.14.0)
dcs_core/integrations/databases/postgres.py
369-369: Prefer next(iter(result)) over single element slice
Replace with next(iter(result))
(RUF015)
User description
…s retrieval
PR Type
Enhancement
Description
- Add `build_table_metrics_query` method to compute column-level statistics
- Support numeric metrics (min, max, average) and text metrics (max character length)
- Include distinct count, duplicate count, and null count for all columns
- Implement value normalization for JSON serialization (Decimal, datetime, UUID)
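Under stated assumptions (sqlite3 standing in for the Postgres connection, a simplified `column_info` of `{"name": ...}` dicts, and pre-validated identifiers), the described flow can be sketched end to end: build one aggregated SELECT, execute it once, split the single row into per-column dicts.

```python
import sqlite3

def table_metrics(conn, table, column_info):
    # Illustrative sketch, not the PR's method. Identifiers are assumed
    # pre-validated; real code should quote via a helper.
    parts = []
    for col in column_info:
        name = col["name"]
        parts.append(f'COUNT(DISTINCT "{name}") AS "{name}_distinct"')
        parts.append(
            f'SUM(CASE WHEN "{name}" IS NULL THEN 1 ELSE 0 END) AS "{name}_is_null"'
        )
    if not parts:  # guard: no columns means no valid SELECT list
        return []
    conn.row_factory = sqlite3.Row
    row = dict(conn.execute(f'SELECT {", ".join(parts)} FROM {table}').fetchone())
    # Split the single aggregated row back into per-column dictionaries.
    return [
        {
            "column": col["name"],
            "distinct": row[f'{col["name"]}_distinct'],
            "is_null": row[f'{col["name"]}_is_null'],
        }
        for col in column_info
    ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a@x"), (2, None), (2, "a@x")])
print(table_metrics(conn, "users", [{"name": "id"}, {"name": "email"}]))
```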
Diagram Walkthrough
File Walkthrough
postgres.py
Add column metrics query builder with type normalization

`dcs_core/integrations/databases/postgres.py`

- Add `build_table_metrics_query` method to generate and execute column statistics queries
- Compute distinct, duplicate, and null counts across all columns
- Compute min/max/average for numeric types and max character length for text types
- Add `_normalize_metrics` helper function to convert database types (Decimal, datetime, UUID) to JSON-serializable formats