Issue #243 - Add CustomSql analyzer by wwitkowski · Pull Request #244 · awslabs/python-deequ

wwitkowski · 2025-05-30T11:03:58Z

Issue #, if available:
#243

Description of changes:
Add a Python wrapper for the existing CustomSql analyzer from the Scala implementation

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions

Generated by AI (model: us.anthropic.claude-opus-4-6-v1, prompt: db2249a9) — may not be fully accurate. Reply if this doesn't help.

github-actions · 2026-05-18T17:30:45Z

        self.assertEqual(self.CountDistinct("b"), [Row(value=1.0)])

+    def test_CustomSql(self):
+        self.df.createOrReplaceTempView("input_table")


EDGE_CASE: The test test_CustomSql relies on self.df.createOrReplaceTempView("input_table") being called within the test method, but the helper method CustomSql (line 115) calls self.AnalysisRunner.onData(self.df) which registers the DataFrame for analysis but does NOT register the temp view. If test_CustomSql runs before any other test that might register the view, it works. However, test_fail_CustomSql and test_fail_CustomSql_incorrect_query do NOT call createOrReplaceTempView, so they depend on the temp view persisting from test_CustomSql. If test execution order changes (e.g., running test_fail_CustomSql_incorrect_query in isolation), the temp view won't exist and the test will fail for the wrong reason (missing table, not incorrect query).

Line 309: self.df.createOrReplaceTempView("input_table") is only in test_CustomSql. Lines 320-321 (test_fail_CustomSql) and 324-325 (test_fail_CustomSql_incorrect_query) use "input_table" without registering the view. The setUpClass method (lines 39-47 in full source) does not register a temp view.

github-actions · 2026-05-18T17:30:45Z

+    @pytest.mark.xfail(reason="@unittest.expectedFailure")
+    def test_fail_CustomSql(self):
+        self.assertEqual(self.CustomSql("SELECT SUM(b) FROM input_table"), [Row(value=1.0)])
+


MISSING_TEST: The test test_fail_CustomSql_incorrect_query uses SELECT SUM(b) without a FROM clause. This is expected to fail, but the xfail marker means the test passes if an exception is raised. However, there's no test that verifies the correct failure mode (e.g., that a specific exception type is raised or that the metric result is NaN/empty). A proper negative test should use pytest.raises or check the specific error behavior rather than relying on any exception being acceptable.

Line 324-325: @pytest.mark.xfail(reason="@unittest.expectedFailure") followed by self.CustomSql("SELECT SUM(b)") - this will pass as long as ANY exception is raised, not specifically validating the expected behavior for an invalid SQL query.

github-actions · 2026-05-18T17:30:45Z

+        :return self
+        """
+        return self._deequAnalyzers.CustomSql(self.expression, self.disambiguator)
+


EDGE_CASE: The CustomSql analyzer passes self.expression and self.disambiguator directly to the Scala CustomSql constructor. However, looking at the Deequ Scala source for CustomSql, the constructor signature is CustomSql(expression: String, disambiguator: Option[String]) in some versions. If the Scala API expects an Option[String] for disambiguator, passing a plain Python string would cause a Py4J type mismatch error at runtime.

Line 385: return self._deequAnalyzers.CustomSql(self.expression, self.disambiguator) passes self.disambiguator as a raw string. Other analyzers in this file wrap optional parameters with self._jvm.scala.Option.apply(...) (e.g., line 181 for ApproxCountDistinct: self._jvm.scala.Option.apply(self.where)). If the Scala CustomSql expects Option[String], this will fail at runtime.

wwitkowski added 4 commits May 19, 2025 17:22

add customsql analyzer

0d89182

add unit tests for disambiguator and incorrect query

cd740d0

update incorrect query unit test

bca4cf1

update docs

636ff93

wwitkowski marked this pull request as ready for review May 30, 2025 11:04

wwitkowski changed the title ~~Add CustomSql analyzer~~ Issue #243 - Add CustomSql analyzer May 30, 2025

github-actions Bot reviewed May 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue #243 - Add CustomSql analyzer#244

Issue #243 - Add CustomSql analyzer#244
wwitkowski wants to merge 4 commits into
awslabs:masterfrom
wwitkowski:feature/add-customsql-analyzer

wwitkowski commented May 30, 2025

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 18, 2026

Uh oh!

github-actions Bot May 18, 2026

Uh oh!

github-actions Bot May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

wwitkowski commented May 30, 2025

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant