Description
Bug description
An SQL execution exception occurs when executing validations with SQL and Pandas data sources.
I created a File Data Context and connected it to SQL data, filesystem data, and Pandas DataFrames. I defined my expectation suite, created a validation definition, and ran it. While executing
validation_results = validation_definition.run()
with the SQL and Pandas data sources, I get this error:
Calculating Metrics: 86%|████████▌ | 24/28 [00:01<00:00, 20.39it/s]An SQL execution Exception occurred. OperationalError: "(psycopg2.OperationalError) **Duplicated field name in view schema: column_values.nonnull.unexpected_count**
DETAIL: java.sql.SQLException: Duplicated field name in view schema: column_values.nonnull.unexpected_count
[SQL: SELECT sum(CASE WHEN (field1 IS NULL) THEN %(param_1)s ELSE %(param_2)s END) AS "column_values.nonnull.unexpected_count", sum(CASE WHEN (field2 IS NULL) THEN %(param_3)s ELSE %(param_4)s END) AS "column_values.nonnull.unexpected_count", sum(CASE WHEN (field3 IS NULL) THEN %(param_5)s ELSE %(param_6)s END) AS "column_values.nonnull.unexpected_count", sum(CASE WHEN (field4 IS NULL) THEN %(param_7)s ELSE %(param_8)s END) AS "column_values.nonnull.unexpected_count"
FROM (SELECT *
FROM (SELECT * from my_table_data) AS anon_1
WHERE true) AS anon_1]
[parameters: {'param_1': 1, 'param_2': 0, 'param_3': 1, 'param_4': 0, 'param_5': 1, 'param_6': 0, 'param_7': 1, 'param_8': 0}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)". Traceback: "Traceback (most recent call last):
File "d:\Veronica\Anaconda\envs\gxenv\lib\site-packages\sqlalchemy\engine\base.py", line 1900, in _execute_context
self.dialect.do_execute(
File "d:\Veronica\Anaconda\envs\gxenv\lib\site-packages\sqlalchemy\engine\default.py", line 736, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: Duplicated field name in view schema: column_values.nonnull.unexpected_count
DETAIL: java.sql.SQLException: Duplicated field name in view schema: column_values.nonnull.unexpected_count
The dataframe does not have duplicated field names.
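For reference, the failing SQL aggregates NULL counts for four columns under one and the same alias, which matches a suite with four not-null expectations. A suite shaped like the following (column names taken from the error, suite name hypothetical) should be enough to hit the collision, since each ExpectColumnValuesToNotBeNull contributes its own column_values.nonnull.unexpected_count metric:

import great_expectations as gx
import great_expectations.expectations as gxe

context = gx.get_context()

# Hypothetical suite: four not-null expectations, one per column, mirroring the
# four identical "column_values.nonnull.unexpected_count" aliases in the failing SQL.
suite = context.suites.add(gx.ExpectationSuite(name="my_suite"))
for column in ["field1", "field2", "field3", "field4"]:
    suite.add_expectation(gxe.ExpectColumnValuesToNotBeNull(column=column))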
To Reproduce
great_expectations.yml config:
config_version: 4.0
config_variables_file_path: uncommitted/config_variables.yml
plugins_directory: plugins/
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
  validation_results_store:
    class_name: ValidationResultsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/
  validation_definition_store:
    class_name: ValidationDefinitionStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: validation_definitions/
expectations_store_name: expectations_store
validation_results_store_name: validation_results_store
checkpoint_store_name: checkpoint_store
data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
fluent_datasources:
  my_ds:
    type: sqlite
    id: 0717e05c-7846-40ca-8b77-834fa9f74d0b
    assets:
      my_data_asset:
        type: query
        id: d891b4e5-104c-481d-836c-aa8e4395691c
        batch_metadata: {}
        batch_definitions:
          FULL_TABLE:
            id: 4814ef8f-38d8-4f04-ad89-d35f8f2a0fc1
            partitioner:
        query: SELECT * from mytable
    connection_string: ${myconnectionstring}
  employee_department:
    type: pandas_filesystem
    id: fb484e5a-d57f-444b-897f-2af191d8fe16
    assets:
      employee_department:
        type: csv
        id: 5ce9d256-4b94-4e02-b2b9-135765006abb
    base_directory: D:\Veronica\PyCharmProjects\gx_test_135
  dq:
    type: sqlite
    id: acc48a7b-f5af-425c-9dfc-6d7685283c80
    assets:
      dq_asset:
        type: query
        id: 52c24fe5-1c44-4e60-a4f0-b356b7bb9c53
        batch_metadata: {}
        batch_definitions:
          FULL_TABLE:
            id: 51874da9-477a-4f79-92af-e8e472bd4c6a
            partitioner:
        query: SELECT * from asset_name = myasset
    connection_string: sqlite:///D:\Veronica\PyCharmProjects\ds-gxupgrade\dq.db
  dq_employee:
    type: pandas_filesystem
    id: ab12ba8f-61ea-45ef-a4b3-28ed985629c4
    assets:
      employee_department:
        type: csv
        id: 29da965b-7516-4f82-8ffe-ad4b2685cc12
        batch_definitions:
          employee_department:
            id: 54381107-2029-4eca-801c-dbd09fe73353
            partitioner:
              regex: employee_department.csv$
              param_names: []
    base_directory: D:\Veronica\PyCharmProjects\ds-gxupgrade
  my_data_pandas:
    type: pandas
    id: 2d65fc95-0040-4a5b-bfb9-433288e9918a
    assets:
      pandas_asset:
        type: dataframe
        id: 02b58697-1ca2-4b46-ad6b-b80bf6209ad2
        batch_metadata: {}
        batch_definitions:
          cr_pandas_batch:
            id: 9cd952be-a29d-4a90-a230-5ba015974584
            partitioner:
analytics_enabled: false
data_context_id: e10c5839-a0de-4edc-a051-0c43c2a83f71
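For context, the fluent datasources above correspond to calls along these lines (a sketch reconstructed from the YAML, not the exact code that was used; names match the config):

import great_expectations as gx

context = gx.get_context()

# SQL datasource with a query asset and a whole-table batch definition.
ds = context.data_sources.add_sqlite(
    name="my_ds", connection_string="${myconnectionstring}"
)
asset = ds.add_query_asset(name="my_data_asset", query="SELECT * from mytable")
asset.add_batch_definition_whole_table(name="FULL_TABLE")

# Pandas datasource with a dataframe asset and a whole-dataframe batch definition.
pandas_ds = context.data_sources.add_pandas(name="my_data_pandas")
df_asset = pandas_ds.add_dataframe_asset(name="pandas_asset")
df_asset.add_batch_definition_whole_dataframe(name="cr_pandas_batch")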
My code:
from sqlalchemy import create_engine
import great_expectations as gx
import pandas as pd

context = gx.get_context()

# Load the source table into a Pandas dataframe
# (read_sql_query already returns a DataFrame).
engine = create_engine(context.config_variables['my_connection_string'])
df = pd.read_sql_query("select * from my_table_data", engine)

# Retrieve the dataframe Batch Definition
data_source_name = "my_data_pandas"
data_asset_name = "pandas_asset"
batch_definition_name = "cr_pandas_batch"
batch_definition = (
    context.data_sources.get(data_source_name)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
)
batch_parameters = {"dataframe": df}

definition_name = "my_data_vd"
validation_definition = context.validation_definitions.get(definition_name)

# Get the dataframe as a Batch and run the validation
batch = batch_definition.get_batch(batch_parameters=batch_parameters)
result = validation_definition.run()
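As an aside, when the validation definition is bound to the dataframe asset, GX 1.x takes the batch parameters on the run call itself; a minimal sketch, assuming the same names as above:

# Pass the dataframe to the run call when the validation definition targets
# the "pandas_asset" dataframe asset (names assumed from the config above).
result = validation_definition.run(batch_parameters={"dataframe": df})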
Expected behavior
The validation should complete successfully; my use case requires the complete validation with the Pandas data source.
Environment:
- Operating System: tested on Windows and Linux
- Great Expectations Version: 1.3.5
- Data Source: Pandas, SQL, filesystem
- Cloud environment: none
Additional context
It works if the expectation suite contains only one expectation (with all three data sources); with multiple expectations in the suite, it works only with the filesystem data source. My use case requires the complete validation of the dataset with the Pandas data source; the other data sources were added only for testing. I checked the requirements.txt in this repository and my environment is aligned with it.
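Since a single-expectation suite validates fine, a temporary workaround might be to run each expectation through its own one-expectation validation definition and collect the results; a rough sketch, assuming the suite, batch definition, and dataframe from the code above (suite and part names are hypothetical):

import great_expectations as gx

# Workaround sketch: one expectation per suite, so each validation issues a
# single metric query and the "column_values.nonnull.unexpected_count" alias
# cannot collide with itself.
suite = context.suites.get("my_suite")  # suite name assumed
results = []
for i, expectation in enumerate(suite.expectations):
    single = context.suites.add(gx.ExpectationSuite(name=f"my_suite_part_{i}"))
    # Re-adding the expectation to a fresh suite; in practice it may need to
    # be copied first so it does not keep the original suite's id.
    single.add_expectation(expectation)
    vd = context.validation_definitions.add(
        gx.ValidationDefinition(
            name=f"my_data_vd_part_{i}",
            data=batch_definition,
            suite=single,
        )
    )
    results.append(vd.run(batch_parameters={"dataframe": df}))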