Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sql lab + shillelagh does not return column values ​​greater than 64. #30245

Open
2 of 3 tasks
mibarannikov opened this issue Sep 11, 2024 · 3 comments
Open
2 of 3 tasks
Assignees
Labels
sqllab Namespace | Anything related to the SQL Lab

Comments

@mibarannikov
Copy link

mibarannikov commented Sep 11, 2024

Bug description

All values ​​are null in json for fields greater than 64. I updated superset 3.1 -> 4.0.2. I updated shillagh to 1.2.27 (apsw 3.42.0.1). I created another json array source. But nothing changed. All json fields greater than 64 are null.

Generation was performed by the activity process.
var test = [];
for (var l = 0; l < 2; l++) {
var o = {};

for (var i = 0; i < 100; i++) {
    o["col_" + i] = i + (100 * l);
}
test.push(o);

}
execution.setVariable("test", JSON.stringify(test));

How to reproduce the bug

362841266-a732a7dc-4ca4-4803-a227-f42225142ad0 (1)

Screenshots/recordings

362841266-a732a7dc-4ca4-4803-a227-f42225142ad0
362574713-ba6db765-efea-45c0-a0aa-da2f27b391a6

Superset version

4.0.2

Python version

3.9

Node version

16

Browser

Chrome

Additional context

logs from superset
2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,485:DEBUG:shillelagh.adapters.api.generic_json:{'col_0': 0, 'col_1': 1, 'col_2': 2, 'col_3': 3, 'col_4': 4, 'col_5': 5, 'col_6': 6, 'col_7': 7, 'col_8': 8, 'col_9': 9, 'col_10': 10, 'col_11': 11, 'col_12': 12, 'col_13': 13, 'col_14': 14, 'col_15': 15, 'col_16': 16, 'col_17': 17, 'col_18': 18, 'col_19': 19, 'col_20': 20, 'col_21': 21, 'col_22': 22, 'col_23': 23, 'col_24': 24, 'col_25': 25, 'col_26': 26, 'col_27': 27, 'col_28': 28, 'col_29': 29, 'col_30': 30, 'col_31': 31, 'col_32': 32, 'col_33': 33, 'col_34': 34, 'col_35': 35, 'col_36': 36, 'col_37': 37, 'col_38': 38, 'col_39': 39, 'col_40': 40, 'col_41': 41, 'col_42': 42, 'col_43': 43, 'col_44': 44, 'col_45': 45, 'col_46': 46, 'col_47': 47, 'col_48': 48, 'col_49': 49, 'col_50': 50, 'col_51': 51, 'col_52': 52, 'col_53': 53, 'col_54': 54, 'col_55': 55, 'col_56': 56, 'col_57': 57, 'col_58': 58, 'col_59': 59, 'col_60': 60, 'col_61': 61, 'col_62': 62, 'col_63': 63, 'col_64': 64, 'col_65': 65, 'col_66': 66, 'col_67': 67, 'col_68': 68, 'col_69': 69, 'col_70': 70, 'col_71': 71, 'col_72': 72, 'col_73': 73, 'col_74': 74, 'col_75': 75, 'col_76': 76, 'col_77': 77, 'col_78': 78, 'col_79': 79, 'col_80': 80, 'col_81': 81, 'col_82': 82, 'col_83': 83, 'col_84': 84, 'col_85': 85, 'col_86': 86, 'col_87': 87, 'col_88': 88, 'col_89': 89, 'col_90': 90, 'col_91': 91, 'col_92': 92, 'col_93': 93, 'col_94': 94, 'col_95': 95, 'col_96': 96, 'col_97': 97, 'col_98': 98, 'col_99': 99, 'rowid': 0}
2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,485:DEBUG:shillelagh.adapters.api.generic_json:{'col_0': 100, 'col_1': 101, 'col_2': 102, 'col_3': 103, 'col_4': 104, 'col_5': 105, 'col_6': 106, 'col_7': 107, 'col_8': 108, 'col_9': 109, 'col_10': 110, 'col_11': 111, 'col_12': 112, 'col_13': 113, 'col_14': 114, 'col_15': 115, 'col_16': 116, 'col_17': 117, 'col_18': 118, 'col_19': 119, 'col_20': 120, 'col_21': 121, 'col_22': 122, 'col_23': 123, 'col_24': 124, 'col_25': 125, 'col_26': 126, 'col_27': 127, 'col_28': 128, 'col_29': 129, 'col_30': 130, 'col_31': 131, 'col_32': 132, 'col_33': 133, 'col_34': 134, 'col_35': 135, 'col_36': 136, 'col_37': 137, 'col_38': 138, 'col_39': 139, 'col_40': 140, 'col_41': 141, 'col_42': 142, 'col_43': 143, 'col_44': 144, 'col_45': 145, 'col_46': 146, 'col_47': 147, 'col_48': 148, 'col_49': 149, 'col_50': 150, 'col_51': 151, 'col_52': 152, 'col_53': 153, 'col_54': 154, 'col_55': 155, 'col_56': 156, 'col_57': 157, 'col_58': 158, 'col_59': 159, 'col_60': 160, 'col_61': 161, 'col_62': 162, 'col_63': 163, 'col_64': 164, 'col_65': 165, 'col_66': 166, 'col_67': 167
, 'col_68': 168, 'col_69': 169, 'col_70': 170, 'col_71': 171, 'col_72': 172, 'col_73': 173, 'col_74': 174, 'col_75': 175, 'col_76': 176, 'col_77': 177, 'col_78': 178, 'col_79': 179, 'col_80': 180, 'col_81': 181, 'col_82': 182, 'col_83': 183, 'col_84': 184, 'col_85': 185, 'col_86': 186, 'col_87': 187, 'col_88': 188, 'col_89': 189, 'col_90': 190, 'col_91': 191, 'col_92': 192, 'col_93': 193, 'col_94': 194, 'col_95': 195, 'col_96': 196, 'col_97': 197, 'col_98': 198, 'col_99': 199, 'rowid': 1}
2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,490:DEBUG:requests_cache.policy.actions:Cache directives from request headers: CacheDirectives()
2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,490:DEBUG:requests_cache.policy.actions:Pre-read cache checks: Passed
2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,490:DEBUG:requests_cache.policy.actions:Post-read cache actions: CacheActions(expire_after=180.0)
2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,490:DEBUG:shillelagh.adapters.api.generic_json:{'col_0': 0, 'col_1': 1, 'col_2': 2, 'col_3': 3, 'col_4': 4, 'col_5': 5, 'col_6': 6, 'col_7': 7, 'col_8': 8, 'col_9': 9, 'col_10': 10, 'col_11': 11, 'col_12': 12, 'col_13': 13, 'col_14': 14, 'col_15': 15, 'col_16': 16, 'col_17': 17, 'col_18': 18, 'col_19': 19, 'col_20': 20, 'col_21': 21, 'col_22': 22, 'col_23': 23, 'col_24': 24, 'col_25': 25, 'col_26': 26, 'col_27': 27, 'col_28': 28, 'col_29': 29, 'col_30': 30, 'col_31': 31, 'col_32': 32, 'col_33': 33, 'col_34': 34, 'col_35': 35, 'col_36': 36, 'col_37': 37, 'col_38': 38, 'col_39': 39, 'col_40': 40, 'col_41': 41, 'col_42': 42, 'col_43': 43, 'col_44': 44, 'col_45': 45, 'col_46': 46, 'col_47': 47, 'col_48': 48, 'col_49': 49, 'col_50': 50, 'col_51': 51, 'col_52': 52, 'col_53': 53, 'col_54': 54, 'col_55': 55, 'col_56': 56, 'col_57': 57, 'col_58': 58, 'col_59': 59, 'col_60': 60, 'col_61': 61, 'col_62': 62, 'col_63': 63, 'rowid': 0}
2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,491:DEBUG:root:Query 260: Handling cursor
2024-08-29T20:51:33+03:00 2024-08-29 17:51:33,495:DEBUG:shillelagh.adapters.api.generic_json:{'col_0': 100, 'col_1': 101, 'col_2': 102, 'col_3': 103, 'col_4': 104, 'col_5': 105, 'col_6': 106, 'col_7': 107, 'col_8': 108, 'col_9': 109, 'col_10': 110, 'col_11': 111, 'col_12': 112, 'col_13': 113, 'col_14': 114, 'col_15': 115, 'col_16': 116, 'col_17': 117, 'col_18': 118, 'col_19': 119, 'col_20': 120, 'col_21': 121, 'col_22': 122, 'col_23': 123, 'col_24': 124, 'col_25': 125, 'col_26': 126, 'col_27': 127, 'col_28': 128, 'col_29': 129, 'col_30': 130, 'col_31': 131, 'col_32': 132, 'col_33': 133, 'col_34': 134, 'col_35': 135, 'col_36': 136, 'col_37': 137, 'col_38': 138, 'col_39': 139, 'col_40': 140, 'col_41': 141, 'col_42': 142, 'col_43': 143, 'col_44': 144, 'col_45': 145, 'col_46': 146, 'col_47': 147, 'col_48': 148, 'col_49': 149, 'col_50': 150, 'col_51': 151, 'col_52': 152, 'col_53': 153, 'col_54': 154, 'col_55': 155, 'col_56': 156, 'col_57': 157, 'col_58': 158, 'col_59': 159, 'col_60': 160, 'col_61': 161, 'col_62': 162, 'col_63': 163, 'rowid': 1}
2024-08-29T20:51:33+03:00 Query 260: Storing results in results backend, key: 8c5eba8a-427b-4cf3-b9a4-b735de18f0aa

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@mibarannikov mibarannikov changed the title Sqlab + shillelagh does not return column values ​​greater than 64. Sql lab + shillelagh does not return column values ​​greater than 64. Sep 11, 2024
Copy link

dosubot bot commented Sep 11, 2024

@mibarannikov Hello! I'm here to help you with any bugs, questions, or contributions.

The issue you're encountering with the Shillelagh adapter not returning column values greater than 64 in JSON data sources might be related to specific configurations or limitations in the adapter's handling of JSON data.

Potential Causes and Solutions:

  1. Column Limit Configuration:
    The Shillelagh adapter has a limit configuration that might be affecting the number of columns processed. The get_data method in the SupersetShillelaghAdapter class includes a limit configuration that fetches the app_limit from the application configuration SUPERSET_META_DB_LIMIT:

    def get_data(
        self,
        bounds: dict[str, Filter],
        order: list[tuple[str, RequestedOrder]],
        limit: int | None = None,
        offset: int | None = None,
        **kwargs: Any,
    ) -> Iterator[Row]:
        app_limit: int | None = current_app.config["SUPERSET_META_DB_LIMIT"]
        if limit is None:
            limit = app_limit
        elif app_limit is not None:
            limit = min(limit, app_limit)
    
        query = self._build_sql(bounds, order, limit, offset)
    
        with self.engine_context() as engine:
            connection = engine.connect()
            rows = connection.execute(query)
            for i, row in enumerate(rows):
                data = dict(zip(self.columns, row))
                data["rowid"] = data[self._rowid] if self._rowid else i
                yield data

    Ensure that the SUPERSET_META_DB_LIMIT is set to a value that accommodates the number of columns you need.

  2. Column Inspection:
    The _set_columns method in the SupersetShillelaghAdapter class inspects the table to get its columns during initialization. This method might be missing columns if there is a limit or issue during the inspection process:

    def _set_columns(self) -> None:
        from superset.models.core import Database
    
        database = (
            db.session.query(Database).filter_by(database_name=self.database).first()
        )
        if database is None:
            raise ProgrammingError(f"Database not found: {self.database}")
        self._allow_dml = database.allow_dml
    
        table = sql_parse.Table(self.table, self.schema, self.catalog)
        security_manager.raise_for_access(database=database, table=table)
    
        self.engine_context = partial(
            database.get_sqla_engine,
            catalog=self.catalog,
            schema=self.schema,
        )
    
        metadata = MetaData()
        with self.engine_context() as engine:
            try:
                self._table = Table(
                    self.table,
                    metadata,
                    schema=self.schema,
                    autoload=True,
                    autoload_with=engine,
                )
            except NoSuchTableError as ex:
                raise ProgrammingError(f"Table does not exist: {self.table}") from ex
    
        primary_keys = [
            column for column in list(self._table.primary_key) if column.primary_key
        ]
        if len(primary_keys) == 1 and primary_keys[0].type.python_type == int:
            self._rowid = primary_keys[0].name
    
        self.columns = {
            column.name: self.get_field(column.type.python_type)
            for column in self._table.c
        }
  3. Logging and Debugging:
    Enable detailed logging to identify any issues during the data retrieval process. Adjust the logging level to DEBUG in the superset/config.py file:

    LOG_LEVEL = logging.DEBUG
    TIME_ROTATE_LOG_LEVEL = logging.DEBUG
  4. Check for Known Issues:
    There might be known issues or bug reports related to handling large JSON objects or arrays in the Shillelagh adapter. Ensure you have the latest version of the adapter and check the repository for any related issues.

By addressing these potential causes, you should be able to resolve the issue with column values greater than 64 being null in JSON data sources. If the problem persists, further investigation into the specific configurations and limitations of the Shillelagh adapter may be necessary.

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added the sqllab Namespace | Anything related to the SQL Lab label Sep 11, 2024
@rusackas
Copy link
Member

@betodealmeida is definitely the one to talk to here!

@mibarannikov
Copy link
Author

Thanks, updating the library apsw helped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllab Namespace | Anything related to the SQL Lab
Projects
None yet
Development

No branches or pull requests

3 participants