
pandas.io.gbq verify_schema seems to be too strict. #11359

Closed
@FlxVctr

Description

This line seems to me to be too strict for repeated insertion, because GBQ is apparently not consistent in the order of fields in the schema (or my application scrambles the order of fields; either way, I would say the verification is too strict).

So for example:

dict1 = {'fields': [{'name': 'coordinates_0', 'type': 'FLOAT'}, {'name': 'created_at', 'type': 'STRING'}]}
dict2 = {'fields': [{'name': 'created_at', 'type': 'STRING'}, {'name': 'coordinates_0', 'type': 'FLOAT'}]}
dict1 == dict2  # gives False

would make verification fail even though the insertion itself would work, because rows are inserted as JSON, which makes the order of fields irrelevant.

Solved that for myself for the moment with:

    def verify_schema(self, dataset_id, table_id, schema):
        import json
        from apiclient.errors import HttpError

        try:
            bq_schema = (self.service.tables().get(
                projectId=self.project_id,
                datasetId=dataset_id,
                tableId=table_id
                ).execute()['schema'])
            # json.dumps makes each field dict hashable so it can go into a set.
            # This still fails if the key order within a field differs,
            # but GBQ seems to keep key order.
            return ({json.dumps(x) for x in bq_schema['fields']} ==
                    {json.dumps(x) for x in schema['fields']})

        except HttpError as ex:
            self.process_http_error(ex)
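
The remaining key-order caveat could probably be removed as well. A minimal standalone sketch, assuming both schemas are plain dicts with a `'fields'` list as in the example above (the helper name `schemas_match` is made up for illustration and is not part of pandas):

```python
import json


def schemas_match(remote_schema, local_schema):
    """Compare two schema dicts, ignoring field order and key order."""
    def canonical(schema):
        # sort_keys=True makes each serialized field independent of key order;
        # sorting the resulting strings makes the comparison independent of
        # field order while still noticing duplicated fields.
        return sorted(json.dumps(field, sort_keys=True)
                      for field in schema['fields'])

    return canonical(remote_schema) == canonical(local_schema)


dict1 = {'fields': [{'name': 'coordinates_0', 'type': 'FLOAT'},
                    {'name': 'created_at', 'type': 'STRING'}]}
dict2 = {'fields': [{'type': 'STRING', 'name': 'created_at'},
                    {'name': 'coordinates_0', 'type': 'FLOAT'}]}

print(schemas_match(dict1, dict2))  # True
```

Sorted lists are used instead of sets here so that a schema containing the same field twice is not treated as equal to one containing it once.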
