pandas.io.gbq verify_schema seems to be too strict. #11359
Closed
Description
This line (the schema comparison in `verify_schema`) seems too strict to me for repeated insertions. Apparently GBQ does not return the fields of the schema in a consistent order (or my application scrambles the field order). Either way, I would say the verification is too strict.
So for example:
```python
dict1 = {'fields': [{'name': 'coordinates_0', 'type': 'FLOAT'}, {'name': 'created_at', 'type': 'STRING'}]}
dict2 = {'fields': [{'name': 'created_at', 'type': 'STRING'}, {'name': 'coordinates_0', 'type': 'FLOAT'}]}
dict1 == dict2  # gives False
```
This would make the verification fail, even though the insertion itself would succeed: rows are inserted as JSON, so the order of the fields is irrelevant.
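To make the difference concrete, here is a minimal sketch of an order-insensitive comparison that sorts the fields by name before comparing. `normalize_fields` is a hypothetical helper (not part of pandas) and assumes every field has a unique `name` key. Because Python dict equality already ignores key order, sorting the list of fields is enough to ignore both field order and key order.

```python
# Hypothetical helper, not part of pandas: assumes each field has a unique "name".
def normalize_fields(schema):
    """Return the schema's fields sorted by name, so field order no longer matters."""
    return sorted(schema["fields"], key=lambda field: field["name"])


dict1 = {'fields': [{'name': 'coordinates_0', 'type': 'FLOAT'}, {'name': 'created_at', 'type': 'STRING'}]}
dict2 = {'fields': [{'name': 'created_at', 'type': 'STRING'}, {'name': 'coordinates_0', 'type': 'FLOAT'}]}

print(dict1 == dict2)  # False: the two field lists differ in order
print(normalize_fields(dict1) == normalize_fields(dict2))  # True
```

A genuine mismatch, such as a field whose type changed, still compares unequal after sorting.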
For now I have worked around it with:
```python
import json


def verify_schema(self, dataset_id, table_id, schema):
    """Return True if the remote table has the same fields as `schema`, in any order."""
    from apiclient.errors import HttpError

    try:
        bq_schema = self.service.tables().get(
            projectId=self.project_id,
            datasetId=dataset_id,
            tableId=table_id,
        ).execute()['schema']
        # json.dumps is necessary to make the field dicts hashable so they can go in a set.
        # This still fails if the key order within a field differs, but GBQ seems to keep key order.
        return set(json.dumps(x) for x in bq_schema['fields']) == set(
            json.dumps(x) for x in schema['fields']
        )
    except HttpError as ex:
        self.process_http_error(ex)
```
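The comparison inside the workaround can be pulled out into a pure function so it can be checked without a BigQuery connection. This is a minimal sketch, assuming schemas are dicts shaped like `dict1`/`dict2` above; `schema_fields_equal` is a hypothetical name, not a pandas API. Passing `sort_keys=True` to `json.dumps` also covers the remaining gap flagged in the workaround's comment (key order inside each field).

```python
import json


def schema_fields_equal(schema_a, schema_b):
    """Compare two BigQuery-style schemas, ignoring field order and key order.

    Hypothetical helper: each field dict is serialized with sort_keys=True, so
    {'name': ..., 'type': ...} and {'type': ..., 'name': ...} produce the same
    string. The strings are then compared as sets, which assumes field names
    are unique (duplicate fields would collapse into one entry).
    """
    def canonical(schema):
        return {json.dumps(field, sort_keys=True) for field in schema["fields"]}

    return canonical(schema_a) == canonical(schema_b)


local = {"fields": [{"name": "coordinates_0", "type": "FLOAT"}, {"name": "created_at", "type": "STRING"}]}
remote = {"fields": [{"type": "STRING", "name": "created_at"}, {"type": "FLOAT", "name": "coordinates_0"}]}
print(schema_fields_equal(local, remote))  # True
```

Inside `verify_schema`, the `return` line could then become `return schema_fields_equal(bq_schema, schema)`.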