Skip to content

bigquery: please support non-legacy-sql nested data type, and preserve order in json #2342

@c0b

Description

@c0b

report in googleapis/google-cloud-node#1593 (comment) from a nodejs app, but since my app can also be written in Python I have a test of Python BigQuery API as well, it seems have a same problem:

$ bq.py --format=prettyjson query --nouse_legacy_sql 'SELECT ARRAY[STRUCT("zx83" AS id, 1 AS a, 2 AS b), ("f8f7", 4, 7)] d' |& cat -An
...
     2  [$
     3    {$
     4      "d": [$
     5        {$
     6          "a": "1", $
     7          "b": "2", $
     8          "id": "zx83"$
     9        }, $
    10        {$
    11          "a": "4", $
    12          "b": "7", $
    13          "id": "f8f7"$
    14        }$
    15      ]$
    16    }$
    17  ]$

the problems:

  1. nested value types are not interpreted, the value types of a and b can be inferred as Integers (from bigquery api response, there are schema including nested field's data type), should be converted to python integers; (the way to fix will need somewhat recursively interpret data types)
  2. keys order are not preserved, someone here might argue following reasons but there are ways workaround:
    1. JSON object's key order doesn't matter to machine; but when we print a prettyjson format, it's for human, and keys order does matter
    2. python's default json.loads convert a JSON object to python dict which uses key's internal hash code somewhat unpredictable order, way to fix is to load with OrderedDict, and drop sort_keys parameter when calling json.dumps
  3. trailing space in object key value pair lines, this is trivial and some Python JSON API's own problem, there are many careless python doing this including this project; please read below expected output, can be fixed by json.dumps(..., separators=(',', ': '))

many tools written in Python suffered the same problem, like what I commented in the popular httpie tool: httpie/cli#427 (comment)

HOW to Fix 2 & 3

In [1]: import json

In [2]: from collections import OrderedDict

In [3]: data = json.loads(
             '{"d": [ { "id": "zx83", "a": 1, "b": 2}, { "id": "f8f7", "a": 4, "b": 7 }  ]}',
               object_pairs_hook=OrderedDict)

In [4]: data
Out[4]: 
OrderedDict([(u'd',
              [OrderedDict([(u'id', u'zx83'), (u'a', 1), (u'b', 2)]),
               OrderedDict([(u'id', u'f8f7'), (u'a', 4), (u'b', 7)])])])

In [5]: json.dumps(data, indent=2, separators=(',', ': '))
Out[5]: '{\n  "d": [\n    {\n      "id": "zx83",\n      "a": 1,\n      "b": 2\n    },\n    {\n      "id": "f8f7",\n      "a": 4,\n      "b": 7\n    }\n  ]\n}'

In [6]: print json.dumps(data, indent=2, separators=(',', ': '))
{
  "d": [
    {
      "id": "zx83",
      "a": 1,
      "b": 2
    },
    {
      "id": "f8f7",
      "a": 4,
      "b": 7
    }
  ]
}

Expected output:

     2  [$
     3    {$
     4      "d": [$
     5        {$
     6          "id": "zx83",$
     7          "a": 1,$
     8          "b": 2$
     9        },$
    10        {$
    11          "id": "f8f7",$
    12          "a": 4,$
    13          "b": 7$
    14        }$
    15      ]$
    16    }$
    17  ]$

Metadata

Metadata

Assignees

Labels

api: bigqueryIssues related to the BigQuery API.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions