-
Notifications
You must be signed in to change notification settings - Fork 591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
standard bigquery v2, support nested schema in query result #1593
Comments
Thanks for the report! We'll definitely dig into this. In the meantime, you should be able to access the same data returned from httpie with the 3rd param to the query callback: BigQuery().query('...', {...}, function(err, data, apiResponse) {
// apiResponse === http response body
}) |
Another thing confused me is there seems multiple bigquery nodejs binding, above code uses But could you clarify which library is officially supported by Google? or which one is latest? and could you mark others as deprecated?
> const gcloud = require('google-cloud');
> var bigquery = gcloud.bigquery('v2');
undefined
> bigquery.query({ query: 'SELECT STRUCT(1 AS a, 2 AS b, 3 AS c) AS d', useLegacySql: false }).on('data', rows => { console.dir(rows, { depth: null, colors: true }); } );
DestroyableTransform {
_readableState:
ReadableState {
objectMode: true,
highWaterMark: 16,
buffer: [],
...
> { d: { f: [ { v: '1' }, { v: '2' }, { v: '3' } ] } } |
the google/google-api-nodejs-client @google has bigquery support as well, but different APIs, which one should I use? |
https://googlecloudplatform.github.io/google-cloud-node/#/:
So, this is the handmade API.
We don't allow specifying a version to use in this library, we always use the latest. The |
that would be a workaround before this issue is resolved, that means to parse the schema by myself; but I prefer wait this issue to be resolved |
Would you mind getting the JSON output from a complicated query and running it through here: BigQuery().query('...', {}, function(err, data, apiResponse) {
console.log(JSON.stringify(test(apiResponse.schema, apiResponse.rows), null, 2))
});
function test(schema, rows) {
return rows.map(mergeSchema).map(flattenRows);
function mergeSchema(row) {
return row.f.map(function(field, index) {
var schemaField = schema.fields[index];
var value = field.v;
switch (schemaField.type) {
case 'BOOLEAN': {
value = value === 'true';
break;
}
case 'FLOAT': {
if (!is.nil(value)) {
value = parseFloat(value);
}
break;
}
case 'INTEGER': {
if (!is.nil(value)) {
value = parseInt(value, 10);
}
break;
}
case 'RECORD': {
value = Table.mergeSchemaWithRows_(schemaField, [value[0].v]);
break;
}
case 'TIMESTAMP': {
value = new Date(value * 1000);
break;
}
}
var fieldObject = {};
fieldObject[schemaField.name] = value;
return fieldObject;
});
}
function flattenRows(rows) {
return rows.reduce(function(acc, row) {
var key = Object.keys(row)[0];
acc[key] = row[key];
return acc;
}, {});
}
} If that works as expected, I'll send a PR. |
I'm pretty sure it's going to need tweaking, so if you could provide the raw JSON response body from what you think is the most complicated query, I'll be in a better place to test. Thanks! |
I am testing python api as well, that is able to parse some inner level structure, but the value type is still not right: When I put a wrong SQL, it is able to infer the inner struct is $ glcoud-sdk/.../path/to/bq.py --format=prettyjson query --nouse_legacy_sql 'SELECT ARRAY[STRUCT("zx83" AS id, 1 AS a, 2 AS b), ("f8f7", 4, "7a")] d'
Error in query string: Error processing job 'XXXXXXXX': Array elements of types
{STRUCT<STRING, INT64, STRING>, STRUCT<id STRING, a INT64, b INT64>} do not have a common supertype at [1:8] When I put a correct SQL, it gives me this output: bq.py --format=prettyjson query --nouse_legacy_sql 'SELECT ARRAY[STRUCT("zx83" AS id, 1 AS a, 2 AS b), ("f8f7", 4, 7)] d'
[
{
"d": [
{
"a": "1",
"b": "2",
"id": "zx83"
},
{
"a": "4",
"b": "7",
"id": "f8f7"
}
]
}
] the fields a, b is already known as INT, should be a number in JSON and please keep the order, I know this is a bad problem of Python's default JSON parser, I guess it internally called Javascript's
|
Interesting. If something unexpected is happening with that library, I'll have to ask someone where we can report the bug. Off-hand, I'm not sure. As far as ordering of the results, we will keep it in the order the HTTP JSON API returns it to us. The only extra step we do is combining the data and transforming it back to the native type of the result. To make sure my attempted solution above can handle even the most complex queries, would you mind running one that you think might matches, then send over the JSON response body from the API? I appreciate your help debugging so far, sorry for asking for more! |
I've just got some cycles to test on this, are you sure the 3rd argument I'm following examples in this doc: which says const bigquery = BigQuery();
bigquery.query({
query: 'SELECT ARRAY[STRUCT("fa23" AS id, 1 AS a, 2 AS b, 3.4 AS c), ("bd23", 2, 3, 3.14)] AS d',
useLegacySql: false
}, (err, rows, nextQuery, apiResponse) => {
if (err) {
return console.error(err);
}
console.log(arguments.length);
console.dir(rows, { depth: null, colors: true });
console.dir({ err, rows, nextQuery, apiResponse },
{ depth: null, colors: true });
}); but the code get this output: I don't see either nextQuery or apiResponse; does it require any option to be set? 5
[ { d:
[ { v: { f: [ { v: 'fa23' }, { v: '1' }, { v: '2' }, { v: '3.4' } ] } },
{ v: { f: [ { v: 'bd23' }, { v: '2' }, { v: '3' }, { v: '3.14' } ] } } ] } ]
{ err: null,
rows:
[ { d:
[ { v: { f: [ { v: 'fa23' }, { v: '1' }, { v: '2' }, { v: '3.4' } ] } },
{ v: { f: [ { v: 'bd23' }, { v: '2' }, { v: '3' }, { v: '3.14' } ] } } ] } ],
nextQuery: undefined,
apiResponse: undefined } I've tried your way to use query string as 1st, options as 2nd parameter, but it seems ignoring useLegacySql, always reporting invalid SQL (it parsed as legacy SQL) BigQuery().query('SELECT STRUCT(...)', { useLegacySql: false }, function(err, data, apiResponse) {
console.log(JSON.stringify(test(apiResponse.schema, apiResponse.rows), null, 2))
}); does the library support a NODE_DEBUG=... variable to dump api server interaction details? |
Sorry, my earlier example was wrong in regards to Here's another version that will run all of the queries and return the results to your callback once, but also collect all of the apiResponses. Please give this a shot: BigQuery().query('SELECT STRUCT(...)', {
useLegacySql: false,
autoPaginate: false
}, function(err, rows, nextQuery, apiResponse) {
if (err) throw err
console.log(JSON.stringify(test(apiResponse.schema, apiResponse.rows), null, 2))
})
function test(schema, rows) {
return rows.map(mergeSchema).map(flattenRows);
function mergeSchema(row) {
return row.f.map(function(field, index) {
var schemaField = schema.fields[index];
var value = field.v;
switch (schemaField.type) {
case 'BOOLEAN': {
value = value === 'true';
break;
}
case 'FLOAT': {
if (!is.nil(value)) {
value = parseFloat(value);
}
break;
}
case 'INTEGER': {
if (!is.nil(value)) {
value = parseInt(value, 10);
}
break;
}
case 'RECORD': {
value = Table.mergeSchemaWithRows_(schemaField, [value[0].v]);
break;
}
case 'TIMESTAMP': {
value = new Date(value * 1000);
break;
}
}
var fieldObject = {};
fieldObject[schemaField.name] = value;
return fieldObject;
});
}
function flattenRows(rows) {
return rows.reduce(function(acc, row) {
var key = Object.keys(row)[0];
acc[key] = row[key];
return acc;
}, {});
}
} |
this code works for a simple struct, but I believe a recursive call is needed somewhere, for deeply nested structures, like apiResponse:
{ kind: 'bigquery#queryResponse',
schema:
{ fields:
[ { name: 'd',
type: 'RECORD',
mode: 'REPEATED',
fields:
[ { name: 'id', type: 'STRING', mode: 'NULLABLE' },
{ name: 'a', type: 'INTEGER', mode: 'NULLABLE' },
{ name: 'b', type: 'INTEGER', mode: 'NULLABLE' },
{ name: 'c', type: 'FLOAT', mode: 'NULLABLE' },
{ name: 'f',
type: 'RECORD',
mode: 'NULLABLE',
fields: [ { name: 'a', type: 'INTEGER', mode: 'NULLABLE' } ] } ] } ] },
rows:
[ { f:
[ { v:
[ { v:
{ f:
[ { v: 'fa23' },
{ v: '1' },
{ v: '2' },
{ v: '3.4' },
{ v: { f: [ { v: '1' } ] } } ] } },
{ v:
{ f:
[ { v: 'bd23' },
{ v: '2' },
{ v: '3' },
{ v: '3.14' },
{ v: { f: [ { v: '3' } ] } } ] } } ] } ] } ],
totalBytesProcessed: '0',
jobComplete: true,
cacheHit: false } }
[ { d: [ { id: 'fa23', a: 1, b: 2, c: 3.4, f: { f: [ { v: '1' } ] } } ] } ] // this is currently printed, it seems only first element is shown for an array This is from python bq tool: [
{
"d": [
{
"a": "1",
"b": "2",
"c": "3.4",
"f": {
"a": "1"
},
"id": "fa23"
},
{
"a": "2",
"b": "3",
"c": "3.14",
"f": {
"a": "3"
},
"id": "bd23"
}
]
}
] Expected is: [
{
"d": [
{
"id": "fa23",
"a": 1,
"b": 2,
"c": 3.4,
"f": {
"a": 1
}
},
{
"id": "bd23",
"a": 2,
"b": 3,
"c": 3.14,
"f": {
"a": 3
},
}
]
}
] |
Thanks for the snippet, I'll get to work on a recursive solution. |
the query is |
Okay, please try: console.log(JSON.stringify(test(apiResponse.schema, apiResponse.rows), null, 2))
function test(schema, rows) {
return (Array.isArray(rows) ? rows : [rows]).map(mergeSchema).map(flattenRows);
function mergeSchema(row) {
return row.f.map(function(field, index) {
var schemaField = schema.fields[index];
var value = field.v;
switch (schemaField.type) {
case 'BOOLEAN': {
value = value === 'true';
break;
}
case 'FLOAT': {
value = parseFloat(value);
break;
}
case 'INTEGER': {
value = parseInt(value, 10);
break;
}
case 'RECORD': {
if (schemaField.mode === 'REPEATED') {
value = value.map(function(val) {
return test(schemaField, val.v).pop();
});
} else {
value = test(schemaField, value).pop();
}
break;
}
case 'TIMESTAMP': {
value = new Date(value * 1000);
break;
}
}
var fieldObject = {};
fieldObject[schemaField.name] = value;
return fieldObject;
});
}
function flattenRows(rows) {
return rows.reduce(function(acc, row) {
var key = Object.keys(row)[0];
acc[key] = row[key];
return acc;
}, {});
}
} Feel free to run other responses against it as well. |
I'm re-using this code on a table.getRows; if a table is defined with nested fields, this function returns all objects with nested found 2 problems:
function test(schema, rows) {
return (Array.isArray(rows) ? rows : [rows]).map(mergeSchema).map(flattenRows);
function mergeSchema(row) {
return row.f.map(function(field, index) {
var schemaField = schema.fields[index];
var value = field.v;
var fieldObject = {};
// skip any schema decoding if it's null
if (value === null) {
fieldObject[schemaField.name] = null;
return fieldObject;
}
switch (schemaField.type) {
case 'BOOLEAN': {
value = value === 'true';
break;
}
case 'FLOAT': {
value = parseFloat(value);
break;
}
case 'INTEGER': {
value = parseInt(value, 10);
break;
}
case 'RECORD': {
if (schemaField.mode === 'REPEATED') {
value = value.map(function(val) {
return test(schemaField, val.v).pop();
});
} else {
value = test(schemaField, value).pop();
}
break;
}
case 'TIMESTAMP': {
value = new Date(value * 1000);
break;
}
}
fieldObject[schemaField.name] = value;
return fieldObject;
});
}
function flattenRows(rows) {
return rows.reduce(function(acc, row) {
var key = Object.keys(row)[0];
acc[key] = row[key];
return acc;
}, {});
}
} |
Thanks! I'll put that into a PR and ping you for another look when it's ready. |
@c0b - please take a look at #1648.
We actually do that automatically within table.getRows(function(err, rows, apiResponse) {
console.log(JSON.stringify(test(table.metadata.schema, apiResponse.rows), null, 2))
}); |
Please support nested schema field names in complex queries like
SELECT ARRAY[STRUCT(1 AS a, 2 AS b, 3 AS c)] AS d
This is right, the field name
cnt
is correctly usedd
is in use, not others a, b, c,f
andv
, and all their values type, the 1,2,3 here should be integersI have even more complex queries than this, all are like this only the outer one level schema field names / value types are got populated
A raw query over HTTP, the http tool is from httpie, just an advanced version of curl; after I get access token with same httpie tool from token allocation, I can do queries like this on command line, here the server response included correct schema with nested fields, so technically it should be possible to populate all inner level field names and value types correct
Environment details
I have search other issues here #1564 #1135 about nested schema, but for different purpose, I believe this ticket is different.
The text was updated successfully, but these errors were encountered: