This repository was archived by the owner on Nov 11, 2022. It is now read-only.
DirectPipelineRunner doesn't support StandardSql with BigQueryIO.READ #539
Closed
Description
We write all of our BigQuery queries with the Standard SQL option enabled in the web console. When I tried to execute a query using this syntax from Dataflow running locally, with usingStandardSql() enabled,
I ran into this error:
SEVERE: Error when trying to dry run query SELECT * from `bigquery-public-data.samples.shakespeare` LIMIT 100.
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"location" : "`bigquery-public-data:samples.shakespeare`",
"locationType" : "other",
"message" : "Invalid table name: `bigquery-public-data:samples.shakespeare`",
"reason" : "invalid"
} ],
"message" : "Invalid table name: `bigquery-public-data:samples.shakespeare`"
}
The following example triggers the error. If I switch to DataflowPipelineRunner,
it works like a charm.
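import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner;
import com.google.cloud.dataflow.sdk.transforms.MapElements;
import com.google.cloud.dataflow.sdk.values.TypeDescriptor;

public class StandardSql {
    private static String gsLocation = "gs://hallois/";
    private static String project = "uc-prox-development";

    public static void main(String[] args) {
        DataflowPipelineOptions options = PipelineOptionsFactory.create()
            .as(DataflowPipelineOptions.class);
        // Works when run on the service:
        // options.setRunner(DataflowPipelineRunner.class);
        // Fails with the error above when run locally:
        options.setRunner(DirectPipelineRunner.class);
        options.setProject(project);
        options.setTempLocation(gsLocation + "jars");

        Pipeline p = Pipeline.create(options);
        p.apply(BigQueryIO.Read
                .fromQuery("SELECT * from `bigquery-public-data.samples.shakespeare` LIMIT 100")
                .usingStandardSql())
            // Extract the "word" column from each row.
            .apply(MapElements.via((TableRow tr) -> (String) tr.get("word"))
                .withOutputType(new TypeDescriptor<String>() {}))
            .apply(TextIO.Write.to(gsLocation + "shakespeare"));
        p.run();
    }
}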
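For reference, the colon-separated name in the error message resembles the legacy SQL table syntax ([project:dataset.table]), while the query uses the backtick-quoted standard SQL form. This may mean the local dry run parsed the query as legacy SQL, but that is only an inference from the message. The sketch below just illustrates how the two reference styles differ; TableRefSyntax and legacyToStandard are hypothetical names and are not part of the Dataflow SDK or the BigQuery client.

```java
public class TableRefSyntax {

    /**
     * Converts a legacy SQL table reference such as
     * [project:dataset.table] into the standard SQL form
     * `project.dataset.table`. Hypothetical helper for illustration only.
     */
    static String legacyToStandard(String legacyRef) {
        String s = legacyRef.trim();
        if (s.length() < 2 || !s.startsWith("[") || !s.endsWith("]")) {
            throw new IllegalArgumentException(
                "Expected a reference of the form [project:dataset.table]: " + legacyRef);
        }
        String inner = s.substring(1, s.length() - 1);

        // The last colon separates the project from the dataset. Using the
        // last one keeps domain-scoped project IDs (which contain a colon) intact.
        int colon = inner.lastIndexOf(':');
        if (colon >= 0) {
            inner = inner.substring(0, colon) + "." + inner.substring(colon + 1);
        }
        return "`" + inner + "`";
    }

    public static void main(String[] args) {
        String legacy = "[bigquery-public-data:samples.shakespeare]";
        System.out.println(legacy + " -> " + legacyToStandard(legacy));
    }
}
```

With the table from this report, the helper turns [bigquery-public-data:samples.shakespeare] into the backtick-quoted name used in the query above.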