The BigQuery emulator provides a way to launch a BigQuery server on your local machine for testing and development.
The Recidiviz fork has many features, performance improvements, and bugfixes that are missing from the upstream repository.
- If you use the Go BigQuery client, you can launch the emulator within the testing process using httptest.
- The BigQuery emulator can also be launched as a standalone process, so you can use it from programs written in non-Go languages, or from the bq command, by specifying the address of the running emulator.
- The BigQuery emulator uses SQLite for storage. At startup you can choose either memory or a file as the storage destination; if you choose a file, the data is persisted.
- You can load seed data from a JSON or YAML file on startup.
All BigQuery APIs are implemented except the API for manipulating IAM resources. Some options may not be supported; if you find one, please report it in an Issue.
The BigQuery emulator supports loading data from Google Cloud Storage and exporting table data. Currently, only the CSV and JSON formats can be used for export.
If you use a Google Cloud Storage emulator, set the STORAGE_EMULATOR_HOST environment variable.
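As a minimal sketch of the setting above, the following Python snippet prepares an environment for launching the emulator with STORAGE_EMULATOR_HOST set. The helper name `emulator_env` and the address `localhost:4443` are assumptions made for illustration, and the sketch assumes the variable is read from the emulator's own process environment.

```python
import os


def emulator_env(storage_host, base_env=None):
    """Return a copy of the environment with STORAGE_EMULATOR_HOST set.

    `emulator_env` is a hypothetical helper, not part of bigquery-emulator.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["STORAGE_EMULATOR_HOST"] = storage_host
    return env


# Hypothetical address of a locally running Cloud Storage emulator.
env = emulator_env("localhost:4443", base_env={"PATH": "/usr/bin"})

# Command line taken from the usage shown in this README.
command = ["./bigquery-emulator", "--project=test"]

# To actually start the emulator with this environment:
#   import subprocess
#   subprocess.Popen(command, env=env)
```

Building the environment as a copy keeps the setting scoped to the emulator process instead of changing the environment of the calling program.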
The emulator supports gRPC-based reads and writes through the BigQuery Storage API.
Both the Apache Avro and Arrow formats are supported.
BigQuery emulator supports many of the specifications present in Google Standard SQL. For example, it has the following features.
- 200+ standard functions
- Wildcard table
- Templated Argument Function
- JavaScript UDF
To find out which specific features are supported, see the go-zetasqlite documentation.
If this project is useful to you or your team, consider sponsoring the original creator, @goccy.
If Go is installed, you can install the latest version with the following command
$ go install github.com/Recidiviz/bigquery-emulator/cmd/bigquery-emulator@latest
The BigQuery emulator depends on go-zetasql.
This library takes a very long time to install because it automatically builds the ZetaSQL library during install.
It may look like it hangs because it does not log anything during the build process, but if the clang process is running in the background, it is working fine, so just wait it out.
For the same reason, the following environment variables must be set during installation.
CGO_ENABLED=1
CXX=clang++
You can also download the Docker image with the following command
$ docker pull ghcr.io/recidiviz/bigquery-emulator:latest
You can also download the darwin(amd64) and linux(amd64) binaries directly from releases
Once the bigquery-emulator CLI is installed, you can start the server using the following options.
$ ./bigquery-emulator -h
Usage:
  bigquery-emulator [OPTIONS]

Application Options:
      --project=        specify the project name
      --dataset=        specify the dataset name
      --port=           specify the http port number. this port used by bigquery api (default: 9050)
      --grpc-port=      specify the grpc port number. this port used by bigquery storage api (default: 9060)
      --log-level=      specify the log level (debug/info/warn/error) (default: error)
      --log-format=     specify the log format (console/json) (default: console)
      --database=       specify the database file if required. if not specified, it will be on memory
      --data-from-yaml= specify the path to the YAML file that contains the initial data
  -v, --version         print version

Help Options:
  -h, --help            Show this help message

Start the server by specifying the project name
$ ./bigquery-emulator --project=test
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060
If you want to use the Docker image to start the emulator, run it as follows.
$ docker run -it ghcr.io/recidiviz/bigquery-emulator:latest --project=test
To load seed data at startup, pass the YAML file with --data-from-yaml.
$ ./bigquery-emulator --project=test --data-from-yaml=./server/testdata/data.yaml
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060
The seed file server/testdata/data.yaml is available in the repository.
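For orientation, here is a small Python sketch that writes a seed file of the kind passed to --data-from-yaml. The YAML layout (projects, datasets, tables, columns, data) is an assumption modeled on server/testdata/data.yaml and reduced to two columns; treat that file as the authoritative schema. The helper name `write_seed_file` is hypothetical.

```python
from pathlib import Path

# Assumed seed layout, reduced to two columns for illustration.
# Check server/testdata/data.yaml for the schema the emulator really expects.
SEED_YAML = """\
projects:
- id: test
  datasets:
    - id: dataset1
      tables:
        - id: table_a
          columns:
            - name: id
              type: INTEGER
            - name: name
              type: STRING
          data:
            - id: 1
              name: alice
"""


def write_seed_file(path):
    """Write the seed YAML to `path` and return it as a Path (hypothetical helper)."""
    path = Path(path)
    path.write_text(SEED_YAML, encoding="utf-8")
    return path
```

Passing the written file to --data-from-yaml should make dataset1.table_a queryable, assuming the layout above matches what the emulator expects.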
$ bq --api http://0.0.0.0:9050 query --project_id=test "SELECT * FROM dataset1.table_a WHERE id = 1"
+----+-------+---------------------------------------------+------------+----------+---------------------+
| id | name  |                  structarr                  |  birthday  | skillNum |     created_at      |
+----+-------+---------------------------------------------+------------+----------+---------------------+
|  1 | alice | [{"key":"profile","value":"{\"age\": 10}"}] | 2012-01-01 |        3 | 2022-01-01 12:00:00 |
+----+-------+---------------------------------------------+------------+----------+---------------------+
For Python unit testing, see the comprehensive Python Testing Guide for using the emulator with testcontainers, pytest fixtures, and unittest.TestCase.
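When a test harness starts the emulator as a separate process, it can help to wait until the REST port accepts connections before creating a client. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and 9050 is the default REST port listed in the options above.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once the port accepts a connection, or False if `timeout`
    seconds pass first. This only checks that something is listening; it
    does not verify that the listener is the emulator.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or connection timed out); retry shortly.
            time.sleep(interval)
    return False


# Example use in a test fixture, after starting ./bigquery-emulator --project=test:
#   if not wait_for_port("127.0.0.1", 9050):
#       raise RuntimeError("emulator did not start in time")
```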
$ ./bigquery-emulator --project=test --dataset=dataset1
[bigquery-emulator] REST server listening at 0.0.0.0:9050
[bigquery-emulator] gRPC server listening at 0.0.0.0:9060
Create ClientOptions with the api_endpoint option and use AnonymousCredentials to disable authentication.
from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery
from google.cloud.bigquery import QueryJobConfig
client_options = ClientOptions(api_endpoint="http://0.0.0.0:9050")
client = bigquery.Client(
    "test",
    client_options=client_options,
    credentials=AnonymousCredentials(),
)
client.query(query="...", job_config=QueryJobConfig())
If you use a DataFrame as the download destination for the query results, you must either disable the BigQueryStorage client with create_bqstorage_client=False or create a BigQueryStorage client that references the local gRPC port (default 9060).
https://cloud.google.com/bigquery/docs/samples/bigquery-query-results-dataframe?hl=en
result = client.query(sql).to_dataframe(create_bqstorage_client=False)
or
from google.cloud import bigquery_storage
client_options = ClientOptions(api_endpoint="0.0.0.0:9060")
read_client = bigquery_storage.BigQueryReadClient(client_options=client_options)
result = client.query(sql).to_dataframe(bqstorage_client=read_client)
If you use Go as the BigQuery client language, you can launch the BigQuery emulator in the same process as your tests.
Import github.com/Recidiviz/bigquery-emulator/server (and github.com/Recidiviz/bigquery-emulator/types) and use the server.New API to create the emulator server instance.
See the API reference for more information: https://pkg.go.dev/github.com/goccy/bigquery-emulator
package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
	"github.com/Recidiviz/bigquery-emulator/server"
	"github.com/Recidiviz/bigquery-emulator/types"
	"google.golang.org/api/iterator"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	const (
		projectID = "test"
		datasetID = "dataset1"
	)
	// Create an emulator server backed by temporary storage.
	bqServer, err := server.New(server.TempStorage)
	if err != nil {
		panic(err)
	}
	// Register the project and an empty dataset.
	if err := bqServer.Load(
		server.StructSource(
			types.NewProject(
				projectID,
				types.NewDataset(
					datasetID,
				),
			),
		),
	); err != nil {
		panic(err)
	}
	if err := bqServer.SetProject(projectID); err != nil {
		panic(err)
	}
	// Serve the emulator over httptest and point the client at it.
	testServer := bqServer.TestServer()
	defer testServer.Close()
	client, err := bigquery.NewClient(
		ctx,
		projectID,
		option.WithEndpoint(testServer.URL),
		option.WithoutAuthentication(),
	)
	if err != nil {
		panic(err)
	}
	defer client.Close()
	// Run a trivial query to confirm the client can reach the emulator.
	it, err := client.Query("SELECT 1 AS one").Read(ctx)
	if err != nil {
		panic(err)
	}
	for {
		var row []bigquery.Value
		if err := it.Next(&row); err != nil {
			if err == iterator.Done {
				break
			}
			panic(err)
		}
		fmt.Println(row)
	}
}
If you specified a database file when starting bigquery-emulator, you can inspect the state of the database with the zetasqlite-cli tool. See the zetasqlite-cli documentation for details.
After receiving a query, go-zetasqlite parses and analyzes the input query using google/zetasql.
Query metadata objects are extracted from the AST, then transformed into a SQLite-compatible query.
The modernc.org/sqlite driver is then used to access the SQLite Database.
BigQuery has a number of types that do not exist in SQLite (e.g. ARRAY and STRUCT).
To handle them in SQLite, go-zetasqlite encodes every type except INT64 / FLOAT64 / BOOL as a combination of type information and data.
Before encoded data is used, it is decoded by a custom function registered with the driver.
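The encode-then-decode idea can be illustrated with Python's built-in sqlite3 module. This is only a sketch of the general technique: the JSON layout and the names `encode_value` and `array_length` are assumptions made for illustration and are not go-zetasqlite's actual encoding or function names.

```python
import json
import sqlite3


def encode_value(type_name, value):
    """Pack a type tag and the data into one TEXT value (hypothetical layout)."""
    return json.dumps({"type": type_name, "data": value})


def array_length(encoded):
    """Custom SQL function: decode the stored value, then compute on it."""
    decoded = json.loads(encoded)
    if decoded["type"] != "ARRAY":
        raise ValueError("expected an ARRAY value")
    return len(decoded["data"])


conn = sqlite3.connect(":memory:")
# Register the decoding function with the driver so SQL can call it.
conn.create_function("array_length", 1, array_length)

conn.execute("CREATE TABLE t (id INTEGER, tags TEXT)")
conn.execute(
    "INSERT INTO t VALUES (?, ?)",
    (1, encode_value("ARRAY", ["a", "b", "c"])),
)

# SQLite stores only TEXT; the custom function restores the array semantics.
length = conn.execute("SELECT array_length(tags) FROM t WHERE id = 1").fetchone()[0]
print(length)  # prints 3
```

Storing the type tag next to the data lets a single TEXT column hold values of different BigQuery types while the registered functions decide how to interpret them.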
The following article describes the background of bigquery-emulator.
- How to create a BigQuery Emulator (Japanese)
MIT

