This is a community-based BigQuery connector for Dremio built using the ARP framework. See Dremio Hub for more examples and the ARP docs for documentation.
Dremio delivers lightning-fast query speed and a self-service semantic layer operating directly against your data lake storage and other sources. No moving data to proprietary data warehouses or creating cubes, aggregation tables, and BI extracts. Just flexibility and control for Data Architects, and self-service for Data Consumers.
- Join data from BigQuery with other sources (On prem/Cloud)
- Interactive SQL performance with Data Reflections
- Offload BigQuery tables using CTAS to your cheap data lake storage - HDFS, S3, ADLS
- Curate Datasets easily through the self-service platform
- Google Cloud Project ID
  - Ex: `my-big-project-name`
- Service Account Email & JSON Key
  - You will need to generate an IAM service account with access to the BigQuery resources you want to query. You will need the contents of the JSON key for that account, as well as the email address associated with it.
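For reference, a downloaded service-account JSON key looks roughly like the sketch below. The values shown are placeholders (the project ID reuses the example above); the `client_email` value is what you supply as the Service Account Email.

```json
{
  "type": "service_account",
  "project_id": "my-big-project-name",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "dremio-connector@my-big-project-name.iam.gserviceaccount.com",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```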
- Download and install the Google Simba BigQuery JDBC driver from the Google website. Install the main JAR file into your local Maven repository with the following command (update the path to match your download location):
```shell
mvn install:install-file \
  -Dfile=/Users/build/Downloads/SimbaJDBCDriverforGoogleBigQuery42_1.2.11.1014/GoogleBigQueryJDBC42.jar \
  -DgroupId=com.simba.googlebigquery \
  -DartifactId=googlebigquery-jdbc42 \
  -Dversion=1.2.11.1014 \
  -Dpackaging=jar \
  -DgeneratePom=true
```
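Once installed, the driver can be referenced by the Maven coordinates used above. If you ever need to declare it manually in a `pom.xml` (the connector's own build files may already do this, so treat this fragment as illustrative), the dependency would look like:

```xml
<dependency>
  <groupId>com.simba.googlebigquery</groupId>
  <artifactId>googlebigquery-jdbc42</artifactId>
  <version>1.2.11.1014</version>
</dependency>
```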
- Generate a shaded BigQuery JDBC client JAR by running `mvn clean install` inside the `bigquery-driver-shade` directory.
- In the root directory (the one containing `pom.xml`), run `mvn clean install -DskipTests`.
- Take the resulting JAR file from the `target` folder and put it in the `<DREMIO_HOME>/jars` folder of your Dremio installation.
- Restart Dremio
To debug pushdowns for queries, set the following logger in `logback.xml`:

```xml
<logger name="com.dremio.exec.store.jdbc">
  <level value="${dremio.log.level:-trace}"/>
</logger>
```
You will then see lines like the one below in the `server.log` file, after which you can revisit the YAML file to add pushdowns:
- 2019-07-11 18:56:24,001 [22d879a7-ce3d-f2ca-f380-005a88865700/0:foreman-planning] DEBUG c.d.e.store.jdbc.dialect.arp.ArpYaml - Operator / not supported. Aborting pushdown.
You can also look at the Planning tab / visualized plan of the query profile to determine whether everything was pushed down.
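As a quick sanity check, you can pull the rejected operator out of such a log line with standard shell tools. The log-line format here follows the example above; in practice you would pipe `server.log` through the same `sed` expression.

```shell
# Example pushdown-abort line (format copied from the sample above; real
# lines come from Dremio's server.log)
line='2019-07-11 18:56:24,001 [22d879a7-ce3d-f2ca-f380-005a88865700/0:foreman-planning] DEBUG c.d.e.store.jdbc.dialect.arp.ArpYaml - Operator / not supported. Aborting pushdown.'

# Extract the operator name that needs an entry in the ARP YAML
echo "$line" | sed -E 's/.*Operator (.+) not supported.*/\1/'
# prints: /
```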
- Go to the issue submission page: https://github.com/panoramichq/dremio-bigquery-connector/issues/new/choose. Please select an appropriate category and provide as much detail as you can.
PRs are welcome. When submitting a PR, make sure of the following:
- Follow Google's Java style guide when modifying or creating Java-related content.
- Use a YAML linter to check the syntactic correctness of the YAML file.
- Make sure the build passes
- Run basic queries at least to ensure things are working properly