
Commit c72f98a

EDU-1502: Adds bigQuery page
1 parent 5652ad8 commit c72f98a

File tree

3 files changed: +114 -0 lines changed


content/integrations/index.textile

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ The following pre-built services can be configured:
 * "AMQP":/docs/integrations/streaming/amqp
 * "AWS SQS":/docs/integrations/streaming/sqs
 * "Apache Pulsar":/docs/integrations/streaming/pulsar
+* "Google BigQuery":/docs/integrations/streaming/bigquery

 h2(#queues). Message queues
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
---
title: Google BigQuery
meta_description: "Stream realtime event data from Ably into Google BigQuery using the Firehose BigQuery rule. Configure and analyze your data efficiently."
---

Stream events published to Ably directly into a "table":https://cloud.google.com/bigquery/docs/tables in "BigQuery":https://cloud.google.com/bigquery for analytical or archival purposes. Common use cases include:

* Realtime analytics on message data.
* Centralized storage for raw event data, enabling downstream processing.
* Historical auditing of messages.

To stream data from Ably into BigQuery, you need to create a BigQuery "rule":#rule.

<aside data-type='note'>
<p>Ably's BigQuery integration for "Firehose":/docs/integrations/streaming is in alpha status.</p>
</aside>

h2(#rule). Create a BigQuery rule

A rule defines what data gets sent, where it goes, and how it's authenticated. For example, you can improve query performance by configuring a rule to stream data from a specific channel and write it into a "partitioned":https://cloud.google.com/bigquery/docs/partitioned-tables table.

h3(#dashboard). Create a rule using the Ably dashboard

The following steps create a BigQuery rule using the Ably dashboard:

* Log in to the "Ably dashboard":https://ably.com/accounts/any and select the application you want to stream data from.
* Navigate to the *Integrations* tab.
* Click *New integration rule*.
* Select *Firehose*.
* Choose *BigQuery* from the list of available Firehose integrations.
* "Configure":#configure the rule settings. Then, click *Create*.

h3(#api-rule). Create a rule using the Ably Control API

The following steps create a BigQuery rule using the Control API:

* Use the "rules":/docs/control-api#examples-rules endpoint to specify the following parameters:
** @ruleType@: Set this to @bigquery@ to define the rule as a BigQuery integration.
** @destinationTable@: Specify the BigQuery table where the data will be stored.
** @serviceAccountCredentials@: Provide the necessary GCP service account JSON key to authenticate and authorize data insertion.
** @channelFilter@ (optional): Use a regular expression to apply the rule to specific channels.
** @format@ (optional): Define the data format based on how you want messages to be structured.
* Make an HTTP request to the Control API to create the rule.

h2(#configure). Configure BigQuery

Using the Google Cloud "Console":https://cloud.google.com/bigquery/docs/bigquery-web-ui, configure the required BigQuery resources, permissions, and authentication to allow Ably to write data securely to BigQuery.

The following steps configure BigQuery using the Google Cloud Console:

* Create or select a *BigQuery dataset* in the Google Cloud Console.
* Create a *BigQuery table* in that dataset. See the example DDL after this list.
** Use the "JSON schema":#schema.
** For large datasets, partition the table by ingestion time. Daily partitioning is recommended for optimal performance.

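The following is a minimal DDL sketch for an ingestion-time (daily) partitioned table. The table path is a placeholder, and the @id@, @channel@, and @data@ columns are assumptions based on the "JSON schema":#schema and "direct queries":#queries sections; the exact schema required by your rule may differ:

```[sql]
-- Sketch: create a daily (ingestion-time) partitioned table for raw Ably messages.
-- Table path and column set are illustrative; align them with the JSON schema you use.
CREATE TABLE `project_id.dataset_id.table_id` (
  id STRING NOT NULL OPTIONS (description = 'Unique ID assigned by Ably to this message'),
  channel STRING OPTIONS (description = 'Name of the channel the message was published on'),
  data BYTES OPTIONS (description = 'Raw message payload')
)
PARTITION BY _PARTITIONDATE;
```
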
The following steps set up permissions and authentication using the Google Cloud Console:

* Create a Google Cloud Platform (GCP) "service account":https://cloud.google.com/iam/docs/service-accounts-create with the minimal required BigQuery permissions.
* Grant the service account table-level access to the destination table, with at least the following permissions (see the example grant after this list):
** @bigquery.tables.get@: to read table metadata.
** @bigquery.tables.updateData@: to insert records.
* Generate and securely store the *JSON key file* for the service account.
** Ably requires this key file to authenticate and write data to your table.

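As a sketch, table-level access can be granted with BigQuery DCL. The example below uses the predefined @roles/bigquery.dataEditor@ role, which includes both permissions above; a custom role limited to exactly those two permissions is stricter. The table path and service account email are placeholders:

```[sql]
-- Sketch: grant the Ably service account table-level access.
-- roles/bigquery.dataEditor includes bigquery.tables.get and bigquery.tables.updateData;
-- use a custom role to grant only those two permissions.
GRANT `roles/bigquery.dataEditor`
ON TABLE `project_id.dataset_id.table_id`
TO "serviceAccount:ably-firehose@project_id.iam.gserviceaccount.com";
```
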
h3(#settings). BigQuery configuration options

The following explains the BigQuery configuration options:

|_. Section |_. Purpose |
| *Source* | Defines the type of event(s) for delivery. |
| *Channel filter* | A regular expression to filter which channels to capture. Only events on channels matching this regex are streamed into BigQuery. |
| *Table* | The full destination table path in BigQuery, typically in the format @project_id.dataset_id.table_id@. |
| *Service account key* | A JSON key file Ably uses to authenticate with Google Cloud. You must upload or provide the contents of this key file. |
| *Partitioning* | _(Optional)_ The table must be created with the desired partitioning settings in BigQuery before creating the rule in Ably. |
| *Advanced settings* | Any additional configuration or custom fields relevant to your BigQuery setup (for future enhancements). |

h2(#schema). JSON schema

To store and structure message data in BigQuery, you need a schema that defines the expected fields and helps ensure consistency. The following is an example field definition from the JSON schema for a BigQuery table:

```[json]
{
  "name": "id",
  "type": "STRING",
  "mode": "REQUIRED",
  "description": "Unique ID assigned by Ably to this message. Can optionally be assigned by the client."
}
```

h2(#queries). Direct queries

In Ably-managed BigQuery tables, message payloads are stored in the @data@ column as raw JSON. The following example query converts the @data@ column from @BYTES@ to @STRING@, parses it into a JSON object, and filters results by channel name:

```[sql]
SELECT
  PARSE_JSON(CAST(data AS STRING)) AS parsed_payload
FROM `project_id.dataset_id.table_id`
WHERE channel = 'my-channel'
```

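Individual payload fields can also be extracted directly in SQL. The sketch below assumes a hypothetical @temperature@ field in the message payload and an ingestion-time partitioned table; adjust the JSON path and filters to your own data:

```[sql]
-- Sketch: extract a hypothetical "temperature" field from the JSON payload
-- and aggregate it per channel for today's partition.
SELECT
  channel,
  COUNT(*) AS message_count,
  AVG(CAST(JSON_VALUE(CAST(data AS STRING), '$.temperature') AS FLOAT64)) AS avg_temperature
FROM `project_id.dataset_id.table_id`
WHERE DATE(_PARTITIONTIME) = CURRENT_DATE()
GROUP BY channel
```
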
h2(#etl). Extract, Transform, Load (ETL)

ETL is recommended for large-scale analytics to structure, deduplicate, and optimize data for querying. Because parsing JSON at query time can be costly for large datasets, pre-process the data and store structured fields in a secondary table instead. Convert the raw data (JSON or @BYTES@), remove duplicates, and write it into an optimized table for better performance (see the sketch after this list):

* Convert data from raw (@BYTES@/JSON) into structured columns, for example geospatial fields or numeric types, for detailed analysis.
* Write transformed records to a new optimized table tailored for query performance.
* Deduplicate records using the unique ID field to ensure data integrity.
* Automate the process using BigQuery scheduled queries or an external workflow to run transformations at regular intervals.
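The following is a minimal sketch of such a transformation, written as a query you could run on a schedule. It assumes a raw table with @id@, @channel@, and @data@ columns and a hypothetical structured table named @events_structured@ with a JSON @payload@ column; adapt the column list to the fields you actually extract:

```[sql]
-- Sketch: deduplicate today's raw messages by id and upsert them into a
-- structured table (events_structured is a hypothetical name).
MERGE `project_id.dataset_id.events_structured` AS target
USING (
  SELECT
    id,
    channel,
    PARSE_JSON(CAST(data AS STRING)) AS payload
  FROM `project_id.dataset_id.table_id`
  WHERE DATE(_PARTITIONTIME) = CURRENT_DATE()
  -- Keep a single row per message id within the selected partition.
  QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY _PARTITIONTIME DESC) = 1
) AS source
ON target.id = source.id
WHEN NOT MATCHED THEN
  INSERT (id, channel, payload)
  VALUES (source.id, source.channel, source.payload)
```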

src/data/nav/platform.ts

Lines changed: 4 additions & 0 deletions
@@ -139,6 +139,10 @@ export default {
         name: 'Pulsar',
         link: '/docs/integrations/streaming/pulsar',
       },
+      {
+        name: 'BigQuery',
+        link: '/docs/integrations/streaming/bigquery',
+      },
     ],
   },
   {
