Commit 2351632

EDU-1502: Adds bigQuery page
1 parent df0b4b1 commit 2351632

File tree

1 file changed

content/bigquery.textile

Lines changed: 102 additions & 0 deletions
---
title: BigQuery rule
meta_description: "Stream realtime event data from Ably into Google BigQuery using the Firehose BigQuery rule. Configure the rule and analyze your data efficiently."
---

Stream events published to Ably directly into a table in BigQuery for analytical or archival purposes. Typical use cases include:

* Realtime analytics on message data.
* Centralized storage for raw event data, enabling downstream processing.
* Historical auditing of messages with an at-least-once delivery guarantee.

<aside data-type='note'>
<p>Ably's BigQuery integration rule for Firehose is currently in development.</p>
</aside>

h3(#create-rule). Create a BigQuery rule

Create a BigQuery rule using the Ably Dashboard or the Control API.

Before creating the rule in Ably, ensure the following:

* Create or select a BigQuery dataset in the Google Cloud Console.
* Create a BigQuery table in that dataset:
** Use the JSON schema provided below.
** For large volumes of data, partition the table; daily partitioning by ingestion time is recommended. A DDL sketch follows this list.
* Create a GCP service account with the minimal required BigQuery permissions:
** *@bigquery.tables.get@* to read table metadata.
** *@bigquery.tables.updateData@* to insert records.
* Add table-level access control to grant the service account access to the specific table.
* Generate and securely store the JSON key file for the service account. Ably requires this key file to authenticate and write data to your table.
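
As an illustration only, the destination table could be created with DDL along the following lines. The project, dataset, and table names are placeholders, and the column list is an assumption based on the fields referenced on this page (@id@, @channel@, @content_type@, @data@) rather than the complete recommended schema:

```[sql]
-- Sketch: an ingestion-time partitioned table for raw Ably messages.
-- Names and columns are illustrative; follow the JSON schema below for
-- the full recommended field list.
CREATE TABLE `project_id.dataset_id.table_id` (
  id STRING NOT NULL OPTIONS (description = 'Unique message ID assigned by Ably'),
  channel STRING OPTIONS (description = 'Channel the message was published on'),
  content_type STRING OPTIONS (description = 'How to interpret the data payload'),
  data BYTES OPTIONS (description = 'Raw message payload')
)
PARTITION BY _PARTITIONDATE
OPTIONS (description = 'Raw Ably messages streamed via the Firehose BigQuery rule');
```
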
h4(#dashboard). Create a BigQuery rule in the Dashboard

* Log in to the Ably Dashboard and select the application from which you want to stream data.
* Navigate to the *Integrations* tab.
* Click *New Integration Rule*.
* Select *Firehose*.
* Choose *BigQuery* from the list of available Firehose integrations.
* Configure the rule settings as described below. Then, click *Create*.

h3(#settings). BigQuery rule settings

|_. Section |_. Purpose |
| *Source* | Defines the type of event(s) for delivery. |
| *Channel Filter* | A regular expression to filter which channels to capture. Only events on channels matching this regex are streamed into BigQuery. |
| *Table* | The full destination table path in BigQuery, typically in the format @project_id.dataset_id.table_id@. |
| *Service Account Key* | A JSON key file Ably uses to authenticate with Google Cloud. You must upload or provide the contents of this key file. |
| *Partitioning* | _(Optional)_ The table must be created with the desired partitioning settings in BigQuery before creating the rule in Ably. |
| *Advanced settings* | Any additional configuration or custom fields relevant to your BigQuery setup (for future enhancements). |

h4(#api-rule). Create a BigQuery rule using the Control API

Follow a similar process to that for other Firehose rules. When calling the Control API, specify:

* *ruleType*: @bigquery@
* The rule-specific settings, for example the destination table and service account credentials.

See the Control API Rules endpoint documentation for examples of creating and managing Firehose rules.

h3(#schema). JSON Schema

Ably recommends creating your BigQuery table using the schema below, which separates standard message fields from the raw payload:

```[json]
{
  "name": "id",
  "type": "STRING",
  "mode": "REQUIRED",
  "description": "Unique ID assigned by Ably to this message. Can optionally be assigned by the client."
}
```

Ably transports arbitrary message payloads (JSON, text, or binary). Storing the @data@ payload in a @BYTES@ column ensures all message content is captured. Use the *content_type* field to determine how to interpret the payload.

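For illustration, a query along these lines (a sketch only, using the placeholder table path from this page) decodes text-based payloads while leaving other content types as raw bytes:

```[sql]
-- Sketch: decode payloads whose content type indicates text or JSON;
-- other payloads remain available as raw bytes in the data column.
SELECT
  id,
  channel,
  content_type,
  CASE
    WHEN content_type IN ('application/json', 'text/plain')
      THEN SAFE_CONVERT_BYTES_TO_STRING(data)
  END AS data_as_text
FROM `project_id.dataset_id.table_id`
LIMIT 100
```
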
h3(#semantics). Data insertion and semantics

* *Protocol:* Ably uses the BigQuery Storage Write API over gRPC.
* *Delivery guarantee:* At-least-once. You may see duplicate messages in BigQuery under high-throughput or transient failure conditions. You can de-duplicate using the unique *id* in an ETL process or query logic; see the example after this list.
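
For example, duplicates can be filtered out at query time with something along these lines (a sketch assuming the ingestion-time partitioned table and placeholder table path used elsewhere on this page):

```[sql]
-- Sketch: keep a single row per Ably message id.
-- The date filter limits the scan to recent ingestion-time partitions.
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id) AS row_num
  FROM `project_id.dataset_id.table_id`
  WHERE _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
) AS messages
WHERE row_num = 1
```
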
h3(#queries). Direct queries

You can run queries directly against the Ably-managed table. For instance, to parse JSON payloads stored in @data@:

```[sql]
SELECT
  PARSE_JSON(CAST(data AS STRING)) AS parsed_payload
FROM project_id.dataset_id.table_id
WHERE channel = 'my-channel'
```

However, JSON parsing at query time can be expensive for large datasets.

h4(#etl). ETL (recommended)

For large-scale analytics, consider an ETL pipeline that moves data from the Ably-managed table into a secondary table with a more specific schema:

* Convert the raw @BYTES@/JSON data into structured columns (for example, geospatial columns, numeric fields).
* Write these transformed records into a new table optimized for your queries.
* Use the unique *id* field to eliminate duplicates.
* Use BigQuery scheduled queries or an external workflow to automate these steps periodically; a sketch of such a query follows this list.
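
As an illustration only, a scheduled query along the following lines could implement such a pipeline. The destination table @events_structured@ and the extracted JSON fields (@name@, @value@) are hypothetical, and the query assumes JSON payloads stored in the ingestion-time partitioned table described above:

```[sql]
-- Sketch of a scheduled ETL step: parse today's messages into structured
-- columns and insert only rows whose id is not already in the destination.
-- The destination table and JSON field names are hypothetical examples.
MERGE `project_id.dataset_id.events_structured` AS target
USING (
  SELECT
    parsed.id,
    parsed.channel,
    JSON_VALUE(parsed.payload, '$.name') AS name,
    SAFE_CAST(JSON_VALUE(parsed.payload, '$.value') AS FLOAT64) AS value
  FROM (
    SELECT
      id,
      channel,
      PARSE_JSON(CAST(data AS STRING)) AS payload,
      ROW_NUMBER() OVER (PARTITION BY id) AS row_num
    FROM `project_id.dataset_id.table_id`
    WHERE _PARTITIONDATE = CURRENT_DATE()
  ) AS parsed
  WHERE parsed.row_num = 1
) AS source
ON target.id = source.id
WHEN NOT MATCHED THEN
  INSERT (id, channel, name, value)
  VALUES (source.id, source.channel, source.name, source.value)
```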