Commit 2351632

EDU-1502: Adds bigQuery page
1 parent df0b4b1 commit 2351632

File tree

1 file changed

content/bigquery.textile

Lines changed: 102 additions & 0 deletions
---
title: BigQuery rule
meta_description: "Stream realtime event data from Ably into Google BigQuery using the Firehose BigQuery rule. Configure the rule and analyze your data efficiently."
---

Stream events published to Ably directly into a table in BigQuery for analytical or archival purposes. Typical use cases include:

* Realtime analytics on message data.
* Centralized storage for raw event data, enabling downstream processing.
* Historical auditing of messages with an at-least-once delivery guarantee.

<aside data-type='note'>
<p>Ably's BigQuery integration rule for Firehose is currently in development.</p>
</aside>

h3(#create-rule). Create a BigQuery rule

Create a BigQuery rule using the Ably Dashboard or the Control API.

Before creating the rule in Ably, ensure the following:

* Create or select a BigQuery dataset in the Google Cloud Console.
* Create a BigQuery table in that dataset:
** Use the JSON schema provided below.
** For large volumes of data, partition the table; daily partitioning by ingestion time is recommended. A DDL sketch follows this list.
* Create a GCP service account with the minimal required BigQuery permissions:
** *@bigquery.tables.get@* to read table metadata.
** *@bigquery.tables.updateData@* to insert records.
* Add table-level access control to grant the service account access to the specific table.
* Generate and securely store the JSON key file for the service account. Ably requires this key file to authenticate and write data to your table.
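
As an illustration only, the destination table could be created with DDL along the following lines. The project, dataset, and table names are placeholders, and the column list is an assumption based on the fields referenced on this page (@id@, @channel@, @content_type@, @data@) rather than the complete recommended schema:

```[sql]
-- Sketch: an ingestion-time partitioned table for raw Ably messages.
-- Names and columns are illustrative; follow the JSON schema below for
-- the full recommended field list.
CREATE TABLE `project_id.dataset_id.table_id` (
  id STRING NOT NULL OPTIONS (description = 'Unique message ID assigned by Ably'),
  channel STRING OPTIONS (description = 'Channel the message was published on'),
  content_type STRING OPTIONS (description = 'How to interpret the data payload'),
  data BYTES OPTIONS (description = 'Raw message payload')
)
PARTITION BY _PARTITIONDATE
OPTIONS (description = 'Raw Ably messages streamed via the Firehose BigQuery rule');
```
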
h4(#dashboard). Create a BigQuery rule in the Dashboard

* Log in to the Ably Dashboard and select the application from which you want to stream data.
* Navigate to the *Integrations* tab.
* Click *New Integration Rule*.
* Select *Firehose*.
* Choose *BigQuery* from the list of available Firehose integrations.
* Configure the rule settings as described below. Then, click *Create*.

h3(#settings). BigQuery rule settings

|_. Section |_. Purpose |
| *Source* | Defines the type of event(s) for delivery. |
| *Channel Filter* | A regular expression to filter which channels to capture. Only events on channels matching this regex are streamed into BigQuery. |
| *Table* | The full destination table path in BigQuery, typically in the format @project_id.dataset_id.table_id@. |
| *Service Account Key* | A JSON key file Ably uses to authenticate with Google Cloud. You must upload or provide the contents of this key file. |
| *Partitioning* | _(Optional)_ The table must be created with the desired partitioning settings in BigQuery before creating the rule in Ably. |
| *Advanced settings* | Any additional configuration or custom fields relevant to your BigQuery setup (for future enhancements). |

h4(#api-rule). Create a BigQuery rule using the Control API

Follow a similar process to that for other Firehose rules. When calling the Control API, specify:

* *ruleType*: @bigquery@
* The rule-specific settings, for example the destination table and service account credentials.

See the Control API Rules endpoint documentation for examples of creating and managing Firehose rules.

h3(#schema). JSON Schema

Ably recommends creating your BigQuery table using the schema below, which separates standard message fields from the raw payload:

```[json]
{
  "name": "id",
  "type": "STRING",
  "mode": "REQUIRED",
  "description": "Unique ID assigned by Ably to this message. Can optionally be assigned by the client."
}
```

Ably transports arbitrary message payloads (JSON, text, or binary). Storing the @data@ payload in a @BYTES@ column ensures all message content is captured. Use the *content_type* field to determine how to interpret the payload.

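For illustration, a query along these lines (a sketch only, using the placeholder table path from this page) decodes text-based payloads while leaving other content types as raw bytes:

```[sql]
-- Sketch: decode payloads whose content type indicates text or JSON;
-- other payloads remain available as raw bytes in the data column.
SELECT
  id,
  channel,
  content_type,
  CASE
    WHEN content_type IN ('application/json', 'text/plain')
      THEN SAFE_CONVERT_BYTES_TO_STRING(data)
  END AS data_as_text
FROM `project_id.dataset_id.table_id`
LIMIT 100
```
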
h3(#semantics). Data insertion and semantics

* *Protocol:* Ably uses the BigQuery Storage Write API over gRPC.
* *Delivery guarantee:* At-least-once. You may see duplicate messages in BigQuery under high-throughput or transient failure conditions. You can de-duplicate using the unique *id* in an ETL process or query logic; see the example after this list.
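
For example, duplicates can be filtered out at query time with something along these lines (a sketch assuming the ingestion-time partitioned table and placeholder table path used elsewhere on this page):

```[sql]
-- Sketch: keep a single row per Ably message id.
-- The date filter limits the scan to recent ingestion-time partitions.
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id) AS row_num
  FROM `project_id.dataset_id.table_id`
  WHERE _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
) AS messages
WHERE row_num = 1
```
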
h3(#queries). Direct queries

You can run queries directly against the Ably-managed table. For instance, to parse JSON payloads stored in @data@:

```[sql]
SELECT
  PARSE_JSON(CAST(data AS STRING)) AS parsed_payload
FROM project_id.dataset_id.table_id
WHERE channel = 'my-channel'
```

However, JSON parsing at query time can be expensive for large datasets.

h4(#etl). ETL (recommended)

For large-scale analytics, consider an ETL pipeline that moves data from the Ably-managed table into a secondary table with a more specific schema:

* Convert the raw @BYTES@/JSON data into structured columns (for example, geospatial columns, numeric fields).
* Write these transformed records into a new table optimized for your queries.
* Use the unique *id* field to eliminate duplicates.
* Use BigQuery scheduled queries or an external workflow to automate these steps periodically; a sketch of such a query follows this list.
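
As an illustration only, a scheduled query along the following lines could implement such a pipeline. The destination table @events_structured@ and the extracted JSON fields (@name@, @value@) are hypothetical, and the query assumes JSON payloads stored in the ingestion-time partitioned table described above:

```[sql]
-- Sketch of a scheduled ETL step: parse today's messages into structured
-- columns and insert only rows whose id is not already in the destination.
-- The destination table and JSON field names are hypothetical examples.
MERGE `project_id.dataset_id.events_structured` AS target
USING (
  SELECT
    parsed.id,
    parsed.channel,
    JSON_VALUE(parsed.payload, '$.name') AS name,
    SAFE_CAST(JSON_VALUE(parsed.payload, '$.value') AS FLOAT64) AS value
  FROM (
    SELECT
      id,
      channel,
      PARSE_JSON(CAST(data AS STRING)) AS payload,
      ROW_NUMBER() OVER (PARTITION BY id) AS row_num
    FROM `project_id.dataset_id.table_id`
    WHERE _PARTITIONDATE = CURRENT_DATE()
  ) AS parsed
  WHERE parsed.row_num = 1
) AS source
ON target.id = source.id
WHEN NOT MATCHED THEN
  INSERT (id, channel, name, value)
  VALUES (source.id, source.channel, source.name, source.value)
```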