BigQuery ML Account Fraud Detection Demo

This SQL script demonstrates a basic workflow for building, evaluating, and explaining a machine learning model for account fraud detection using BigQuery ML. The script uses data from an external Cloud Spanner table.

Overview

The script performs the following steps:

Data Cleansing and Preparation: Creates a new table (demo_dataset.account_activity_cleansed) by querying raw account activity data from a Cloud Spanner table (account_activity_raw) via an external connection (demo-spanner-conn). During this process, it:
- Casts data types for timestamp, transaction amount, successful login, and unusual activity columns.
- Adds a classifier column with random values between 0 and 1 to facilitate splitting the data into training, evaluation, and prediction sets.
Data Split Verification: Includes a query to check the distribution of data across the training (80%), evaluation (10%), and prediction (10%) sets based on the classifier column.
Model Training: Creates or replaces a BigQuery ML Logistic Regression model (demo_dataset.fraud_model) using the cleansed data. Key options used:
- model_type='LOGISTIC_REG': Specifies the model algorithm.
- auto_class_weights=TRUE: Helps handle class imbalance in the target variable (unusual_activity).
- data_split_method='NO_SPLIT': Informs BQML that the data is already split manually.
- input_label_cols=['unusual_activity']: Defines the target variable.
- The training uses only the data where classifier < 0.8.
Model Evaluation: Evaluates the trained model's performance using ML.EVALUATE on the evaluation dataset (classifier between 0.8 and 0.9).
Prediction and Explanation: Uses ML.EXPLAIN_PREDICT to predict unusual activity on the prediction dataset (classifier > 0.9) and provides feature attributions (explanations) for the predictions. The results are filtered to show only instances where the prediction differs from the actual label (misclassifications).
(Commented Out) Example Update: Includes a commented-out example UPDATE statement showing how to modify data in the cleansed table.

Prerequisites

Access to a Google Cloud project with BigQuery and Cloud Spanner APIs enabled.
A BigQuery dataset named demo_dataset.
A BigQuery connection named datasherlock.us-central1.demo-spanner-conn configured to access a Cloud Spanner instance.
A Cloud Spanner table named account_activity_raw within the connected Spanner database, containing the necessary columns (transaction_id, account_id, timestamp, location, device_type, ip_address, transaction_amount, transaction_type, successful_login, unusual_activity).

How to Use

Ensure Prerequisites: Verify that all prerequisites listed above are met. Pay close attention to the dataset name (demo_dataset) and the connection name (datasherlock.us-central1.demo-spanner-conn) used in the script; adjust them if your environment uses different names.
Execute the Script: Run the SQL commands sequentially in the BigQuery console, using the bq command-line tool, or through a BigQuery client library.

Expected Outputs

A BigQuery table named demo_dataset.account_activity_cleansed containing the prepared data.
Query results showing the distribution of data into training, evaluation, and prediction sets.
A BigQuery ML model named demo_dataset.fraud_model.
Query results from ML.EVALUATE showing model performance metrics (e.g., precision, recall, accuracy, f1-score, roc_auc).
Query results from ML.EXPLAIN_PREDICT showing predictions, actual labels, and feature attributions for misclassified instances in the prediction set.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
sample-data		sample-data
README.md		README.md
bqml_demo_account_fraud.sql		bqml_demo_account_fraud.sql

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

BigQuery ML Account Fraud Detection Demo

Overview

Prerequisites

How to Use

Expected Outputs

Disclaimer: This README was generated by Gemini Code Assist

About

Uh oh!

Languages

datasherlock/bqml-demo

Folders and files

Latest commit

History

Repository files navigation

BigQuery ML Account Fraud Detection Demo

Overview

Prerequisites

How to Use

Expected Outputs

Disclaimer: This README was generated by Gemini Code Assist

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Languages