@@ -0,0 +1,18 @@
# Databricks special files
.databricks/

# Python things
build/
dist/
__pycache__/
*.egg-info
.venv/
scratch/**
!scratch/README.md

# MacOS things
.DS_Store
**/.DS_Store

# VS Code
.vscode/
@@ -0,0 +1,44 @@
# Warehouse SLA Monitor DAB

The `Warehouse SLA Monitor` framework is packaged as a Databricks Asset Bundle (DAB), making it simple to deploy an SLA monitor to a target Databricks workspace.

## Getting started

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html

2. Authenticate to your Databricks workspace, if you have not done so already:
```
$ databricks configure
```

3. To deploy a development copy of this project, type:
```
$ databricks bundle deploy --target dev
```

4. Similarly, to deploy a production copy, type:
```
$ databricks bundle deploy --target prod
```

Note that the default job from the template has a schedule that runs every day
(defined in resources/sla_monitor_job.yml). The schedule
is paused when deploying in development mode (see
https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).
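
   Note that the bundle in this repository defines only a `dev` target in its databricks.yml, so a `prod` target must be added before step 4 will work. A minimal sketch of what that target could look like (the host URL and root path below are placeholders, not part of this project):
   ```
     prod:
       mode: production
       workspace:
         host: https://my-workspace.cloud.databricks.com
         root_path: /Workspace/Shared/Warehouse_SLA_Monitoring
   ```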

5. To run the deployed job, use the "run" command with the job key, supplying the notebook parameters for the Warehouse SLA poller as comma-separated `key=value` pairs:

- `workspace_name` - the hostname of the target Databricks workspace (without the `https://` prefix)
- `database_name` - the fully qualified database name (catalog and schema, e.g. `my_catalog.monitoring`) to save the metrics tables to
- `policy_id` - the SLA policy identifier

```
$ databricks bundle run sla_monitor_job --notebook-params workspace_name=my-db-workspace.cloud.databricks.com,database_name=my_catalog.monitoring,policy_id=2
```
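
The same parameters can instead be baked into the scheduled job by adding `base_parameters` to the notebook task in resources/sla_monitor_job.yml, so the daily run needs no CLI flags. A sketch (the parameter values are placeholders):
```
          notebook_task:
            notebook_path: /Workspace/Users/${workspace.current_user.userName}/Warehouse_SLA_Monitoring/files/src/warehouse_sla_poller
            base_parameters:
              workspace_name: my-db-workspace.cloud.databricks.com
              database_name: my_catalog.monitoring
              policy_id: "2"
```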

6. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
https://docs.databricks.com/dev-tools/vscode-ext.html.

7. For documentation on the Databricks asset bundles format used
for this project, and for CI/CD configuration, see
https://docs.databricks.com/dev-tools/bundles/index.html.
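
As an example of CI/CD, a minimal GitHub Actions workflow (the trigger and secret names below are assumptions, not part of this project) could deploy the bundle on every push to main:
```
name: deploy-sla-monitor
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy --target prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```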
@@ -0,0 +1,15 @@
bundle:
  name: sla_monitor

include:
  - resources/*.yml

workspace:
  file_path: /Workspace/Users/${workspace.current_user.userName}/Warehouse_SLA_Monitoring/files

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://e2-demo-field-eng.cloud.databricks.com
@@ -1,5 +1,6 @@
# Databricks notebook source
from query_sla_manager import QueryHistoryAlertManager

# MAGIC %run ../src/query_sla_manager

# COMMAND ----------

@@ -1,4 +1,5 @@
# Databricks notebook source

# DBTITLE 1,Define Alert Stream Parameters
database_name = "main.query_alert_manager"
checkpoint_location = "/dbfs/tmp/query_alert_manager/checkpoints"  ## Define your checkpoint location for the SLA alert stream
@@ -0,0 +1,25 @@
# The main job for sla_monitor.
resources:
  jobs:
    sla_monitor_job:
      name: Warehouse_SLA_Monitor_Job

      trigger:
        periodic:
          interval: 1
          unit: DAYS

      tasks:
        - task_key: Poll_Query_History
          job_cluster_key: warehouse_sla_monitor_cluster
          notebook_task:
            notebook_path: /Workspace/Users/${workspace.current_user.userName}/Warehouse_SLA_Monitoring/files/src/warehouse_sla_poller

      job_clusters:
        - job_cluster_key: warehouse_sla_monitor_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            autoscale:
              min_workers: 1
              max_workers: 2
@@ -1,3 +1,5 @@
# Databricks notebook source

from pyspark.sql import SparkSession
from datetime import time, date, timedelta, timezone, datetime
from typing import List
@@ -9,6 +11,7 @@
import pandas as pd
import time

# COMMAND ----------

class QueryHistoryAlertManager():

@@ -683,4 +686,6 @@ def poll_with_policy(self,

if self.num_cumulative_errors >= self.hard_fail_after_n_attempts:
raise(e)
break

# COMMAND ----------
@@ -0,0 +1,36 @@
# Databricks notebook source
dbutils.widgets.text("workspace_name", "my-workspace.cloud.databricks.com")
dbutils.widgets.text("database_name", "hive_metastore.main.query_alert_manager")
dbutils.widgets.text("policy_id", "1")

# COMMAND ----------

workspace_name = dbutils.widgets.get("workspace_name")
pat_token = dbutils.secrets.get("my-secret-scope", "my-pat-token")
database_name = dbutils.widgets.get("database_name")
policy_id = dbutils.widgets.get("policy_id")

# COMMAND ----------

# MAGIC %run ./query_sla_manager

# COMMAND ----------

# Initialize an SLA Policy Manager
query_manager = QueryHistoryAlertManager(
    host=workspace_name,
    dbx_token=pat_token,
    database_name=database_name,
)

# COMMAND ----------

# Polling Loop - Schedule in a Job for Each Policy
batch_results = query_manager.poll_with_policy(
    policy_id=int(policy_id),  # use the policy_id widget value (widgets return strings, hence the cast)
    polling_frequency_seconds=30,
    hard_fail_after_n_attempts=3,
    start_over=False,
)

# COMMAND ----------