SQLMesh is a next-generation SQL transformation platform. It provides you with powerful automation for versioning, backfilling, deployment, and testing — allowing you to focus on simply writing SQL.
4
4
5
5
SQLMesh is able to achieve all of this with minimal setup; there are no additional services or dependencies required to get started using SQLMesh other than a connection to your existing data warehouse or engine.
6
6
7
7
## Why SQLMesh?
8
8
9
-
One of the main advantages over other transformation frameworks is that SQLMesh does not categorize incrementality as an "advanced" use case that should be avoided unless absolutely necessary. While other frameworks default to full refresh compute, the default for SQLMesh is to optimize for incremental compute, i.e. computing one day or hour at a time. This allows SQLMesh to be faster and more scalable than other frameworks, allowing you to take advantage of the cost and time savings of incrementality.
9
+
One of the main advantages over other transformation frameworks is that SQLMesh does not categorize incrementality as an "advanced" use case that should be avoided unless absolutely necessary. While other frameworks default to full refresh compute, the default for SQLMesh is to optimize for incremental compute, i.e. computing one day or hour at a time. This allows SQLMesh to be faster and more scalable than other frameworks, allowing you to take advantage of the cost and time savings of incrementality.
10
10
11
-
SQLMesh also automates away complexity, so configuring models is no longer tricky due to complex macros that require understanding of the context for execution. Writing your data pipelines incrementally with SQLMesh not only saves you money and time, but keeps your systems maintainable, reliable, and accessible to all of your data practicioners.
11
+
SQLMesh also automates away complexity, so configuring models is no longer tricky due to complex macros that require understanding of the context for execution. Writing your data pipelines incrementally with SQLMesh not only saves you money and time, but keeps your systems maintainable, reliable, and accessible to all of your data practitioners.
12
12
13
13
### Reduced cost
14
-
As discussed above, incremental compute is significantly cheaper than full refresh compute.
14
+
Incremental compute is significantly cheaper than full refresh compute.
15
15
16
-
For example, if you have one year of history but only receive new data on a daily basis, only processing that new data is ~365x cheaper than reprocessing one year each day. As your data grows, it's possible that refreshing your tables may take longer than a day, which means you would never be able to catch up!
16
+
For example, if you have one year of history but only receive new data on a daily basis, just processing that new data is ~365x cheaper than reprocessing one year each day. As your data grows, it's possible that refreshing your tables may take longer than a day, which means you would never be able to catch up!
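The ~365x figure can be checked with a few lines of arithmetic. The following is only a sketch under a simplifying assumption that is not stated in the docs: cost is proportional to the number of days of data processed per run. The names `HISTORY_DAYS`, `NEW_DAYS_PER_RUN`, and `days_processed` are hypothetical.

```python
# Assumption: cost scales linearly with the number of days of data processed.
HISTORY_DAYS = 365      # one year of existing history
NEW_DAYS_PER_RUN = 1    # new data arrives once per day


def days_processed(strategy: str) -> int:
    """Return how many days of data a single daily run has to process."""
    if strategy == "full_refresh":
        return HISTORY_DAYS          # reprocess the whole year every day
    if strategy == "incremental":
        return NEW_DAYS_PER_RUN      # process only the newly arrived day
    raise ValueError(f"unknown strategy: {strategy}")


ratio = days_processed("full_refresh") / days_processed("incremental")
print(f"Full refresh processes {ratio:.0f}x more data per run than incremental")
```

Real costs rarely scale perfectly linearly, so treat the ratio as an order-of-magnitude estimate.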
17
17
18
18
In addition, you may not be able to refresh particular tables all at once; they may need to be batched into smaller intervals. The cost of your data pipelines compounds as more dependent pipelines are created. Therefore, writing your data pipelines incrementally as much as possible can result in exponential savings.
19
19
20
20
### Increased efficiency
21
21
SQLMesh safely reuses physical tables across isolated environments. Some databases, such as Snowflake, have [zero-copy cloning](https://docs.snowflake.com/en/user-guide/tables-storage-considerations.html#label-cloning-tables), but this is a manual process and is not widely supported.
22
22
23
-
SQLMesh is able to automatically reuse tables regardless of which data warehouse or engine you're using. This is achieved by storing fingerprints of your models and by employing [views](https://en.wikipedia.org/wiki/View_(SQL)) like pointers to physical locations. Therefore, spinning up a new development environment is fast and cheap; only models with incompatible changes need to be materialized, once again saving time and money.
23
+
SQLMesh is able to automatically reuse tables regardless of which data warehouse or engine you're using. This is achieved by storing fingerprints of your models and by employing [views](https://en.wikipedia.org/wiki/View_(SQL)) like pointers to physical locations. Therefore, spinning up a new development environment is fast and cheap; only models with incompatible changes need to be materialized, saving time and money.
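The idea of views acting as pointers to physical tables can be illustrated with a small, self-contained sketch. This is not SQLMesh's actual implementation; it uses Python's built-in `sqlite3` module, and the table and view names (`items__v1`, `items__v2`, `prod_items`, `dev_items`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical table, shared by two environments through views.
conn.execute("CREATE TABLE items__v1 (id INTEGER, price REAL)")
conn.execute("INSERT INTO items__v1 VALUES (1, 9.5)")
conn.execute("CREATE VIEW prod_items AS SELECT * FROM items__v1")
conn.execute("CREATE VIEW dev_items AS SELECT * FROM items__v1")  # no data copied

# An incompatible change in dev: only the changed model is materialized,
# and only the dev view is repointed at the new physical table.
conn.execute("CREATE TABLE items__v2 (id INTEGER, price REAL, currency TEXT)")
conn.execute("INSERT INTO items__v2 VALUES (1, 9.5, 'USD')")
conn.execute("DROP VIEW dev_items")
conn.execute("CREATE VIEW dev_items AS SELECT * FROM items__v2")

prod_rows = conn.execute("SELECT * FROM prod_items").fetchall()
dev_rows = conn.execute("SELECT * FROM dev_items").fetchall()
print(prod_rows)  # production still reads the original table
print(dev_rows)   # dev reads the newly materialized table
```

Creating or repointing a view is a metadata operation, which is why a new environment in this scheme costs almost nothing until a model actually changes.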
24
24
25
25
### Automation for everyone
26
-
Creating maintainable and scalable data pipelines is extremely difficult, and is a task usually reserved for data engineers. As your data grows, the need for incremental compute becomes mandatory due to the cost and time constaints.
26
+
Creating maintainable and scalable data pipelines is extremely difficult, and a task usually reserved for data engineers. As your data grows, incremental compute becomes mandatory due to cost and time constraints.
27
27
28
-
Incremental models have inherent state of which partitions have been computed. This makes managing the consistency and accuracy challenging (leaving no data leakages or gaps). Although a seasoned engineer may have the expertise or tooling to operate one of these tables, an analyst would not. In these organizations, analysts would either need to file a ticket and wait on data engineering resources, or bypass core data models by running their own custom jobs, which inevitably leads to an ungoverned data mess. SQLMesh democratizes the ability to write safe and scalable data pipelines to all data practitioners, regardless of technical ability.
28
+
Incremental models carry inherent state: a record of which partitions have already been computed. This makes it challenging to keep the data consistent and accurate, with no data leakage or gaps.
29
+
30
+
Although a seasoned engineer may have the expertise or tooling to operate one of these tables, an analyst would not. In these organizations, analysts would either need to file a ticket and wait on data engineering resources, or bypass core data models by running their own custom jobs, which inevitably leads to an ungoverned data mess. SQLMesh democratizes the ability to write safe and scalable data pipelines to all data practitioners, regardless of technical ability.
29
31
30
32
### Complexity made simple
31
33
As more and more models and users depend on core tables, the complexity of making changes increases. You must ensure that all downstream data consumers are compatible and updated with any new changes.
@@ -35,7 +37,9 @@ Propagating a change throughout a complex graph of dependencies is difficult to
35
37
### Collaboration and integration
36
38
SQLMesh allows for data pipelines to be a collaborative experience. It both empowers less technical data users to contribute and enables them to collaborate with others who may be more familiar with data engineering. Development can be done in a fully isolated environment that can be accessed and validated by others.
37
39
38
-
SQLMesh provides information about changes and how they may affect your downstream consumers. This transparency, along with the ability to categorize changes, makes it more feasible for a less technically savvy user to make updates to core data pipelines. By integrating with our Continuous Integration/Continuous Delivery (CI/CD) flows, you can require approval for any changes before going to production, ensuring that the relevant data owners or experts can review and validate the changes.
40
+
SQLMesh provides information about changes and how they may affect your downstream consumers. This transparency, along with the ability to categorize changes, makes it more feasible for a less technically savvy user to make updates to core data pipelines.
41
+
42
+
By integrating with our Continuous Integration/Continuous Delivery (CI/CD) flows, you can require approval for any changes before going to production, ensuring that the relevant data owners or experts can review and validate the changes.
39
43
40
44
### Testing and reliability
41
45
SQLMesh supports both audits and tests. Although unit tests have been commonplace in the world of software engineering, they are relatively unknown in the data world. SQLMesh's data unit tests allow for stability and reliability, as data pipeline owners can ensure that changes to models don't change underlying logic. These tests can run quickly in CI, or locally without having to create full-scale tables.
File: docs/api/overview.md (8 additions, 4 deletions)
@@ -1,11 +1,11 @@
1
-
# Overview
1
+
# API
2
2
3
-
SQLMesh can be used with a [cli](cli.md), [notebook](notebook.md), or directly through [Python](python.md). Each interface aims to have parity in both functionality and arguments.
3
+
SQLMesh can be used with a [CLI](cli.md), [notebook](notebook.md), or directly through [Python](python.md). Each interface aims to have parity in both functionality and arguments. The following is a list of available commands.
4
4
5
5
## plan
6
6
Plan is the main command of SQLMesh. It allows you to interactively create a migration plan, understand the downstream impact, and apply it. All changes to models and environments are materialized through plan.
7
7
8
-
Read more about [plan](/concepts/plans).
8
+
Read more about [plans](/concepts/plans).
9
9
10
10
## evaluate
11
11
Evaluate a model or snapshot (running its query against a DB/Engine). This method is used to test or iterate on models without side effects.
@@ -19,14 +19,18 @@ Given a SQL query, fetches a pandas dataframe.
19
19
## test
20
20
Runs all tests.
21
21
22
+
Read more about [testing](/guides/tests).
23
+
22
24
## audit
23
25
Runs all audits.
24
26
27
+
Read more about [auditing](/guides/audits).
28
+
25
29
## format
26
30
Formats all SQL model files in place.
27
31
28
32
## diff
29
-
Shows the diff between the local model and a model in an evironment.
33
+
Shows the diff between the local model and a model in an environment.
File: docs/concepts/audits.md (13 additions, 13 deletions)
@@ -1,8 +1,10 @@
1
-
# Audits
2
-
Audits are one of the tools SQLMesh provides to validate your data. Along with tests, audits are a great way to ensure the quality of your data and to build trust in your data across your organization. A comprehensive suite of audits can identify data issues upstream, whether they are from your vendors or other teams. Audits also empower your data engineers and analysts to work with confidence by catching problems early as they work on new features or make updates to your models.
1
+
# Auditing
2
+
Audits are one of the tools SQLMesh provides to validate your models. Along with [tests](/concepts/tests), they are a great way to ensure the quality of your data and to build trust in it across your organization.
3
3
4
-
## What exactly are audits?
5
-
Audits are SQL queries that should not return any rows. In other words, they query for bad data, so returned rows would indicate something is wrong. In its simplest form, an audit is defined with the custom AUDIT expression along with a query as in the following example:
4
+
A comprehensive suite of audits can identify data issues upstream, whether they are from your vendors or other teams. Audits also empower your data engineers and analysts to work with confidence by catching problems early as they work on new features or make updates to your models.
5
+
6
+
## Example audit
7
+
In SQLMesh, audits are defined in `.sql` files in an `audit` directory in your SQLMesh project. Multiple audits can be defined in a single file, so you can organize them to your liking. Audits are SQL queries that should not return any rows; in other words, they query for bad data, so any returned rows indicate that something is wrong. In its simplest form, an audit is defined with the custom `AUDIT` expression along with a query, as in the following example:
6
8
7
9
```sql
8
10
AUDIT (
@@ -15,15 +17,16 @@ WHERE ds BETWEEN @start_ds AND @end_ds AND
15
17
price IS NULL
16
18
```
17
19
18
-
In the example above, we defined an audit named `assert_item_price_is_not_null` on the model `sushi.items` to ensure that every sushi item has a price. If the query is in a different dialect than the rest of your project, you can specify it here as we did in the example, and SQLGlot will automatically understand how to execute the query. While the query can technically be on any model or even multiple models, the model specified in the audit definition tells SQLMesh when to run the audit during your pipeline's execution. If the query returns any records, it means there may be an issue that requires your attention.
20
+
In the example, we defined an audit named `assert_item_price_is_not_null` on the model `sushi.items`, ensuring that every sushi item has a price.
19
21
20
-
Audits are defined in `.sql` files in an `audit` directory in your SQLMesh project. Multiple audits can be defined in a single file, so you can organize them to your liking.
22
+
**Note:** If the query is in a different dialect than the rest of your project, you can specify it here as we did in the example, and SQLGlot will automatically understand how to execute the query.
21
23
22
-
## Running audits
24
+
While the query can technically be on any model or even multiple models, the model specified in the audit definition tells SQLMesh when to run the audit during your pipeline's execution. If the query returns any records, it means there is a potential issue requiring your attention.
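The "query for bad data" principle can be tried outside SQLMesh. The sketch below is an illustration only, not how SQLMesh runs audits: it uses Python's built-in `sqlite3` module, a hypothetical `items` table, and plain query parameters in place of the `@start_ds` and `@end_ds` macros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, ds TEXT, price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "2023-01-01", 4.5),
        (2, "2023-01-01", None),  # bad row: missing price
    ],
)

# An audit-style query selects only rows that violate the expectation.
audit_query = "SELECT * FROM items WHERE ds BETWEEN ? AND ? AND price IS NULL"
bad_rows = conn.execute(audit_query, ("2023-01-01", "2023-01-02")).fetchall()

# The audit passes only when the query returns no rows.
audit_passed = not bad_rows
print(f"audit passed: {audit_passed}, offending rows: {bad_rows}")
```

Because the query returns the offending rows themselves, a failed audit points directly at the records that need attention.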
23
25
26
+
## Run an audit
24
27
### The CLI audit command
25
28
26
-
You can execute audits with the `sqlmesh audit` command, as in the following example:
29
+
You can execute audits with the `sqlmesh audit` command as follows:
When you apply a plan, SQLMesh will automatically run each model's audits. By default, SQLMesh will halt the pipeline when an audit fails in order to prevent potentially invalid data from propagating further downstream. This behavior can be changed for individual audits. Refer to [Non-blocking audits](#non-blocking-audits).
42
45
43
46
## Advanced usage
44
-
45
47
### Skipping audits
46
-
47
-
Audits can be skipped by setting the skip argument to true as in the following example:
48
+
Audits can be skipped by setting the `skip` argument to `true` as in the following example:
48
49
49
50
```sql
50
51
AUDIT (
@@ -58,8 +59,7 @@ WHERE ds BETWEEN @start_ds AND @end_ds AND
58
59
```
59
60
60
61
### Non-blocking audits
61
-
62
-
By default, audits that fail will stop the execution of the pipeline in order to prevent bad data from propagating further downstream. An audit can be configured to notify you when it fails without blocking the execution of the pipeline, as in the following example:
62
+
By default, audits that fail will stop the execution of the pipeline in order to prevent bad data from propagating further. An audit can be configured to notify you when it fails without blocking the execution of the pipeline, as in the following example:
File: docs/concepts/configs.md (10 additions, 14 deletions)
@@ -1,9 +1,9 @@
1
1
# Configs
2
-
Configs define settings for things like engines (eg. Snowflake or Spark), schedulers (eg. Airflow or Dagster), and the SQL dialect. The config file is defined in config.py in the root directory of your SQLMesh project.
2
+
Configs define settings for things like engines (such as Snowflake or Spark), schedulers (such as Airflow or Dagster), and the SQL dialect. The config file is defined in config.py in the root directory of your SQLMesh project.
3
3
4
4
## Settings
5
5
### connections
6
-
A dictionary of supported connection and their configurations. The key represents a unique connection name. If there is only one connection, its configuration can be provided directly omitting the dictionary.
6
+
A dictionary of supported connections and their configurations. The key represents a unique connection name. If there is only one connection, its configuration can be provided directly, omitting the dictionary.
7
7
8
8
```python
9
9
import duckdb
@@ -17,8 +17,7 @@ Config(
17
17
```
18
18
19
19
### scheduler
20
-
Identifies which scheduler backend to use. The scheduler backend is used for both storing metadata and executing [plans](/concepts/plans). By default, the `BuiltinSchedulerBackend` is used which uses the existing SQL engine to store metadata and has a simple scheduler. The `AirflowSchedulerBackend` should be used if you want to integrate with Airflow.
21
-
20
+
Identifies which scheduler backend to use. The scheduler backend is used both for storing metadata and executing [plans](/concepts/plans). By default, the `BuiltinSchedulerBackend` is used, which uses the existing SQL engine to store metadata and has a simple scheduler. The `AirflowSchedulerBackend` should be used if you want to integrate with Airflow.
22
21
23
22
```python
24
23
from sqlmesh.core.config import AirflowSchedulerConfig, Config
Notification targets are used to receive logging or updates as SQLMesh processes things. Notification targets can be used to implement things like integration with Github or Slack.
29
+
Used to receive logging or updates as SQLMesh processes things. Notification targets can be used to implement things such as integration with GitHub or Slack.
31
30
32
31
### dialect
33
32
The default SQL dialect of model queries. Default: same as engine dialect. The dialect is used if a [model](/concepts/models) does not define a dialect. Note that this dialect only specifies the dialect the model is written in. At runtime, model queries will be transpiled to the correct engine dialect.
34
33
35
34
### physical_schema
36
-
The default schema used to store materialized tables. By default this will store all physical tables managed by SQLMesh in the `sqlmesh` schema/db in your warehouse.
35
+
The default schema used to store materialized tables. By default, this will store all physical tables managed by SQLMesh in the `sqlmesh` schema/db in your warehouse.
37
36
38
37
### snapshot_ttl
39
-
Duration before unpromoted snapshots are removed. This is defined as a string with the default be `in 1 week`. Other [relative strings](https://dateparser.readthedocs.io/en/latest/) can be used liked`in 30 days`.
38
+
Duration before unpromoted snapshots are removed. This is defined as a string with the default `in 1 week`. Other [relative strings](https://dateparser.readthedocs.io/en/latest/) can be used, such as `in 30 days`.
40
39
41
40
### time_column_format
42
41
The default format to use for all model time columns. Defaults to %Y-%m-%d.
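As a quick illustration of what the default `%Y-%m-%d` format produces, here is a minimal sketch using only Python's standard `datetime` module. The constant name `DEFAULT_TIME_COLUMN_FORMAT` is hypothetical and is not part of the SQLMesh API.

```python
from datetime import datetime

# Hypothetical constant mirroring the documented default format.
DEFAULT_TIME_COLUMN_FORMAT = "%Y-%m-%d"

ts = datetime(2023, 1, 15, 8, 30)

# Formatting drops the time of day and keeps only year-month-day.
formatted = ts.strftime(DEFAULT_TIME_COLUMN_FORMAT)

# Parsing the string back yields midnight of the same date.
parsed = datetime.strptime(formatted, DEFAULT_TIME_COLUMN_FORMAT)

print(formatted)
print(parsed)
```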
@@ -47,9 +46,8 @@ This time format uses python format codes. https://docs.python.org/3/library/dat
47
46
A list of users that can be used for approvals/notifications.
48
47
49
48
## Precedence
50
-
51
49
You can configure your project in multiple places, and SQLMesh will prioritize configurations according to
52
-
the following order. From least to greatest precedence:
50
+
the following order, from least to greatest precedence:
53
51
54
52
- A Config object defined in a config.py file at the root of your project:
55
53
@@ -90,11 +88,10 @@ local_config = Config(
90
88
... )
91
89
```
92
90
93
-
## Using Config
94
-
95
-
The most common way to configure your SQLMesh project is with a `config.py` module at the root of the
91
+
## Using Config objects
92
+
The most common way to configure your SQLMesh project is with a `config.py` module at the root of your
96
93
project. A SQLMesh Context will automatically look for Config objects there. You can have multiple
97
-
Config objects defined, and then tell Context which one to use. For example, you can have different
94
+
Config objects defined and tell Context which one to use. For example, you can have different
98
95
Configs for local and production environments, Airflow, and Model tests.
99
96
100
97
Example config.py:
@@ -127,4 +124,3 @@ To use a Config, pass in its variable name to Context.
127
124
128
125
```
129
126
130
-
For more information about the Config class and its parameters, see `sqlmesh.core.config.Config`.