Commit a9b9a49

Docs: add FAQ (#1214)
* Add FAQ doc
* Change quickstart overview example dialect
* Add FAQ link to front page
* Delete release notes page
* Update tagline on front page and github README
* Fix test/audit running in plan/apply
1 parent e1cb7ff commit a9b9a49

7 files changed

Lines changed: 163 additions & 6 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
![SQLMesh logo](sqlmesh.svg)
22

3-
SQLMesh is a DataOps framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
3+
SQLMesh is a data transformation framework that brings the benefits of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python.
44

55
For more information, check out the [website](https://sqlmesh.com) and [documentation](https://sqlmesh.readthedocs.io/en/stable/).
66

docs/faq.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
1+
# FAQ
2+
3+
## General
4+
5+
???+ question "What is SQLMesh?"
6+
SQLMesh is an open source data transformation framework that brings the best practices of DevOps to data teams. It enables data engineers, scientists, and analysts to efficiently run and deploy data transformations written in SQL or Python.
7+
8+
It is created and maintained by Tobiko Data, a company founded by data leaders from Airbnb, Apple, and Netflix.
9+
10+
Check out the [quickstart guide](./quick_start.md) to see it in action.
11+
12+
??? question "What is SQLMesh used for?"
13+
SQLMesh is used to manage and execute data transformations - the process of converting raw data into a form useful for making business decisions.
14+
15+
??? question "What problems does SQLMesh solve?"
16+
**Problem: organizing, maintaining, and changing data transformation code in SQL or Python**
17+
18+
Solutions:
19+
20+
- Identify dependencies among data transformation models and determine the order in which they should run
21+
- Run data audits and unit tests to prevent unintended side effects from code changes
22+
- Implement best practices from the DevOps paradigm, such as development environments and continuous integration/continuous delivery (CI/CD)
23+
- Execute transformations written in one SQL dialect on an engine/database that runs a different SQL dialect (SQL transpilation)
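
The first solution above (ordering models by their dependencies) can be illustrated with a topological sort. This is a minimal sketch using Python's standard-library `graphlib`, not SQLMesh's own implementation; the model names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical model names; each key maps to the set of models it reads from.
dependencies = {
    "staging.orders": set(),
    "staging.customers": set(),
    "marts.customer_orders": {"staging.orders", "staging.customers"},
    "marts.revenue": {"marts.customer_orders"},
}


def run_order(deps):
    """Return one valid execution order: every model appears after its upstreams."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Any order returned this way runs the two staging models before `marts.customer_orders`, and `marts.revenue` last.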
24+
25+
<br>
26+
27+
**Problem: understanding a complex set of data transformations**
28+
29+
Solutions:
30+
31+
- Determine and display the flow of data through data transformation models
32+
- Trace which columns in a table contribute to a column in another table (column-level lineage)
33+
34+
<br>
35+
36+
**Problem: inefficient, unnecessarily expensive data transformations**
37+
38+
Solutions:
39+
40+
- Understand the impacts of a code change on the codebase and underlying data tables *without running the code*
41+
- Efficiently deploy code changes by only running the transformations impacted by the changes
42+
- Safely promote transformations executed in a development environment to production so computations aren’t needlessly re-executed
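
To make "only running the transformations impacted by the changes" concrete, here is a rough sketch of finding a changed model's downstream consumers with a breadth-first walk. It only illustrates the graph traversal idea; the project layout and the `impacted_models` function are assumptions for illustration, and SQLMesh's real change analysis (for example, deciding whether a change is `breaking`) is more involved.

```python
from collections import deque

# Hypothetical project: each model maps to the models it reads from.
upstreams = {
    "raw_events": [],
    "sessions": ["raw_events"],
    "daily_active_users": ["sessions"],
    "customers": [],
    "revenue": ["customers"],
}


def impacted_models(upstreams, changed):
    """Return the changed model plus everything downstream of it."""
    # Invert the edges so we can walk from a model to its consumers.
    downstream = {name: [] for name in upstreams}
    for name, parents in upstreams.items():
        for parent in parents:
            downstream[parent].append(name)

    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in downstream[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


affected = impacted_models(upstreams, "sessions")
print(sorted(affected))
```

In this example a change to `sessions` touches only `sessions` and `daily_active_users`; `customers` and `revenue` would not need to be re-run.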
43+
44+
<br>
45+
46+
**Problem: complex business requirements and data transformations**
47+
48+
Solutions:
49+
50+
- Easily and safely implement incremental data loading
51+
- Perform complex data transformations or operations with Python models (e.g., machine learning models, geocoding)
52+
53+
<br>
54+
55+
...and more!
56+
57+
??? question "What is semantic understanding of SQL?"
58+
Semantic understanding is the result of analyzing SQL code to determine what it does at a granular level. SQLMesh uses the free, open-source Python library [SQLGlot](https://github.com/tobymao/sqlglot) to parse the SQL code and build the semantic understanding.
59+
60+
Semantic understanding allows SQLMesh to do things like transpilation (executing one SQL dialect on an engine running another dialect) and protecting incremental loading queries from duplicating data.
61+
62+
## Getting started
63+
64+
??? question "How do I install SQLMesh?"
65+
SQLMesh is a Python library. After ensuring you have [an appropriate Python runtime](./prerequisites.md), install it [with `pip`](./installation.md).
66+
67+
??? question "How do I use SQLMesh?"
68+
SQLMesh has three interfaces: [command line](./reference/cli.md), [Jupyter or Databricks notebook](./reference/notebook.md), and graphical user interface.
69+
70+
The [quickstart guide](./quick_start.md) demonstrates an example project in each of the interfaces.
71+
72+
## Databases/Engines
73+
74+
??? question "What databases/engines does SQLMesh work with?"
75+
SQLMesh works with BigQuery, Databricks, DuckDB, PostgreSQL, GCP PostgreSQL, Redshift, Snowflake, and Spark. See [this page](./integrations/engines.md) for more information.
76+
77+
??? question "When would you use different databases for executing data transformations and storing state information?"
78+
SQLMesh requires storing information about projects and when their transformations were run. By default, it stores this information in the same database where the models run.
79+
80+
Unlike data transformations, storing state information requires database transactions. Some databases, like BigQuery, aren’t optimized for executing transactions, so storing state information in them can slow down your project. If this occurs, you can store state information in a different database, such as PostgreSQL, that executes transactions more efficiently.
81+
82+
## How is this different from dbt?
83+
84+
??? question "Terminology differences?"
85+
- dbt “materializations” are analogous to [`model kinds` in SQLMesh](./concepts/models/model_kinds.md)
86+
- dbt seeds are a [model kind in SQLMesh](./concepts/models/model_kinds.md#seed)
87+
- dbt’s “tests” are called [`audits` in SQLMesh](./concepts/audits.md) because they are auditing the contents of *data* that already exists. [SQLMesh `tests`](./concepts/tests.md) are equivalent to “unit tests” in software engineering - they evaluate the correctness of *code* based on known inputs and outputs.
88+
- `dbt build` is analogous to [`sqlmesh run`](./reference/cli.md#run)
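
The audit/test distinction in the list above can be sketched in plain Python. This is only an analogy under stated assumptions: SQLMesh audits and tests are defined in their own formats, and the function names and data below are hypothetical.

```python
def add_total(rows):
    """Transformation under test: add a `total` column to each row."""
    return [{**row, "total": row["price"] * row["quantity"]} for row in rows]


def unit_test_add_total():
    """Test: checks the *code* against a known input and expected output."""
    known_input = [{"price": 2.0, "quantity": 3}]
    expected = [{"price": 2.0, "quantity": 3, "total": 6.0}]
    return add_total(known_input) == expected


def audit_no_negative_totals(table):
    """Audit: checks *data* that already exists; returns the offending rows."""
    return [row for row in table if row["total"] < 0]


existing_table = add_total([
    {"price": 5.0, "quantity": 2},
    {"price": 4.0, "quantity": -1},  # bad source data slips through
])
test_passed = unit_test_add_total()
violations = audit_no_negative_totals(existing_table)
print(test_passed, len(violations))
```

Here the code is correct, so the test passes, yet the audit still flags one row because the problem lies in the data rather than in the logic.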
89+
90+
??? question "Workflow differences?"
91+
**dbt workflow**
92+
93+
- Configure your project and set up one database connection target for each environment you will use during development
94+
- Create, configure, and modify models, seeds, tests, and other project components
95+
- Execute `dbt build` (or its constituent parts `dbt run`, `dbt seed`, etc.) to evaluate and test the project components
96+
- Execute `dbt build` (or its constituent parts `dbt run`, `dbt seed`, etc.) on a schedule to ingest and transform new data
97+
98+
**SQLMesh workflow**
99+
100+
- Configure your project and set up a project database (using DuckDB locally or a database connection)
101+
- Create, configure, and modify models, audits, tests, and other project components
102+
- Execute `sqlmesh plan [environment name]` to:
103+
- Generate a summary of the differences between your project files and the environment and whether each change is `breaking`. The `plan` includes a list of the actions needed to implement the changes and automatically runs the project's unit `test`s.
104+
- Optionally apply the plan to implement the actions and run the project's `audit`s.
105+
- Execute `sqlmesh run` on a schedule to ingest and transform new data
106+
107+
??? question "Differences in running models?"
108+
dbt projects are executed with the commands `dbt run` (models only) or `dbt build` (models, tests, snapshots, and seeds).
109+
110+
In SQLMesh, the execution depends on whether the project’s contents have been modified since the last execution:
111+
112+
- If they have been modified, the `sqlmesh plan` command both:
113+
1. Generates a summary of the actions that will occur to implement the code changes and
114+
2. Prompts the user to "apply" the plan and execute those actions.
115+
- If they have not been modified, the [`sqlmesh run`](./reference/cli.md#run) command will evaluate the project models and run the audits. SQLMesh determines which project models should be executed based on their [`cron` configuration parameter](./concepts/models/overview.md#cron).
116+
117+
For example, if a model’s `cron` is `daily`, then `sqlmesh run` will execute the model at most once per day. The first `sqlmesh run` you issue on a given day executes the model; a second `sqlmesh run` on the same day does nothing because the model is not due again until tomorrow.
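
The once-per-day behavior described above can be approximated as follows. This is a simplified sketch, not SQLMesh's scheduler: it assumes the daily boundary is midnight UTC, and the `is_due_daily` helper is hypothetical.

```python
from datetime import datetime, timezone


def is_due_daily(last_run, now):
    """A daily model is due if it has never run or last ran on an earlier UTC date."""
    if last_run is None:
        return True
    return last_run.date() < now.date()


morning = datetime(2023, 8, 1, 9, 0, tzinfo=timezone.utc)
afternoon = datetime(2023, 8, 1, 15, 0, tzinfo=timezone.utc)
next_day = datetime(2023, 8, 2, 0, 5, tzinfo=timezone.utc)

print(is_due_daily(None, morning))        # True: never run before
print(is_due_daily(morning, afternoon))   # False: already ran today
print(is_due_daily(morning, next_day))    # True: a new day has started
```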
118+
119+
??? question "Differences in state management?"
120+
**dbt**
121+
122+
By default, dbt runs/builds are independent and have no knowledge of previous runs/builds. This knowledge is called “state” (as in “the state of things”).
123+
124+
dbt has the ability to store/maintain state with the `state` selector method and the `defer` feature. dbt stores state information in `artifacts` like the manifest JSON file and reads the files at runtime.
125+
126+
The dbt documentation [“Caveats to state comparison” page](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats) comments on those features: “The state: selection method is a powerful feature, with a lot of underlying complexity.”
127+
128+
**SQLMesh**
129+
130+
SQLMesh always maintains state about the project structure, contents, and past runs. State information enables powerful SQLMesh features like virtual data environments and easy incremental loads.
131+
132+
State information is stored by default - you do not need to take any action to maintain or to use it when executing models. As the dbt caveats page says, state information is powerful but complex. SQLMesh handles that complexity for you so you don't need to learn about or understand the underlying mechanics.
133+
134+
SQLMesh stores state information in database tables. By default, it stores this information in the same [database/connection where your project models run](./reference/configuration.md#gateways). You can specify a [different database/connection](./reference/configuration.md#state-connection) if you would prefer to store state information somewhere else.
135+
136+
SQLMesh adds information to the state tables via transactions, and some databases like BigQuery are not optimized to execute transactions. Changing the state connection to another database like PostgreSQL can alleviate performance issues you may encounter due to state transactions.

docs/index.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# SQLMesh
22

3-
[SQLMesh](https://sqlmesh.com) is an [open source](https://github.com/TobikoData/sqlmesh) DataOps framework that brings the best practices of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python. It is created and maintained by [Tobiko Data](https://tobikodata.com/), a company founded by data leaders from Airbnb, Apple, and Netflix.
3+
[SQLMesh](https://sqlmesh.com) is an [open source](https://github.com/TobikoData/sqlmesh) data transformation framework that brings the best practices of DevOps to data teams. It enables data scientists, analysts, and engineers to efficiently run and deploy data transformations written in SQL or Python. It is created and maintained by [Tobiko Data](https://tobikodata.com/), a company founded by data leaders from Airbnb, Apple, and Netflix.
44

55
## Why SQLMesh?
66

@@ -71,5 +71,6 @@ SQLMesh was built on three core principles:
7171

7272
## Next steps
7373
* [Jump right in with the quickstart](quick_start.md)
74+
* [Check out the FAQ](faq.md)
7475
* [Learn more about SQLMesh concepts](concepts/overview.md)
7576
* [Join our Slack community](https://tobikodata.com/slack)

docs/quick_start.md

Lines changed: 2 additions & 2 deletions
@@ -89,7 +89,7 @@ SQLMesh project-level configuration parameters are specified in the `config.yaml
8989

9090
This example project uses the embedded DuckDB SQL engine, so its configuration specifies `duckdb` as the local gateway's connection and the `local` gateway as the default.
9191

92-
The command to run the scaffold generator **requires** a default SQL dialect for your models, which it places in the config `model_defaults` `dialect` key. In this example, we specified the `snowflake` SQL dialect as the default:
92+
The command to run the scaffold generator **requires** a default SQL dialect for your models, which it places in the config `model_defaults` `dialect` key. In this example, we specified the `duckdb` SQL dialect as the default:
9393

9494
```yaml linenums="1"
9595
gateways:
@@ -101,7 +101,7 @@ gateways:
101101
default_gateway: local
102102

103103
model_defaults:
104-
dialect: snowflake
104+
dialect: duckdb
105105
```
106106
107107
Learn more about SQLMesh project configuration [here](./reference/configuration.md).

docs/release_notes.md

Lines changed: 0 additions & 1 deletion
This file was deleted.

docs/stylesheets/extra.css

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
.md-typeset .admonition.question,
2+
.md-typeset details.question{
3+
border-color: rgb(68, 138, 255);
4+
}
5+
6+
.md-typeset .question > .admonition-title,
7+
.md-typeset .question > summary {
8+
background-color: rgba(68, 138, 255, 0.1);
9+
10+
&::before {
11+
background-color: rgb(68, 138, 255);
12+
}
13+
14+
&::after {
15+
color: rgb(68, 138, 255);
16+
}
17+
}

mkdocs.yml

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -57,7 +57,7 @@ nav:
5757
- Resources:
5858
- comparisons.md
5959
- development.md
60-
- release_notes.md
60+
- "FAQ": faq.md
6161
- API: _readthedocs/html/sqlmesh.html
6262
theme:
6363
name: material
@@ -94,4 +94,8 @@ markdown_extensions:
9494
- pymdownx.superfences
9595
- pymdownx.tabbed:
9696
alternate_style: true
97+
- admonition
98+
- pymdownx.details
99+
extra_css:
100+
- stylesheets/extra.css
97101
copyright: Tobiko Data Inc.
