Commit 778cb70

Updating project structure, adding content for create a project guide. (#296)
* Updating project structure as discussed, adding content for create a project guide.
* Updated per PR comment.
1 parent 713a7f8 commit 778cb70

20 files changed

Lines changed: 169 additions & 89 deletions

README.md

Lines changed: 1 addition & 41 deletions
Original file line number | Diff line number | Diff line change
@@ -1,45 +1,5 @@
1-
## What is SQLMesh?
1+
# What is SQLMesh?
22

33
SQLMesh is a next-generation SQL transformation platform. It provides you with powerful automation for versioning, backfilling, deployment, and testing — allowing you to focus on simply writing SQL.
44

55
SQLMesh is able to achieve all of this with minimal setup; there are no additional services or dependencies required to get started using SQLMesh other than a connection to your existing data warehouse or engine.
6-
7-
## Why SQLMesh?
8-
9-
One of the main advantages over other transformation frameworks is that SQLMesh does not categorize incrementality as an "advanced" use case that should be avoided unless absolutely necessary. While other frameworks default to full refresh compute, the default for SQLMesh is to optimize for incremental compute, i.e. computing one day or hour at a time. This allows SQLMesh to be faster and more scalable than other frameworks, allowing you to take advantage of the cost and time savings of incrementality.
10-
11-
SQLMesh also automates away complexity, so configuring models is no longer tricky due to complex macros that require understanding of the context for execution. Writing your data pipelines incrementally with SQLMesh not only saves you money and time, but keeps your systems maintainable, reliable, and accessible to all of your data practitioners.
12-
13-
### Reduced cost
14-
Incremental compute is significantly cheaper than full refresh compute.
15-
16-
For example, if you have one year of history but only receive new data on a daily basis, just processing that new data is ~365x cheaper than reprocessing one year each day. As your data grows, it's possible that refreshing your tables may take longer than a day, which means you would never be able to catch up!
17-
18-
In addition, you may not be able to refresh particular tables all at once; they may need to be batched into smaller intervals. The cost of your data pipelines compounds as more dependent pipelines are created. Therefore, writing your data pipelines incrementally as much as possible can result in savings that grow with every downstream pipeline.
19-
20-
### Increased efficiency
21-
SQLMesh safely reuses physical tables across isolated environments. Some databases, such as Snowflake, have [zero-copy cloning](https://docs.snowflake.com/en/user-guide/tables-storage-considerations.html#label-cloning-tables) — but this is a manual process, and not widely supported.
22-
23-
SQLMesh is able to automatically reuse tables regardless of which data warehouse or engine you're using. This is achieved by storing fingerprints of your models and by employing [views](https://en.wikipedia.org/wiki/View_(SQL)) like pointers to physical locations. Therefore, spinning up a new development environment is fast and cheap; only models with incompatible changes need to be materialized, saving time and money.
24-
25-
### Automation for everyone
26-
Creating maintainable and scalable data pipelines is extremely difficult, and a task usually reserved for data engineers. As your data grows, incremental compute becomes mandatory due to cost and time constraints.
27-
28-
Incremental models carry inherent state: the record of which partitions have already been computed. This makes it challenging to keep the data consistent and accurate, with no leakage or gaps.
29-
30-
Although a seasoned engineer may have the expertise or tooling to operate one of these tables, an analyst would not. In these organizations, analysts would either need to file a ticket and wait on data engineering resources, or bypass core data models by running their own custom jobs, which inevitably leads to an ungoverned data mess. SQLMesh democratizes the ability to write safe and scalable data pipelines to all data practitioners, regardless of technical ability.
31-
32-
### Complexity made simple
33-
As more and more models and users depend on core tables, the complexity of making changes increases. You must ensure that all downstream data consumers are compatible and updated with any new changes.
34-
35-
Propagating a change throughout a complex graph of dependencies is difficult to communicate, and also challenging to do accurately. The introduction of other schedulers such as [Airflow](https://airflow.apache.org/) adds even more complexity. SQLMesh seamlessly integrates directly with your existing scheduler so that your entire data pipeline, including jobs outside of SQLMesh, will be unified and robust.
36-
37-
### Collaboration and integration
38-
SQLMesh allows for data pipelines to be a collaborative experience. It both empowers less technical data users to contribute and enables them to collaborate with others who may be more familiar with data engineering. Development can be done in a fully isolated environment that can be accessed and validated by others.
39-
40-
SQLMesh provides information about changes and how they may affect your downstream consumers. This transparency, along with the ability to categorize changes, makes it more feasible for a less technically savvy user to make updates to core data pipelines.
41-
42-
By integrating with our Continuous Integration/Continuous Delivery (CI/CD) flows, you can require approval for any changes before going to production, ensuring that the relevant data owners or experts can review and validate the changes.
43-
44-
### Testing and reliability
45-
SQLMesh supports both audits and tests. Although unit tests have been commonplace in the world of software engineering, they are relatively unknown in the data world. SQLMesh's data unit tests allow for stability and reliability, as data pipeline owners can ensure that changes to models don't change underlying logic. These tests can run quickly in CI, or locally without having to create full-scale tables.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
# Recommended workflow
2+
3+
TODO
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Serialization
2+
3+
TODO

docs/concepts/hooks.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
# Hooks
2+
3+
TODO
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Python models
2+
3+
TODO
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Seed models
2+
3+
TODO

docs/concepts/models/sql_models.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# SQL models
2+
3+
TODO

docs/concepts/team_development.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
## Team development with SQLMesh
2+
3+
TODO

docs/guides/create_a_project.md

Lines changed: 65 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,20 +1,75 @@
11
# Create a project
22

3-
## Create a project from scratch
3+
## Create a project
44

5-
To get up and running with a new project that SQLMesh scaffolds for you, refer to the [quickstart guide](/quick_start).
5+
---
66

7-
* Make sure you have required dependencies.
8-
* Create a folder
9-
* Run `sqlmesh init`
7+
Before getting started, ensure that you meet the [prerequisites](../../prerequisites) for using SQLMesh.
8+
9+
---
10+
11+
To create a project from the command line, follow these steps:
12+
13+
1. Create a directory for your project:
14+
15+
```
16+
mkdir my-project
17+
```
18+
19+
2. Change directories into your new project:
20+
21+
```
22+
cd my-project
23+
```
24+
25+
From here, you can create your project structure from scratch, or SQLMesh can scaffold one for you. For the purposes of this guide, we'll show you how to scaffold your project so that you can get up and running quickly.
26+
27+
1. To scaffold a project, it is recommended that you use a virtual environment by running the following commands:
28+
29+
```
30+
python -m venv .env
31+
```
32+
33+
```
34+
source .env/bin/activate
35+
```
36+
37+
```
38+
pip install sqlmesh
39+
```
40+
41+
**Note:** When using a virtual environment, you must ensure that it is activated first. You should see `(.env)` in your command line; if you don't, run `source .env/bin/activate` from your project directory to activate your environment.
42+
43+
1. Once you have activated your environment, run the following command and SQLMesh will build out your project:
44+
45+
```
46+
sqlmesh init
47+
```
48+
49+
SQLMesh creates the following directories and files, which you can use to organize your project:
50+
51+
- config.py (database configuration file)
52+
- ./models (SQL and Python models)
53+
- ./audits (shared audits)
54+
- ./tests (unit tests)
55+
- ./macros
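
As a quick sanity check after the steps above, the following is a minimal sketch (not part of SQLMesh) that reports whether a virtual environment is active and which of the scaffolded entries listed above are missing. The helper names `venv_status` and `missing_scaffold` are hypothetical; the check assumes the `VIRTUAL_ENV` variable that the venv `activate` script exports.

```shell
#!/bin/sh
# Hypothetical helper: report whether a virtual environment is active.
# Assumes the VIRTUAL_ENV variable exported by venv's activate script.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "inactive"
    fi
}

# Hypothetical helper: print each scaffold entry from this guide that is
# missing in the given directory (defaults to the current directory).
missing_scaffold() {
    dir="${1:-.}"
    for entry in config.py models audits tests macros; do
        [ -e "$dir/$entry" ] || echo "$entry"
    done
}

venv_status
missing_scaffold .
```

If `missing_scaffold` prints nothing, every entry from the list above is present in the directory.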
1056
1157
## Edit an existing project
1258
13-
* There's nothing special to do for an existing project, you just use it as normal.
59+
To edit an existing project, open the project file you wish to edit in your preferred editor.
60+
61+
If you are using the CLI or Notebook, you can open a file in your project for editing by using the `sqlmesh` command with the `--path` option, pointing to your project's path as follows:
62+
63+
```
64+
sqlmesh --path <your-project-path>
65+
```
66+
67+
For more details, refer to [CLI](../../api/cli) and [Notebook](../../api/notebook).
1468
15-
* CLI/Notebook you need to pass the path variable in pointing to your project.
69+
## Import a dbt project
1670
17-
## Import a DBT project
71+
To import a dbt project, use the `sqlmesh init` command with the `-t dbt` option as follows:
1872
19-
* Read the DBT guide
20-
* All you need to do is run sqlmesh with a dbt flag (ask chris for this).
73+
```
74+
sqlmesh init -t dbt
75+
```

docs/guides/process.md

Lines changed: 0 additions & 31 deletions
This file was deleted.
