Updating project structure, adding content for create a project guide. (#296)

technically-tracy · web-flow · commit 778cb703b12b · 2023-02-03T19:48:48.000-05:00
* Updating project structure as discussed, adding content for create a project guide.

* Updated per PR comment.
diff --git a/README.md b/README.md
@@ -1,45 +1,5 @@
-## What is SQLMesh?
+# What is SQLMesh?
 
 SQLMesh is a next-generation SQL transformation platform. It provides you with powerful automation for versioning, backfilling, deployment, and testing &mdash; allowing you to focus on simply writing SQL.
 
 SQLMesh is able to achieve all of this with minimal setup; there are no additional services or dependencies required to get started using SQLMesh other than a connection to your existing data warehouse or engine.
-
-## Why SQLMesh?
-
-One of the main advantages over other transformation frameworks is that SQLMesh does not categorize incrementality as an "advanced" use case that should be avoided unless absolutely necessary. While other frameworks default to full refresh compute, the default for SQLMesh is to optimize for incremental compute, i.e. computing one day or hour at a time. This allows SQLMesh to be faster and more scalable than other frameworks, allowing you to take advantage of the cost and time savings of incrementality.
-
-SQLMesh also automates away complexity, so configuring models is no longer tricky due to complex macros that require understanding of the context for execution. Writing your data pipelines incrementally with SQLMesh not only saves you money and time, but keeps your systems maintainable, reliable, and accessible to all of your data practictioners.
-
-### Reduced cost
-Incremental compute is significantly cheaper than full refresh compute.
-
-For example, if you have one year of history but only receive new data on a daily basis, just processing that new data is ~365x cheaper than reprocessing one year each day. As your data grows, it's possible that refreshing your tables may take longer than a day, which means you would never be able to catch up!
-
-In addition, you may not be able to refresh particular tables all at once; they may need to be batched into smaller intervals. The cost of your data pipelines compound as more dependent pipelines are created. Therefore, writing your data pipelines incrementally as much as possible can result in exponential savings.
-
-### Increased efficiency
-SQLMesh safely reuses physical tables across isolated environments. Some databases, such as Snowflake, have [zero-copy cloning](https://docs.snowflake.com/en/user-guide/tables-storage-considerations.html#label-cloning-tables) &mdash; but this is a manual process, and not widely supported.
-
-SQLMesh is able to automatically reuse tables regardless of which data warehouse or engine you're using. This is achieved by storing fingerprints of your models and by employing [views](https://en.wikipedia.org/wiki/View_(SQL)) like pointers to physical locations. Therefore, spinning up a new development environment is fast and cheap; only models with incompatible changes need to be materialized, saving time and money.
-
-### Automation for everyone
-Creating maintainable and scalable data pipelines is extremely difficult, and a task usually reserved for data engineers. As your data grows, the need for incremental compute becomes mandatory due to the cost and time constaints.
-
-Incremental models have inherent state of which partitions have been computed. This makes managing the consistency and accuracy challenging (leaving no data leakages or gaps). 
-
-Although a seasoned engineer may have the expertise or tooling to operate one of these tables, an analyst would not. In these organizations, analysts would either need to file a ticket and wait on data engineering resources, or bypass core data models by running their own custom jobs, which inevitably leads to an ungoverned data mess. SQLMesh democratizes the ability to write safe and scalable data pipelines to all data practitioners, regardless of technical ability.
-
-### Complexity made simple
-As more and more models and users depend on core tables, the complexity of making changes increases. You must ensure that all downstream data consumers are compatible and updated with any new changes.
-
-Propagating a change throughout a complex graph of dependencies is difficult to communicate, and also challenging to do accurately. The introduction of other schedulers such as [Airflow](https://airflow.apache.org/) adds even more complexity. SQLMesh seamlessly integrates directly with your existing scheduler so that your entire data pipeline, including jobs outside of SQLMesh, will be unified and robust.
-
-### Collaboration and integration
-SQLMesh allows for data pipelines to be a collaborative experience. It both empowers less technical data users to contribute and enables them to collaborate with others who may be more familiar with data engineering. Development can be done in a fully isolated environment that can be accessed and validated by others.
-
-SQLMesh provides information about changes and how they may affect your downstream consumers. This transparency, along with the ability to categorize changes, makes it more feasible for a less technically savvy user to make updates to core data pipelines. 
-
-By integrating with our Continuous Integration/Continuous Delivery (CI/CD) flows, you can require approval for any changes before going to production, ensuring that the relevant data owners or experts can review and validate the changes.
-
-### Testing and reliability
-SQLMesh supports both audits and tests. Although unit tests has been commonplace in the world of software engineering, they are relatively unknown in the data world. SQLMesh's data unit tests allow for stability and reliability, as data pipeline owners can ensure that changes to models don't change underlying logic. These tests can run quickly in CI, or locally without having to create full scale tables.
diff --git a/docs/best_practices/recommended_workflow.md b/docs/best_practices/recommended_workflow.md
@@ -0,0 +1,3 @@
+# Recommended workflow
+
+TODO
diff --git a/docs/concepts/architecture/serialization.md b/docs/concepts/architecture/serialization.md
@@ -0,0 +1,3 @@
+# Serialization
+
+TODO
diff --git a/docs/concepts/hooks.md b/docs/concepts/hooks.md
@@ -0,0 +1,3 @@
+# Hooks
+
+TODO
diff --git a/docs/concepts/models/python_models.md b/docs/concepts/models/python_models.md
@@ -0,0 +1,3 @@
+# Python models
+
+TODO
diff --git a/docs/concepts/models/seed_models.md b/docs/concepts/models/seed_models.md
@@ -0,0 +1,3 @@
+# Seed models
+
+TODO
diff --git a/docs/concepts/models/sql_models.md b/docs/concepts/models/sql_models.md
@@ -0,0 +1,3 @@
+# SQL models
+
+TODO
diff --git a/docs/concepts/team_development.md b/docs/concepts/team_development.md
@@ -0,0 +1,3 @@
+## Team development with SQLMesh
+
+TODO
diff --git a/docs/guides/create_a_project.md b/docs/guides/create_a_project.md
@@ -1,20 +1,75 @@
 # Create a project
 
-## Create a project from scratch
+## Create a project
 
-To get up and running with a new project that SQLMesh scaffolds for you, refer to the [quickstart guide](/quick_start).
+---
 
-* Make sure you have required dependencies.
-* Create a folder
-* Run `sqlmesh init`
+Before getting started, ensure that you meet the [prerequisites](../../prerequisites) for using SQLMesh.
+
+---
+
+To create a project from the command line, follow these steps:
+
+1. Create a directory for your project:
+
+    ```
+    mkdir my-project
+    ```
+
+2. Change directories into your new project:
+
+    ```
+    cd my-project
+    ```
+
+    From here, you can create your project structure from scratch, or SQLMesh can scaffold one for you. For the purposes of this guide, we'll show you how to scaffold your project so that you can get up and running quickly.
+
+3. Before scaffolding a project, we recommend creating and activating a virtual environment, then installing SQLMesh, by running the following commands:
+
+    ```
+    python -m venv .env
+    ```
+
+    ```
+    source .env/bin/activate
+    ```
+
+    ```
+    pip install sqlmesh
+    ```
+
+    **Note:** When using a virtual environment, you must ensure that it is activated first. You should see `(.env)` in your command line; if you don't, run `source .env/bin/activate` from your project directory to activate your environment.
+
+4. Once you have activated your environment, run the following command and SQLMesh will build out your project:
+
+    ```
+    sqlmesh init
+    ```
+   
+    SQLMesh creates the following directories and files, which you can use to organize your project:
+
+    - config.py (database configuration file)
+    - ./models (SQL and Python models)
+    - ./audits (shared audits)
+    - ./tests (unit tests)
+    - ./macros
 
 ## Edit an existing project
 
-* There's nothing special to do for an existing project, you just use it as normal.
+To edit an existing project, open the project file you wish to edit in your preferred editor.
+
+If you are using the CLI or Notebook, point the `sqlmesh` command at your project by passing the project's path with the `--path` option, as follows:
+
+```
+sqlmesh --path <your-project-path>
+```
+
+For more details, refer to [CLI](../../api/cli) and [Notebook](../../api/notebook).
 
-* CLI/Notebook you need to pass the path variable in pointing to your project.
+## Import a dbt project
 
-## Import a DBT project
+To import a dbt project, use the `sqlmesh init` command with the `-t dbt` option as follows:
 
-* Read the DBT guide
-* All you need to do is run sqlmesh with a dbt flag (ask chris for this).
+```
+sqlmesh init -t dbt
+```
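
Review sketch (not part of the patch): the scaffold list in the guide above could be checked after running `sqlmesh init`. The following is a minimal Python sketch that reports which of the listed entries are absent from a project directory. The helper name `missing_scaffold_entries` is hypothetical, and the entry names are copied from the guide's list rather than from the SQLMesh source, so the actual layout may differ between versions.

```python
from pathlib import Path

# Entry names copied from the guide's list; an assumption, not read from SQLMesh itself.
EXPECTED_ENTRIES = ("config.py", "models", "audits", "tests", "macros")


def missing_scaffold_entries(project_dir):
    """Return the expected files or directories that do not exist under project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Example: inspect the current working directory.
print(missing_scaffold_entries("."))
```

An empty list suggests the directory matches the layout described in the guide; any names returned are the entries still to be created.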
diff --git a/docs/guides/process.md b/docs/guides/process.md
diff --git a/docs/integrations/github.md b/docs/integrations/github.md
@@ -0,0 +1,3 @@
+# GitHub
+
+TODO
diff --git a/docs/prerequisites.md b/docs/prerequisites.md
@@ -0,0 +1,16 @@
+# Prerequisites
+
+[//]: # (If anything changes here, update quick_start.md as well.)
+
+You'll need Python 3.7 or higher to use SQLMesh. You can check your Python version by running the following command:
+```
+python3 --version
+```
+
+or:
+
+```
+python --version
+```
+
+**Note:** If `python --version` returns 2.x, replace all `python` commands with `python3`, and `pip` with `pip3`.
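
Review sketch (not part of the patch): the version requirement above can also be checked from inside the interpreter. This is a minimal Python sketch, assuming only the "3.7 or higher" figure stated in the prerequisites text; the helper name `meets_minimum` is hypothetical.

```python
import sys

# Minimum version taken from the prerequisites text above.
MINIMUM_VERSION = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True when the (major, minor) pair is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


# Check the interpreter that is running this script.
print(meets_minimum())
```

Running it with the same `python` or `python3` executable used to install SQLMesh shows whether that particular interpreter satisfies the requirement.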
diff --git a/docs/quick_start.md b/docs/quick_start.md
@@ -5,6 +5,8 @@ This example project will run locally on your computer using [DuckDB](https://du
 
 ## Prerequisites
 
+[//]: # (If anything changes here, update prerequisites.md as well.)
+
 You'll need Python 3.7 or higher to use SQLMesh. You can check your python version by running the following command:
 ```
 python3 --version
diff --git a/docs/reference/cli.md b/docs/reference/cli.md
diff --git a/docs/reference/notebook.md b/docs/reference/notebook.md
diff --git a/docs/reference/overview.md b/docs/reference/overview.md
diff --git a/docs/reference/python.md b/docs/reference/python.md
diff --git a/docs/release_notes.md b/docs/release_notes.md
@@ -0,0 +1,3 @@
+# Release notes
+
+TODO
diff --git a/docs/why_sqlmesh.md b/docs/why_sqlmesh.md
@@ -0,0 +1,39 @@
+# Why SQLMesh?
+
+One of the main advantages over other transformation frameworks is that SQLMesh does not categorize incrementality as an "advanced" use case that should be avoided unless absolutely necessary. While other frameworks default to full refresh compute, the default for SQLMesh is to optimize for incremental compute, i.e. computing one day or hour at a time. This allows SQLMesh to be faster and more scalable than other frameworks, allowing you to take advantage of the cost and time savings of incrementality.
+
+SQLMesh also automates away complexity, so configuring models is no longer tricky due to complex macros that require understanding of the context for execution. Writing your data pipelines incrementally with SQLMesh not only saves you money and time, but keeps your systems maintainable, reliable, and accessible to all of your data practitioners.
+
+## Reduced cost
+Incremental compute is significantly cheaper than full refresh compute.
+
+For example, if you have one year of history but only receive new data on a daily basis, just processing that new data is ~365x cheaper than reprocessing one year each day. As your data grows, it's possible that refreshing your tables may take longer than a day, which means you would never be able to catch up!
+
+In addition, you may not be able to refresh particular tables all at once; they may need to be batched into smaller intervals. The cost of your data pipelines compounds as more dependent pipelines are created. Therefore, writing your data pipelines incrementally as much as possible can result in exponential savings.
+
+## Increased efficiency
+SQLMesh safely reuses physical tables across isolated environments. Some databases, such as Snowflake, have [zero-copy cloning](https://docs.snowflake.com/en/user-guide/tables-storage-considerations.html#label-cloning-tables) &mdash; but this is a manual process, and not widely supported.
+
+SQLMesh is able to automatically reuse tables regardless of which data warehouse or engine you're using. This is achieved by storing fingerprints of your models and by employing [views](https://en.wikipedia.org/wiki/View_(SQL)) like pointers to physical locations. Therefore, spinning up a new development environment is fast and cheap; only models with incompatible changes need to be materialized, saving time and money.
+
+## Automation for everyone
+Creating maintainable and scalable data pipelines is extremely difficult, and a task usually reserved for data engineers. As your data grows, the need for incremental compute becomes mandatory due to the cost and time constraints.
+
+Incremental models carry inherent state: which partitions have already been computed. This makes it challenging to keep the data consistent and accurate, with no leaks or gaps.
+
+Although a seasoned engineer may have the expertise or tooling to operate one of these tables, an analyst would not. In these organizations, analysts would either need to file a ticket and wait on data engineering resources, or bypass core data models by running their own custom jobs, which inevitably leads to an ungoverned data mess. SQLMesh democratizes the ability to write safe and scalable data pipelines to all data practitioners, regardless of technical ability.
+
+## Complexity made simple
+As more and more models and users depend on core tables, the complexity of making changes increases. You must ensure that all downstream data consumers are compatible and updated with any new changes.
+
+Propagating a change throughout a complex graph of dependencies is difficult to communicate, and also challenging to do accurately. The introduction of other schedulers such as [Airflow](https://airflow.apache.org/) adds even more complexity. SQLMesh seamlessly integrates directly with your existing scheduler so that your entire data pipeline, including jobs outside of SQLMesh, will be unified and robust.
+
+## Collaboration and integration
+SQLMesh allows for data pipelines to be a collaborative experience. It both empowers less technical data users to contribute and enables them to collaborate with others who may be more familiar with data engineering. Development can be done in a fully isolated environment that can be accessed and validated by others.
+
+SQLMesh provides information about changes and how they may affect your downstream consumers. This transparency, along with the ability to categorize changes, makes it more feasible for a less technically savvy user to make updates to core data pipelines. 
+
+By integrating with our Continuous Integration/Continuous Delivery (CI/CD) flows, you can require approval for any changes before going to production, ensuring that the relevant data owners or experts can review and validate the changes.
+
+## Testing and reliability
+SQLMesh supports both audits and tests. Although unit tests have been commonplace in the world of software engineering, they are relatively unknown in the data world. SQLMesh's data unit tests allow for stability and reliability, as data pipeline owners can ensure that changes to models don't change underlying logic. These tests can run quickly in CI, or locally without having to create full scale tables.
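
Review sketch (not part of the patch): the "~365x cheaper" figure in the "Reduced cost" section can be reproduced with a little arithmetic. This is a minimal Python sketch under the assumption that cost is proportional to the number of daily partitions a run processes; the function names are hypothetical and nothing here calls SQLMesh.

```python
def full_refresh_partitions(history_days: int) -> int:
    """Daily partitions processed by one run that rebuilds the entire history."""
    return history_days


def incremental_partitions(new_days: int = 1) -> int:
    """Daily partitions processed by one run that handles only newly arrived data."""
    return new_days


# One year of history, with one new day of data arriving per run.
history_days = 365
ratio = full_refresh_partitions(history_days) / incremental_partitions()
print(f"full refresh processes {ratio:.0f}x more partitions per run")
```

Under this simplified model the ratio grows linearly with the length of the history, which is why the gap widens as data accumulates.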
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -1,35 +1,47 @@
 site_name: SQLMesh
 nav:
-  - "Overview": index.md
-  - quick_start.md
+  - Get started:
+    - "What is SQLMesh?": index.md
+    - "Why SQLMesh?": why_sqlmesh.md
+    - prerequisites.md
+    - quick_start.md
   - Guides:
     - guides/create_a_project.md
-    - guides/process.md
   - Concepts:
     - concepts/overview.md
     - Models:
       - concepts/models/overview.md
       - concepts/models/model_kinds.md
+      - concepts/models/python_models.md
+      - concepts/models/sql_models.md
+      - concepts/models/seed_models.md
     - concepts/plans.md
     - concepts/configs.md
     - concepts/environments.md
+    - concepts/team_development.md
     - concepts/macros.md
+    - concepts/hooks.md
     - concepts/tests.md
     - concepts/audits.md
     - Architecture:
       - concepts/architecture/snapshots.md
+      - concepts/architecture/serialization.md
     - concepts/glossary.md
   - Reference:
-    - api/overview.md
+    - reference/overview.md
     - Options:
-      - api/cli.md
-      - api/notebook.md
-      - api/python.md
+      - reference/cli.md
+      - reference/notebook.md
+      - reference/python.md
   - Integrations:
     - integrations/overview.md
     - integrations/airflow.md
     - integrations/dbt.md
+    - integrations/github.md
+  - Best practices:
+    - best_practices/recommended_workflow.md
   - Resources:
+    - release_notes.md
     - community.md
     - development.md
 
