
Commit 2213060

DCV-3417 document airflow upload/download + dbt retry (#59)
* DCV-3417 document airflow upload/download + dbt retry
* Add dbt-coves setup docs
1 parent a3d54ea · commit 2213060

File tree: 4 files changed (+236, -5 lines)

- docs/_sidebar.md
- docs/getting-started/Admin/configure-repository-using-dbt-coves.md (new)
- docs/how-tos/airflow/retry-dbt-tasks.md (new)
- docs/reference/airflow/datacoves-decorators.md

docs/_sidebar.md

Lines changed: 6 additions & 3 deletions

```diff
@@ -4,6 +4,7 @@
 - [Account Pre-reqs](/getting-started/Admin/create-account.md)
 - [Configure Airflow](/getting-started/Admin/configure-airflow.md)
 - [Configure Git Repository](/getting-started/Admin/configure-repository.md)
+- [Configure Git Repository Using dbt-coves](/getting-started/Admin/configure-repository-using-dbt-coves.md)
 - [Creating Airflow Dags](/getting-started/Admin/creating-airflow-dags.md)
 - [User Management](/getting-started/Admin/user-management.md)
 - [Developer](/getting-started/developer/)
@@ -19,14 +20,15 @@
 - [Airflow - Sync Internal Airflow database](/how-tos/airflow/sync-database.md)
 - [Airflow - Trigger a DAG using Datasets](how-tos/airflow/api-triggered-dag.md)
 - [DAGs - Add Dag Documentation](/how-tos/airflow/create-dag-level-docs.md)
-- [DAGs - Calling External Python Scripts](/how-tos/airflow/external-python-dag.md)
+- [DAGs - Calling External Python Scripts](/how-tos/airflow/external-python-dag.md)
 - [DAGs - Dynamically Set Schedule](/how-tos/airflow/dynamically-set-schedule.md)
 - [DAGs - Generate DAGs from yml](/how-tos/airflow/generate-dags-from-yml.md)
-- [DAGs - Get Current Git Branch Name from a DAG Task](/how-tos/airflow/get-current-branch-name.md)
+- [DAGs - Get Current Git Branch Name from a DAG Task](/how-tos/airflow/get-current-branch-name.md)
 - [DAGS - Load from S3 to Snowflake](/how-tos/airflow/s3-to-snowflake.md)
 - [DAGs - Run ADF Pipelines](/how-tos/airflow/run-adf-pipeline.md)
 - [DAGs - Run Airbyte sync jobs](/how-tos/airflow/run-airbyte-sync-jobs.md)
 - [DAGs - Run dbt](/how-tos/airflow/run-dbt.md)
+- [DAGs - Retry dbt jobs](/how-tos/airflow/retry-dbt-tasks.md)
 - [DAGs - Run Databricks Notebooks](/how-tos/airflow/run-databricks-notebook.md)
 - [DAGs - Run Fivetran sync jobs](/how-tos/airflow/run-fivetran-sync-jobs.md)
 - [DAGs - Test DAGs](/how-tos/airflow/test-dags.md)
@@ -38,6 +40,7 @@
 - [Secrets - Datacoves Secrets Manager](/how-tos/airflow/use-datacoves-secrets-manager.md)
 - [Worker - Custom Worker Environment](/how-tos/airflow/customize-worker-environment.md)
 - [Worker - Request Memory and CPU](/how-tos/airflow/request-resources-on-workers.md)
+
 - [Datacoves](/how-tos/datacoves/)
 - [Configure Connection Templates](/how-tos/datacoves/how_to_connection_template.md)
 - [Configure Datacoves Secret](/how-tos/datacoves/how_to_secrets.md)
@@ -129,7 +132,7 @@
 - [Datacoves Operators](/reference/airflow/datacoves-operator.md)
 - [Datacoves](/reference/datacoves/)
 - [Versioning](/reference/datacoves/versioning.md)
-- [VPC Deployment](/reference/datacoves/vpc-deployment.md)
+- [VPC Deployment](/reference/datacoves/vpc-deployment.md)
 - [Metrics & Logs](/reference/metrics-and-logs/)
 - [Grafana](/reference/metrics-and-logs/grafana.md)
 - [Security](/reference/security/)
```
docs/getting-started/Admin/configure-repository-using-dbt-coves.md (new file)

Lines changed: 96 additions & 0 deletions
# Initial Datacoves Repository Setup

## Introduction

Setting up a new data project requires careful consideration of tools, configurations, and best practices. Datacoves simplifies this process by providing a standardized, yet customizable setup through the `dbt-coves` library. This article explains how to initialize and maintain your Datacoves repository.

## Getting Started with dbt-coves Setup

The `dbt-coves setup` command generates a fully configured project environment tailored to your specific needs. This command creates a repository structure with all necessary components pre-configured according to data engineering best practices.

### Initial Setup Process

dbt-coves comes pre-installed in Datacoves, so you only have to run:

```bash
# Create a new Datacoves repository
dbt-coves setup
```

During the setup process, you'll be guided through a series of configuration questions that determine:

- Which data warehouse to use (Snowflake, BigQuery, Redshift, Databricks)
- Which components to include in your stack (dbt, Airflow, dlt)
- Project naming conventions
- Repository structure preferences
- CI/CD pipeline configurations
- Testing and documentation settings

>[!NOTE] It is recommended that you commit the answers file to your repo for future updates (see below).
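For example, a minimal sketch of saving those answers; the file name below is a placeholder, so use whatever answers file `dbt-coves setup` actually generates in your repository:

```bash
# Commit the generated answers file so a later `dbt-coves setup --update`
# can pre-select your original choices (file name is a placeholder)
git add <your-answers-file>.yml
git commit -m "Save dbt-coves setup answers"
```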
## What Gets Created

The `dbt-coves setup` command generates a comprehensive project structure that includes:

1. **dbt configuration**
   - Pre-configured dbt project with appropriate adapters
   - Custom macros tailored to your selected data warehouse
   - Template generators for consistent model creation

2. **Orchestration tools**
   - Airflow DAG templates (if selected)
   - Pipeline configurations

3. **Data loading**
   - dlt configurations for data ingestion (if selected)

4. **Quality control**
   - SQLFluff and YAMLlint configurations
   - dbt test frameworks
   - CI/CD workflows for GitHub Actions or GitLab CI

5. **Documentation**
   - README templates
   - Project structure documentation

## Customizing Your Setup

The setup process is highly flexible, allowing you to:

- Select only the components you need
- Configure folder structures based on your preferences
- Set up CI/CD pipelines appropriate for your workflow
- Include specialized macros for your specific data warehouse

## Updating Your Repository

As your project evolves or as Datacoves releases template improvements, you can update your existing repository:

```bash
# Update an existing Datacoves repository
dbt-coves setup --update
```

The update process:

- Preserves your custom code and configurations
- Updates template-managed files with the latest versions
- Adds any new components you select (components selected at setup time but not re-selected at update time will be removed)
- Maintains backward compatibility where possible

>[!NOTE] When running an update, you will be prompted for the services you want to set up or update. If you saved the answers file from when you first ran setup, your original choices will be pre-selected. If you unselect one of them, that content will be deleted.

## Benefits for Data Teams

This approach to repository setup and maintenance offers several advantages:

1. **Reduced setup time** from days to minutes
2. **Consistency** across projects and teams
3. **Built-in best practices** for data modeling and CI/CD
4. **Easy maintenance** through template updates
5. **Standardized testing** and quality control

## Conclusion

The `dbt-coves setup` command streamlines the creation and maintenance of Datacoves repositories by providing a solid foundation that incorporates industry best practices. Whether you're starting a new data project or standardizing existing ones, this approach offers a scalable and maintainable solution for modern data stack implementation.

By leveraging this setup process, data teams can focus on delivering value through data transformations and insights rather than spending time on infrastructure configuration.
docs/how-tos/airflow/retry-dbt-tasks.md (new file)

Lines changed: 98 additions & 0 deletions
# Retry a dbt task

## Overview

Retrying failed dbt models is a common workflow requirement when working with data transformations. This guide explains how to implement dbt task retry functionality in Airflow using Datacoves' custom `datacoves_dbt` decorator.

## Prerequisites

- Datacoves version 3.4 or later
- dbt API feature enabled in your environment (contact support for further assistance)

## How dbt Retries Work

The retry mechanism works by:

1. **Capturing results** of a dbt run including any failures
2. **Storing these results** using the dbt API
3. **Retrieving the previous run state** when a retry is initiated
4. **Selectively running** only the failed models and their downstream dependencies
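Under the hood, the retry boils down to dbt's state-based selection. The sketch below shows the same command the task returns in the steps that follow; it assumes the previously captured `run_results.json` has been placed where the `--state logs` flag points:

```bash
# Re-run only the models that errored in the previous invocation, plus their
# downstream dependencies, using the prior run's artifacts as the state
dbt build -s result:error+ --state logs
```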
## Implementing dbt Retries
### Step 1: Configure the `datacoves_dbt` Decorator

When defining your task, enable the necessary parameters for retries:

```python
@task.datacoves_dbt(
    connection_id="your_connection",
    dbt_api_enabled=True,        # Enable dbt API functionality
    download_run_results=True,   # Allow downloading previous run results
)
```

### Step 2: Add Conditional Logic for Retry

Implement logic in your task function to check for existing results and execute the appropriate dbt command:

```python
@task.datacoves_dbt(
    connection_id="your_connection",
    dbt_api_enabled=True,
    download_run_results=True,
)
def dbt_build(expected_files: list = []):
    if expected_files:
        return "dbt build -s result:error+ --state logs"
    else:
        return "dbt build -s your_models+"
```

### Step 3: Call the Task with Expected Files Parameter

```python
dbt_build(expected_files=["run_results.json"])
```

## Complete Example

Here's a complete DAG implementation:

```python
"""
## Retry dbt Example
This DAG demonstrates how to retry a DAG that fails during a run
"""

from airflow.decorators import dag, task
from orchestrate.utils import datacoves_utils


@dag(
    doc_md=__doc__,
    catchup=False,
    default_args=datacoves_utils.set_default_args(
        owner="Your Name",
        owner_email="your.email@example.com"
    ),
    schedule=datacoves_utils.set_schedule("0 0 1 */12 *"),
    description="Sample DAG demonstrating how to run the dbt models that fail",
    tags=["dbt_retry"],
)
def retry_dbt_failures():
    @task.datacoves_dbt(
        connection_id="your_connection",
        dbt_api_enabled=True,
        download_run_results=True,
    )
    def dbt_build(expected_files: list = []):
        if expected_files:
            return "dbt build -s result:error+ --state logs"
        else:
            return "dbt build -s model_a+ model_b+"

    dbt_build(expected_files=["run_results.json"])


retry_dbt_failures()
```

docs/reference/airflow/datacoves-decorators.md

Lines changed: 36 additions & 2 deletions

````diff
@@ -27,10 +27,10 @@ def my_bash_dag():
     @task.datacoves_bash
     def echo_hello_world() -> str:
         return "Hello World!"
-
 dag = my_bash_dag()
 ```
 
+
 ### @task.datacoves_dbt
 
 This custom decorator is an extension of the @task decorator and simplifies running dbt commands within Airflow.
@@ -43,7 +43,7 @@ This custom decorator is an extension of the @task decorator and simplifies running dbt commands within Airflow.
 - It runs dbt commands inside the dbt Project Root, not the Repository root.
 
 **Params:**
-- `connection_id`: This is the [service connection](/how-tos/datacoves/how_to_service_connections.md) which is automatically added to airflow if you select `Airflow Connection` as the `Delivery Mode`.
+- `connection_id`: This is the [service connection](/how-tos/datacoves/how_to_service_connections.md) which is automatically added to airflow if you select `Airflow Connection` as the `Delivery Mode`.
 - `overrides`: Pass in a dictionary with override parameters such as warehouse, role, or database.
 
 ```python
@@ -58,6 +58,7 @@ dag = my_dbt_dag()
 ```
 
 Example with overrides.
+
 ```python
 def my_dbt_dag():
     @task.datacoves_dbt(
@@ -72,6 +73,39 @@ dag = my_dbt_dag()
 The examples above use the Airflow connection `main` which is added automatically from the Datacoves Service Connection
 ![Service Connection](assets/service_connection_main.jpg)
 
+#### Uploading and downloading dbt results
+
+From Datacoves 3.4 onwards, the `datacoves_dbt` decorator allows users to upload and download dbt execution results and metadata to our `dbt API`.
+
+>[!NOTE] dbt-API is a feature that is not enabled by default. Please contact support for further assistance.
+
+This is particularly useful for performing [dbt retries](/how-tos/airflow/retry-dbt-tasks.md).
+
+
+The new datacoves_dbt parameters are:
+
+- `dbt_api_enabled` (Default: `False`): Whether your Environment includes a dbt API instance.
+- `download_static_artifacts` (Default: `True`): Whether the user wants to download dbt static artifact files.
+- `upload_static_artifacts` (Default: `False`): Whether the user wants to upload dbt static files.
+- `download_additional_files` (Default: `[]`): A list of extra paths the user wants to download.
+- `upload_additional_files` (Default: `[]`): A list of extra paths the user wants to upload.
+- `upload_tag` (Default: DAG `run_id`): The tag/label the files will be uploaded with.
+- `upload_run_results` (Default: `True`): Whether the `run_results.json` dbt file will be uploaded.
+- `download_run_results` (Default: `False`): Whether the `run_results.json` dbt file will be downloaded.
+- `upload_sources_json` (Default: `True`): Whether the `sources.json` dbt file will be uploaded.
+- `download_sources_json` (Default: `False`): Whether the `sources.json` dbt file will be downloaded.
+
+>[!NOTE]
+>**Static Artifacts**
+>The static artifacts are important dbt-generated files that help with dbt's operations:
+>
+>- `target/graph_summary.json`: Contains a summary of the DAG structure of your dbt project.
+>- `target/graph.gpickle`: A serialized Python networkx graph object representing your dbt project's dependency graph.
+>- `target/partial_parse.msgpack`: Used by dbt to speed up subsequent runs by storing parsed information.
+>- `target/semantic_manifest.json`: Contains semantic information about your dbt project.
+>
+>These files are downloaded by default (when `download_static_artifacts=True`) and are tagged as "latest" when uploaded.
+
 ### @task.datacoves_airflow_db_sync
 
 >[!NOTE] The following Airflow tables are synced by default: ab_permission, ab_role, ab_user, dag, dag_run, dag_tag, import_error, job, task_fail, task_instance.
````
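For illustration, here is a hedged sketch (not part of this commit) of how the new upload/download parameters documented above could be combined on a single task. The parameter names come from the list in the diff; the DAG name, tag value, and extra file path are hypothetical placeholders.

```python
from airflow.decorators import dag, task


@dag(schedule=None, catchup=False, tags=["example"])
def dbt_artifact_upload_example():  # hypothetical DAG name
    @task.datacoves_dbt(
        connection_id="main",                               # service connection added by Datacoves
        dbt_api_enabled=True,                               # requires the dbt API feature (contact support)
        upload_run_results=True,                            # push run_results.json after the run
        upload_additional_files=["target/manifest.json"],   # hypothetical extra path to upload
        upload_tag="nightly",                               # hypothetical tag; defaults to the DAG run_id
    )
    def dbt_run():
        # The decorator executes the returned dbt command inside the dbt Project Root
        return "dbt build"

    dbt_run()


dbt_artifact_upload_example()
```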
