[DNAINT-1516][DJM] Add dbt documentation (#26994)
base: master

Conversation
content/en/data_jobs/dbt.md (Outdated)

> **Optional**
>
> 1. Setup `OPENLINEAGE_DBT_LOGGING` environment variable, you can establish the logging level for the `openlineage.dbt` and its child modules.
I do not fully understand this sentence.
I suppose it's "By setting up ..."
Also by "establish", do you mean I can choose the logging level that way?
Would be great to include an example and briefly explain what the example does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if it might be better to put it in the Troubleshooting section, like adding `OPENLINEAGE_DBT_LOGGING=DEBUG` to enable debug logging.
Yep I agree. Putting it in the Troubleshooting section makes sense
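For reference, a minimal sketch of what that Troubleshooting snippet could look like. The variable name comes from this thread; treating its value as a standard Python logging level name is an assumption.

```shell
# Assumption: OPENLINEAGE_DBT_LOGGING accepts standard Python
# logging level names (DEBUG, INFO, WARNING, ...).
export OPENLINEAGE_DBT_LOGGING=DEBUG

# Run dbt through the OpenLineage wrapper with debug logging
# enabled for openlineage.dbt and its child modules.
dbt-ol run
```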
> In Datadog, you can see the traces by using the following APM query:
>
> ```text
> operation_name:*dbt*
> ```
@warrierr what do you think of pointing users directly to traces for now ?
I don't want to ship this to customers yet without some minimal experience in the DJM UI. I think this sets an odd/inconsistent standard for customers so we should wait until the overview/details page is minimally functional for dbt.
I believe that updating this file to the content below should make two rows of three icons of equal height each:
{{ $dot := . }}
<div class="dsm-containers">
<div class="container cards-dd">
<div class="row row-cols-1 row-cols-md-3 g-2 g-xl-3 justify-content-sm-center">
<div class="col">
<a class="card h-100" href="/data_jobs/emr">
<div class="card-body text-center py-2 px-1">
{{ partial "img.html" (dict "root" . "src" "integrations_logos/amazon_emr.png" "class" "img-fluid" "alt"
"Amazon EMR" "width" "200") }}
</div>
</a>
</div>
<div class="col">
<a class="card h-100" href="/data_jobs/databricks/">
<div class="card-body text-center py-2 px-1">
{{ partial "img.html" (dict "root" . "src" "integrations_logos/databricks.png" "class" "img-fluid" "alt" "Databricks" "width" "200") }}
</div>
</a>
</div>
<div class="col">
<a class="card h-100" href="/data_jobs/dataproc/">
<div class="card-body text-center py-2 px-1">
{{ partial "img.html" (dict "root" . "src" "integrations_logos/google_cloud_dataproc.png" "class"
"img-fluid" "alt" "GCP Dataproc" "width" "200") }}
</div>
</a>
</div>
<div class="col">
<a class="card h-100" href="/data_jobs/kubernetes/">
<div class="card-body text-center py-2 px-1">
{{ partial "img.html" (dict "root" . "src" "integrations_logos/kubernetes.png" "class" "img-fluid" "alt"
"Kubernetes" "width" "200") }}
</div>
</a>
</div>
<div class="col">
<a class="card h-100" href="/data_jobs/airflow/">
<div class="card-body text-center py-2 px-1">
{{ partial "img.html" (dict "root" . "src" "integrations_logos/airflow.png" "class" "img-fluid" "alt" "Airflow" "width" "200") }}
</div>
</a>
</div>
<div class="col">
<a class="card h-100" href="/data_jobs/dbt/">
<div class="card-body text-center py-2 px-1">
{{ partial "img.html" (dict "root" . "src" "integrations_logos/dbt.png" "class" "img-fluid" "alt" "dbt" "width" "140") }}
</div>
</a>
</div>
</div>
</div>
</div>
Co-authored-by: Michael Cretzman <58786311+michaelcretzman@users.noreply.github.com>
> tag: 'Documentation'
> text: 'Data Jobs Monitoring'
> ---
We need to add a "preview" callout like we do for Airflow
> * Replace `DD_DATA_OBSERVABILITY_INTAKE` with `https://data-obs-intake.`{{< region-param key="dd_site" code="true" >}}.
> * Replace `DD_API_KEY` with your valid [Datadog API key][5].
> * Replace `NAMESPACE` if you want to use something other than the `default` namespace for job namespace.
@MassyB @paul-laffon-dd how should users think of namespace in the context of dbt? What "thing" in dbt do we recommend they name this?
One way is to use different namespaces for different envs (prod/dev/staging)
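That per-environment convention could be sketched roughly like this. Assumptions: `DEPLOY_ENV` is a hypothetical variable your deploy tooling sets, and the per-env namespace values are illustrative; only `NAMESPACE` itself comes from the doc.

```shell
# Illustrative: derive the OpenLineage job namespace from the
# deployment environment so prod/staging/dev runs stay separate.
DEPLOY_ENV="${DEPLOY_ENV:-dev}"   # hypothetical variable, defaults to dev

case "$DEPLOY_ENV" in
  prod)    NAMESPACE="dbt-prod" ;;
  staging) NAMESPACE="dbt-staging" ;;
  *)       NAMESPACE="dbt-dev" ;;
esac
export NAMESPACE
```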
content/en/data_jobs/dbt.md (Outdated)

> ## Validation
>
> In your setup, you can run the following `dbt-ol` command to see traces in Datadog.
> For example, if you are using the [jaffle-shop][8] project:
> For example, if you are using the [jaffle-shop][8] project:

I don't think we need to link to this example. I would just make the command generic to a project name, something like: `dbt-ol run --select <your_project_name>`
`dbt-ol run --select <your_model_name>`
content/en/data_jobs/dbt.md (Outdated)

> The above consumes dbt [artifacts][9] and sends OpenLineage events **after** the job finishes.
> If you want to receive events in real time, you can use the `--consume-structured-logs` option of `dbt-ol`.
Why are we making this an option for customers, shouldn't we just have this option set in the command we tell customers to run? What's the downside of having them get the data in realtime?
> What's the downside of having them get the data in realtime?
None. I'll only keep this option then.
> ```
> dbt-ol --consume-structured-logs run --select orders
> ```
>
> In Datadog, you can see the traces by using the following APM query:
Everything above here isn't really validation; it's part of the main setup to get runs showing up in Datadog for your projects. The Validation section should purely be where in the DD product to look to see if data is collected correctly.
content/en/data_jobs/dbt.md (Outdated)

> ## Validation
>
> In your setup, you can run the following `dbt-ol` command to see traces in Datadog.
We should provide a sentence about how `dbt-ol` is a wrapper around `dbt`: it supports all of the standard dbt subcommands and is safe to use as a drop-in substitution.
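A possible shape for that sentence plus example, hedged on the assumption stated in this comment that `dbt-ol` forwards all standard dbt subcommands unchanged:

```shell
# dbt-ol wraps the dbt CLI: swap `dbt` for `dbt-ol` and keep the
# rest of the command line the same.
dbt run --select orders      # plain dbt, no lineage events
dbt-ol run --select orders   # same run, emitting OpenLineage events
```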
Compare: 142dcab to db587b8
What does this PR do? What is the motivation?

DJM aims to support dbt pipelines. This PR updates our public documentation to let our customers send us dbt OpenLineage events.
I took inspiration from #25196.
Jira: https://datadoghq.atlassian.net/browse/DNAINT-1516

Merge instructions
[ ] Please merge after reviewing