
[DNAINT-1516][DJM] Add dbt documentation #26994

Open · wants to merge 10 commits into `master`


@MassyB commented Jan 7, 2025

What does this PR do? What is the motivation?

DJM (Data Jobs Monitoring) aims to support dbt pipelines. This PR updates our public documentation so that our customers can send us dbt OpenLineage events.

I took inspiration from #25196.

Jira https://datadoghq.atlassian.net/browse/DNAINT-1516

Merge instructions

[ ] Please merge after reviewing

Contributor

github-actions bot commented Jan 7, 2025

@MassyB MassyB marked this pull request as ready for review January 7, 2025 15:16
@MassyB MassyB requested a review from a team as a code owner January 7, 2025 15:16

**Optional**

1. Setup `OPENLINEAGE_DBT_LOGGING` environment variable, you can establish the logging level for the `openlineage.dbt` and its child modules.

Suggested change
1. Setup `OPENLINEAGE_DBT_LOGGING` environment variable, you can establish the logging level for the `openlineage.dbt` and its child modules.
1. Setup `OPENLINEAGE_DBT_LOGGING` environment variable, you can establish the logging level for the `openlineage.dbt` and its child modules.

I do not fully understand this sentence.
I suppose it's "By setting up ..."
Also by "establish", do you mean I can choose the logging level that way?

Would be great to include an example and briefly explain what the example does.

Contributor

Wondering if it might be better to put it in the Troubleshooting section, for example adding `OPENLINEAGE_DBT_LOGGING=DEBUG` to enable debug logging.

Author

Yep, I agree. Putting it in the Troubleshooting section makes sense.
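
A minimal sketch of the Troubleshooting entry discussed in this thread. It assumes `OPENLINEAGE_DBT_LOGGING` accepts a standard Python logging level name such as `DEBUG` (the value suggested above); `<your_model_name>` is a placeholder.

```shell
# Raise the log level of the openlineage.dbt module and its child modules.
# Assumption: the variable takes a Python logging level name (DEBUG, INFO, ...).
export OPENLINEAGE_DBT_LOGGING=DEBUG

# Then run dbt through the OpenLineage wrapper as usual, for example:
#   dbt-ol run --select <your_model_name>
```

Exporting the variable, rather than prefixing a single command, keeps debug logging on for every `dbt-ol` invocation in the current shell session.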

@MassyB MassyB requested a review from a team as a code owner January 8, 2025 13:14
@github-actions github-actions bot added the Architecture Everything related to the Doc backend label Jan 8, 2025

In Datadog, you can see the traces by using the following APM query:
```text
operation_name:*dbt*
```

Contributor

@warrierr what do you think of pointing users directly to traces for now?

Contributor

I don't want to ship this to customers yet without some minimal experience in the DJM UI. I think this sets an odd/inconsistent standard for customers so we should wait until the overview/details page is minimally functional for dbt.

@github-actions github-actions bot added the Images Images are added/removed with this PR label Jan 8, 2025

Contributor

I believe that updating this file to the content below should make two rows of three icons of equal height each:

```html
{{ $dot := . }}
<div class="dsm-containers">
  <div class="container cards-dd">
    <div class="row row-cols-1 row-cols-md-3 g-2 g-xl-3 justify-content-sm-center">

      <div class="col">
        <a class="card h-100" href="/data_jobs/emr">
          <div class="card-body text-center py-2 px-1">
            {{ partial "img.html" (dict "root" . "src" "integrations_logos/amazon_emr.png" "class" "img-fluid" "alt"
            "Amazon EMR" "width" "200") }}
          </div>
        </a>
      </div>

      <div class="col">
        <a class="card h-100" href="/data_jobs/databricks/">
          <div class="card-body text-center py-2 px-1">
            {{ partial "img.html" (dict "root" . "src" "integrations_logos/databricks.png" "class" "img-fluid" "alt" "Databricks" "width" "200") }}
          </div>
        </a>
      </div>

      <div class="col">
        <a class="card h-100" href="/data_jobs/dataproc/">
          <div class="card-body text-center py-2 px-1">
            {{ partial "img.html" (dict "root" . "src" "integrations_logos/google_cloud_dataproc.png" "class"
            "img-fluid" "alt" "GCP Dataproc" "width" "200") }}
          </div>
        </a>
      </div>

      <div class="col">
        <a class="card h-100" href="/data_jobs/kubernetes/">
          <div class="card-body text-center py-2 px-1">
            {{ partial "img.html" (dict "root" . "src" "integrations_logos/kubernetes.png" "class" "img-fluid" "alt"
            "Kubernetes" "width" "200") }}
          </div>
        </a>
      </div>

      <div class="col">
        <a class="card h-100" href="/data_jobs/airflow/">
          <div class="card-body text-center py-2 px-1">
            {{ partial "img.html" (dict "root" . "src" "integrations_logos/airflow.png" "class" "img-fluid" "alt" "Airflow" "width" "200") }}
          </div>
        </a>
      </div>

      <div class="col">
        <a class="card h-100" href="/data_jobs/dbt/">
          <div class="card-body text-center py-2 px-1">
            {{ partial "img.html" (dict "root" . "src" "integrations_logos/dbt.png" "class" "img-fluid" "alt" "dbt" "width" "140") }}
          </div>
        </a>
      </div>

    </div>
  </div>
</div>
```

tag: 'Documentation'
text: 'Data Jobs Monitoring'
---

Contributor

We need to add a "preview" callout like we do for Airflow.


* Replace `DD_DATA_OBSERVABILITY_INTAKE` with `https://data-obs-intake.`{{< region-param key="dd_site" code="true" >}}.
* Replace `DD_API_KEY` with your valid [Datadog API key][5].
* Replace `NAMESPACE` if you want to use a job namespace other than `default`.

Contributor

@MassyB @paul-laffon-dd how should users think of namespace in the context of dbt? What "thing" in dbt do we recommend they name this?

Author

One way is to use different namespaces for different environments (prod/dev/staging).
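
A minimal sketch of how the placeholders above could be filled in, with one namespace per environment as suggested. It assumes `dbt-ol` reads the standard OpenLineage client variables `OPENLINEAGE_URL`, `OPENLINEAGE_API_KEY`, and `OPENLINEAGE_NAMESPACE`; the configuration block these bullets refer to is not shown in this thread, and `DD_SITE` is a hypothetical helper variable.

```shell
# Hypothetical helper variable: set it to your Datadog site (for example datadoghq.com).
DD_SITE="datadoghq.com"

# Assumption: the standard OpenLineage client environment variables are used.
export OPENLINEAGE_URL="https://data-obs-intake.${DD_SITE}"   # DD_DATA_OBSERVABILITY_INTAKE
export OPENLINEAGE_API_KEY="<DD_API_KEY>"                      # replace with your Datadog API key
# One job namespace per environment (prod, staging, dev) instead of "default".
export OPENLINEAGE_NAMESPACE="prod"
```

Keeping the namespace per environment means runs of the same dbt model in prod and staging are reported as separate jobs rather than merged under `default`.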

## Validation

In your setup, you can run the following `dbt-ol` command to see traces in Datadog.
For example, if you are using the [jaffle-shop][8] project:

Contributor

Suggested change
For example, if you are using the [jaffle-shop][8] project:

I don't think we need to link to this example. I would just make the command generic, with a placeholder project name, something like this:

`dbt-ol run --select <your_project_name>`

Author

@MassyB Jan 9, 2025

```shell
dbt-ol run --select <your_model_name>
```

The above consumes dbt [artifacts][9] and sends OpenLineage events **after** the job finishes.
If you want to receive events in real time, you can use the `--consume-structured-logs` option of `dbt-ol`.

Contributor

Why are we making this an option for customers? Shouldn't we just set this option in the command we tell customers to run? What's the downside of having them get the data in real time?

Author

> What's the downside of having them get the data in realtime?

None. I'll only keep this option then.

```shell
dbt-ol --consume-structured-logs run --select orders
```

In Datadog, you can see the traces by using the following APM query:

Contributor

Everything above here isn't really validation; it is part of the main setup needed to get the runs to show up in Datadog for your projects. The Validation section should only describe where in the Datadog product to look to confirm that data is collected correctly.


## Validation

In your setup, you can run the following `dbt-ol` command to see traces in Datadog.

Contributor

We should provide a sentence explaining that `dbt-ol` is a wrapper around `dbt`: it supports all of the standard dbt subcommands and is safe to use as a substitution.
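
A minimal sketch of that substitution: because `dbt-ol` wraps `dbt`, the usual subcommands and flags are passed through unchanged, here with `--consume-structured-logs` placed as in the `dbt-ol --consume-structured-logs run --select orders` command from this PR. The function name `run_with_lineage` is hypothetical, and the install hint assumes `dbt-ol` comes from the `openlineage-dbt` Python package.

```shell
# Hypothetical helper: run any dbt subcommand through dbt-ol so that
# OpenLineage events are emitted in real time.
run_with_lineage() {
  if ! command -v dbt-ol >/dev/null 2>&1; then
    echo "dbt-ol not found; install the openlineage-dbt package" >&2
    return 127
  fi
  # Same arguments you would give to dbt, for example: run --select <model>
  dbt-ol --consume-structured-logs "$@"
}
```

For example, `run_with_lineage run --select <your_model_name>` or `run_with_lineage test --select <your_model_name>`; the only change from a plain `dbt` call is the command name and the extra flag.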


@MassyB MassyB force-pushed the massy.bourennani/dnaint-1516-dbt-ol-documentation branch from 142dcab to db587b8 Compare January 9, 2025 10:26
Labels: Architecture (Everything related to the Doc backend), Images (Images are added/removed with this PR)

6 participants