Generate destination docs section from DestinationCapabilitiesContext object #3046

@sh-rp

TLDR

Every destination provides a DestinationCapabilitiesContext instance that tells dlt internally how to behave when normalizing data for, and loading data to, this destination. This object holds a lot of information that we could use to automatically generate a config section for the website that explains to the user which features the destination supports. For example, preferred_loader_file_format tells dlt which file format to use for loading data if none is explicitly selected, and supports_tz_aware_datetime tells dlt whether the destination supports datetime columns that store a timezone.

Steps

Unfortunately, all our docs scripts are written in JS at the moment, so we can't integrate these changes into docs/website/tools/preprocess_docs.js. Instead, we need to create an additional Python script that runs after this preprocess step. Ideally, preprocess_docs would be written in Python; maybe we can do that soonish, too.

  • Familiarize yourself with the docs build process described in docs/website/README.md in the core repo. If you see any outdated info in there, please update the README file. You can check package.json to see which scripts are run when the website is built and deployed. Some of the tools modify the markdown files to insert code snippets.
  • Create a new script website/tools/insert_destination_capabilities.py
  • When run, this script should inspect all files in the folder docs/website/docs_processed. These are the processed markdown files that are served as our docs at dlthub.com/docs, or locally when you run npm start. Wherever a marker of the form <!--@@@DLT_DESTINATION_CAPABILITIES <destination_name>--> is found, a new markdown table with information about that destination should be inserted in its place. See how something similar is done with the <!--@@@DLT_SNIPPET <snippet_name>--> markers.
  • You can get the capabilities object of each destination like this (duckdb example):
from dlt.destinations import duckdb
caps = duckdb.capabilities()

This might only work if destination credentials are provided, so consider making the _raw_capabilities() method public and static, and use it to get the capabilities of a destination.

  • Add a <!--@@@DLT_DESTINATION_CAPABILITIES <destination_name>--> marker on each destination page such as docs/website/docs/dlt-ecosystem/destinations/duckdb.md.
  • For a start render a table there that includes the following information for each destination:
    • preferred_loader_file_format
    • supported_loader_file_formats
    • preferred_staging_file_format
    • supported_staging_file_formats
    • has_case_sensitive_identifiers
    • supported_merge_strategies
    • supported_replace_strategies
    • supports_tz_aware_datetime
    • supports_naive_datetime
  • Add this new script to package.json so that it runs right after every call to tools/preprocess_docs.js.
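
The marker-replacement step described above could look roughly like the sketch below. It is only an illustration, not the existing tooling: the names MARKER_RE, insert_capabilities and process_folder are hypothetical, the set of characters allowed in a destination name and the restriction to *.md files are assumptions, and the render callable stands in for whatever ends up producing the table for a given destination.

```python
import re
from pathlib import Path
from typing import Callable

# Matches e.g. <!--@@@DLT_DESTINATION_CAPABILITIES duckdb--> and captures "duckdb".
# Assumption: destination names consist of letters, digits and underscores.
MARKER_RE = re.compile(r"<!--@@@DLT_DESTINATION_CAPABILITIES\s+([A-Za-z0-9_]+)\s*-->")


def insert_capabilities(text: str, render: Callable[[str], str]) -> str:
    """Replace every marker in `text` with the string returned by `render(name)`."""
    return MARKER_RE.sub(lambda match: render(match.group(1)), text)


def process_folder(folder: Path, render: Callable[[str], str]) -> int:
    """Rewrite all markdown files under `folder` in place.

    Returns the number of files that actually changed.
    Assumption: only files ending in .md need to be inspected.
    """
    changed = 0
    for path in sorted(folder.rglob("*.md")):
        original = path.read_text(encoding="utf-8")
        updated = insert_capabilities(original, render)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Passing the renderer in as a callable keeps the file handling independent of how the capabilities are obtained, so the marker logic can be tested without importing any destination.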

Example destination capabilities section:

| Feature | Value | More |
| --- | --- | --- |
| Default loader file format | parquet | (link to loader file format info in docs) |
| Supported loader file formats | parquet, csv | (link to loader file format info in docs) |
| Supports timezone in timestamps | True | (link to timezone information in docs) |
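
A table like the one above could be rendered from a capabilities object along these lines. This is a minimal sketch under stated assumptions: render_table, format_value and ROWS are hypothetical names, the labels are taken from the example table, and a SimpleNamespace stands in for a real DestinationCapabilitiesContext so the snippet runs without dlt installed. The "More" column is left out because the link targets are not settled yet.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple

# (attribute name on the capabilities object, human-readable label).
# Only three of the listed attributes are shown; the rest would follow the same pattern.
ROWS: List[Tuple[str, str]] = [
    ("preferred_loader_file_format", "Default loader file format"),
    ("supported_loader_file_formats", "Supported loader file formats"),
    ("supports_tz_aware_datetime", "Supports timezone in timestamps"),
]


def format_value(value: Any) -> str:
    """Turn a capability value into the text shown in a table cell."""
    if value is None:
        return "n/a"
    if isinstance(value, (set, frozenset)):
        # Sets have no stable order, so sort them for reproducible docs.
        return ", ".join(sorted(str(item) for item in value))
    if isinstance(value, (list, tuple)):
        return ", ".join(str(item) for item in value)
    return str(value)


def render_table(caps: Any) -> str:
    """Build a markdown table with one row per entry in ROWS."""
    lines = ["| Feature | Value |", "| --- | --- |"]
    for attr, label in ROWS:
        lines.append(f"| {label} | {format_value(getattr(caps, attr, None))} |")
    return "\n".join(lines)


# Stand-in object carrying the values from the example table above.
example_caps = SimpleNamespace(
    preferred_loader_file_format="parquet",
    supported_loader_file_formats=["parquet", "csv"],
    supports_tz_aware_datetime=True,
)
print(render_table(example_caps))
```

Reading attributes with getattr and a None default means a destination that lacks one of the fields still gets a row, marked "n/a", instead of breaking the docs build.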
