10 changes: 6 additions & 4 deletions dlt/sources/sql_database/__init__.py
@@ -75,8 +75,9 @@ def sql_database(
"full" (default): Data types will be reflected on top of "minimal". `dlt` will coerce the data into reflected types if necessary.
"full_with_precision": Sets precision and scale on supported data types (i.e. decimal, text, binary). Creates big and regular integer types.

defer_table_reflect (Optional[bool]): Will connect and reflect table schema only when yielding data. Requires table_names to be explicitly passed.
Enable this option when running on Airflow and other orchestrators that create execution DAGs.
defer_table_reflect (Optional[bool]): Will connect and reflect table schema only when yielding data. Requires `table_names` to be explicitly passed.
Enable this option when running on Airflow and other orchestrators that create execution DAGs. When True, the schema is determined during execution,
which may override modifications made via `query_adapter_callback` or hints set with `apply_hints`.

table_adapter_callback (Optional[TTableAdapter]): Receives each reflected table. May be used to modify the list of columns that will be selected.

@@ -207,8 +208,9 @@ def sql_table(
"full" (default): Data types will be reflected on top of "minimal". `dlt` will coerce the data into reflected types if necessary.
"full_with_precision": Sets precision and scale on supported data types (i.e. decimal, text, binary). Creates big and regular integer types.

defer_table_reflect (Optional[bool]): Will connect and reflect table schema only when yielding data.
Enable this option when running on Airflow and other orchestrators that create execution DAGs.
defer_table_reflect (Optional[bool]): Will connect and reflect table schema only when yielding data. Requires `table_names` to be explicitly passed.
Enable this option when running on Airflow and other orchestrators that create execution DAGs. When True, the schema is determined during execution,
which may override modifications made via `query_adapter_callback` or hints set with `apply_hints`.

table_adapter_callback (Optional[TTableAdapter]): Receives each reflected table. May be used to modify the list of columns that will be selected.

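The `table_adapter_callback` described above receives each reflected table as a SQLAlchemy `Table` object before rows are selected. A minimal sketch of such a callback, assuming a hypothetical `internal_notes` column you want to exclude (the in-place `_columns.remove` pattern relies on SQLAlchemy's mutable column collection):

```python
import sqlalchemy as sa

def drop_internal_columns(table: sa.Table) -> sa.Table:
    # Hypothetical example: exclude an "internal_notes" column
    # from the reflected table so it is never selected.
    if "internal_notes" in table.columns:
        table._columns.remove(table.columns["internal_notes"])
    return table
```

The callback would then be passed as `table_adapter_callback=drop_internal_columns` to `sql_database` or `sql_table`.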
@@ -117,8 +117,39 @@ will create `sql_database` folder with the source code that you can import and use:
print(info)

```
4. **Prefix table names using `apply_hints`**

You can rename tables before loading them into the destination by applying the `apply_hints` method to each resource. This is useful for avoiding naming collisions or organizing data.

```py
import dlt
from dlt.sources.sql_database import sql_database

def load_prefixed_tables_from_database() -> None:
    # Define the pipeline
    pipeline = dlt.pipeline(
        pipeline_name="rfam",
        destination="duckdb",
        dataset_name="rfam_data",
    )

    # Fetch specific tables from the database
    source = sql_database(table_names=["family", "clan"])

    # Prefix tables before loading to avoid collisions
    source_system = "prefix"  # Your desired prefix
    for resource in source.resources.values():
        resource.apply_hints(table_name=f"{source_system}__{resource.name}")

    # Run the pipeline
    load_info = pipeline.run(source)
    print(load_info)
```
This renames the tables before insertion. For example, the table "family" will be loaded as "prefix__family".

5. **Configuring table and column selection in `config.toml`**

To manage table and column selections outside of your Python scripts, you can configure them directly in the `config.toml` file. This approach is especially beneficial when dealing with multiple tables or when you prefer to keep configuration separate from code.
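   A sketch of what such a configuration could look like. The key layout is an assumption based on dlt's standard config injection for this source (`sources.sql_database` for source arguments, with per-table sections below it), and the column names are purely illustrative:

   ```toml
   [sources.sql_database]
   # limit reflection and loading to these tables
   table_names = ["family", "clan"]

   # per-table options resolve under sources.sql_database.<table_name> (assumed layout)
   [sources.sql_database.family]
   included_columns = ["rfam_acc", "rfam_id", "description"]
   ```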
