feat: implement join compatibility check for destinations #3758

@rudolfix

Description

Background

We need to determine whether two dlt.Dataset instances can be joined in a single SQL query. The current is_same_physical_destination (dlt/dataset/dataset.py:426) compares str(config), which calls str(credentials) and renders the full connection URL, including username, port, and database:

```python
def is_same_physical_destination(dataset1, dataset2):
    return str(dataset1.destination_client.config) == str(dataset2.destination_client.config)
```

This check is too strict for destinations that support cross-database joins (Snowflake, BigQuery, MSSQL, ClickHouse, Databricks, Dremio, Athena): it returns False even when tables can be joined across databases on the same server or account.

It is also semantically wrong for the filesystem destination, where joinability comes from the in-process DuckDB query engine, not from data location.

A fingerprint() method exists on DestinationClientConfiguration that hashes a location identifier (e.g. host), but it is not used by is_same_physical_destination and is missing on several destinations (DuckDB, Athena, SQLAlchemy, Fabric).

What to implement

1. Add location() to DestinationClientConfiguration

location() returns the unhashed physical location string: the identity of where this destination lives. It replaces the per-destination fingerprint() overrides; fingerprint() becomes a derived method on the base class and is no longer overridden per destination:

```python
def location(self) -> str:
    """Returns the physical location of this destination (e.g. host, bucket URL, project id)"""
    return ""

def fingerprint(self) -> str:
    """Returns a hash of location()"""
    loc = self.location()
    return digest128(loc) if loc else ""
```
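To make the derivation concrete, here is a minimal, self-contained sketch of how a destination config would plug into this scheme. `digest128` is approximated with hashlib, and `SnowflakeClientConfiguration` is a hypothetical illustration, not dlt's actual class:

```python
import hashlib
from dataclasses import dataclass


def digest128(value: str) -> str:
    # stand-in for dlt's digest128: a short, stable hash of the location
    return hashlib.sha3_256(value.encode("utf-8")).hexdigest()[:32]


@dataclass
class DestinationClientConfiguration:
    def location(self) -> str:
        """Physical location of this destination (e.g. host, bucket URL, project id)."""
        return ""

    def fingerprint(self) -> str:
        """Derived on the base class: a hash of location(), empty if no location."""
        loc = self.location()
        return digest128(loc) if loc else ""


@dataclass
class SnowflakeClientConfiguration(DestinationClientConfiguration):
    host: str = ""  # Snowflake account identifier

    def location(self) -> str:
        # subclasses only provide the location; fingerprint() stays on the base
        return self.host


cfg = SnowflakeClientConfiguration(host="myorg-account1")
print(cfg.fingerprint())  # hash of "myorg-account1"
print(repr(DestinationClientConfiguration().fingerprint()))  # '' (no location)
```

The point of the split: subclasses override only location(), and hashing stays in one place.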

2. Add can_join_with(other) to DestinationClientConfiguration

Determines if tables from this destination can be joined with tables from other in a single query. This reflects the dlt destination's current capability, not the theoretical capability of the underlying database engine (e.g. DuckDB supports ATTACH for cross-database joins, but dlt doesn't implement it yet).

The default implementation compares locations; destinations override it for special semantics (e.g. filesystem, Postgres).

```python
def can_join_with(self, other: "DestinationClientConfiguration") -> bool:
    """Check if tables from this destination can be joined with tables from other in a single query"""
    self_loc = self.location()
    other_loc = other.location()
    if not self_loc or not other_loc:
        return False
    return self_loc == other_loc
```
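As an illustration of a per-destination override, a Postgres-style configuration could tighten the default by also requiring the same database, since Postgres cannot join across databases on one server. This is a self-contained sketch; class and field names are assumptions, not dlt's actual code:

```python
class DestinationClientConfiguration:
    def location(self) -> str:
        return ""

    def can_join_with(self, other: "DestinationClientConfiguration") -> bool:
        self_loc, other_loc = self.location(), other.location()
        if not self_loc or not other_loc:
            return False
        return self_loc == other_loc


class PostgresClientConfiguration(DestinationClientConfiguration):
    def __init__(self, host: str, port: int, database: str) -> None:
        self.host, self.port, self.database = host, port, database

    def location(self) -> str:
        return f"{self.host}:{self.port}"

    def can_join_with(self, other: "DestinationClientConfiguration") -> bool:
        # no cross-DB joins in Postgres: same server is not enough,
        # the database must match too
        return (
            isinstance(other, PostgresClientConfiguration)
            and super().can_join_with(other)
            and self.database == other.database
        )


a = PostgresClientConfiguration("db.example.com", 5432, "analytics")
b = PostgresClientConfiguration("db.example.com", 5432, "staging")
c = PostgresClientConfiguration("db.example.com", 5432, "analytics")
print(a.can_join_with(b))  # False: same server, different database
print(a.can_join_with(c))  # True
```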

3. Update is_same_physical_destination in dlt/dataset/dataset.py

Delegate to can_join_with on the config.
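A plausible shape for the updated function, with lightweight stand-ins for the dataset and client objects (the real dlt.Dataset internals are assumed, not shown):

```python
from types import SimpleNamespace


class Config:
    """Stand-in for DestinationClientConfiguration with the new method."""

    def __init__(self, loc: str) -> None:
        self._loc = loc

    def location(self) -> str:
        return self._loc

    def can_join_with(self, other: "Config") -> bool:
        return bool(self.location()) and self.location() == other.location()


def is_same_physical_destination(dataset1, dataset2) -> bool:
    # delegate joinability to the destination configuration instead of
    # comparing stringified configs (which leak credentials into the check)
    return dataset1.destination_client.config.can_join_with(
        dataset2.destination_client.config
    )


def make_dataset(loc: str):
    return SimpleNamespace(destination_client=SimpleNamespace(config=Config(loc)))


print(is_same_physical_destination(make_dataset("acct1"), make_dataset("acct1")))  # True
print(is_same_physical_destination(make_dataset("acct1"), make_dataset("acct2")))  # False
```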

Join compatibility per destination

location() is the physical identity of the destination. can_join_with determines joinability; it defaults to comparing locations but can be overridden per destination.

| Destination | location() | can_join_with rule | Notes |
| --- | --- | --- | --- |
| Postgres | host:port | same host:port/database | No cross-DB joins, override needed |
| Redshift | host:port | same host:port/database | No cross-DB joins, override needed |
| Snowflake | host (account identifier) | same location (default) | Cross-DB within account |
| BigQuery | project_id | same location (default) | Cross-dataset within project |
| MSSQL | host | same location (default) | Cross-DB via db.schema.table |
| Synapse | host (inherited from MSSQL) | same location (default) | Cross-DB via db.schema.table |
| ClickHouse | host | same location (default) | Cross-DB via db.table |
| Databricks | server_hostname | same location (default) | Cross-catalog via Unity Catalog |
| Athena | aws_data_catalog | same location (default) | Cross-DB within Glue catalog |
| Dremio | host | same location (default) | Federation engine, cross-source |
| DuckDB | database file path or ":memory:" | same location (default) | No ATTACH support in dlt currently |
| MotherDuck | access token (account) | same location (default) | Cross-DB within account |
| Filesystem | scheme://netloc or "" for local | always True vs other filesystem, override needed | In-process DuckDB scans files directly |
| DuckLake | catalog connection string (no credentials) + ducklake_name | same location (default) | Catalog is the identity, not storage (paths are relative, bucket is swappable) |
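The filesystem row is the one case where location comparison is bypassed entirely: joinability comes from the in-process DuckDB engine, so any two filesystem datasets qualify. A hedged sketch of that override (class names are assumptions):

```python
class DestinationClientConfiguration:
    def location(self) -> str:
        return ""

    def can_join_with(self, other: "DestinationClientConfiguration") -> bool:
        return bool(self.location()) and self.location() == other.location()


class FilesystemConfiguration(DestinationClientConfiguration):
    def __init__(self, bucket_url: str = "") -> None:
        self.bucket_url = bucket_url

    def location(self) -> str:
        # scheme://netloc, or "" for the local filesystem
        return self.bucket_url

    def can_join_with(self, other: "DestinationClientConfiguration") -> bool:
        # the in-process DuckDB engine scans files directly, so any two
        # filesystem datasets are joinable regardless of where the files live
        return isinstance(other, FilesystemConfiguration)


s3 = FilesystemConfiguration("s3://bucket-a")
gcs = FilesystemConfiguration("gs://bucket-b")
print(s3.can_join_with(gcs))  # True: both filesystem, location irrelevant
print(s3.can_join_with(DestinationClientConfiguration()))  # False
```

Note that the override also makes the local filesystem (empty location) joinable with remote buckets, which the default location comparison would reject.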

SQLAlchemy — dialect-dependent can_join_with

SQLAlchemy's location() is host:port (or file path for sqlite). can_join_with depends on the backend dialect:

| Backend | can_join_with rule |
| --- | --- |
| postgresql | same host:port/database (no cross-DB) |
| mysql | same host:port (cross-DB supported) |
| mssql | same host:port (cross-DB supported) |
| oracle | same host:port (cross-DB supported) |
| db2 | same host:port (cross-DB supported) |
| sqlite | same database file path |
| unknown | same host:port/database (conservative default) |
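The dialect table reduces to a small lookup: dialects with cross-database support compare host:port only, everything else also compares the database. A self-contained sketch under those assumptions (the dialect set comes from the table above; the function name is illustrative):

```python
# dialects where a single connection can join across databases
CROSS_DB_DIALECTS = {"mysql", "mssql", "oracle", "db2"}


def sqlalchemy_can_join(
    dialect: str, host_port: str, database: str,
    other_dialect: str, other_host_port: str, other_database: str,
) -> bool:
    if dialect != other_dialect or host_port != other_host_port:
        return False
    if dialect == "sqlite":
        # location is the database file path, carried in host_port here;
        # same file means same database
        return True
    if dialect in CROSS_DB_DIALECTS:
        return True  # same server is enough
    # postgresql and unknown dialects: conservative, database must match too
    return database == other_database


print(sqlalchemy_can_join("mysql", "db:3306", "a", "mysql", "db:3306", "b"))  # True
print(sqlalchemy_can_join("postgresql", "db:5432", "a", "postgresql", "db:5432", "b"))  # False
```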

Missing location() implementations

These destinations currently have no fingerprint() override and return "":

| Destination | Needed location() |
| --- | --- |
| DuckDB | database file path or ":memory:" |
| Athena | aws_data_catalog identifier |
| SQLAlchemy | host:port (or file path for sqlite) |
| Fabric | host/workspace identifier from credentials |
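For DuckDB, the missing location() could normalize the database path so different spellings of the same file compare equal, while ":memory:" stays literal. A sketch under those assumptions (class name is a stand-in):

```python
import os


class DuckDbClientConfiguration:
    def __init__(self, database: str) -> None:
        self.database = database  # file path or ":memory:"

    def location(self) -> str:
        if self.database == ":memory:":
            # in-memory databases have no physical path
            return ":memory:"
        # absolute path so "./db.duckdb" and "db.duckdb" compare equal
        return os.path.abspath(self.database)


print(DuckDbClientConfiguration(":memory:").location())  # :memory:
print(
    DuckDbClientConfiguration("./data.duckdb").location()
    == DuckDbClientConfiguration("data.duckdb").location()
)  # True
```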

Existing corrections needed

| Destination | Current fingerprint() | Problem | Fix |
| --- | --- | --- | --- |
| DuckLake | digest128(storage bucket) | Storage is swappable via OVERRIDE_DATA_PATH; catalog is the real identity | location() = catalog connection string (no credentials) + ducklake_name |
