14 changes: 14 additions & 0 deletions docs/docs/developers/build/connectors/connectors.md
@@ -207,6 +207,20 @@ Rill is continually evaluating additional OLAP engines that can be added. For a

</div>

## Table Formats
### Apache Iceberg

<div className="connector-icon-grid">
<ConnectorIcon
icon={<img src="/img/build/connectors/icons/Logo-Iceberg.svg" alt="Apache Iceberg" />}
header="Apache Iceberg"
content="Read Iceberg tables directly from object storage through compatible query engines."
link="/developers/build/connectors/data-source/iceberg"
linkLabel="Learn more"
referenceLink="iceberg"
/>
</div>

## Other Data Connectors
### External DuckDB
### Google Sheets
13 changes: 12 additions & 1 deletion docs/docs/developers/build/connectors/data-source/data-source.md
@@ -146,9 +146,20 @@ Rill supports connecting your data to both [DuckDB](/developers/build/connectors
linkLabel="Learn more"
referenceLink="azure"
/>
</div>

## Table Formats
### Apache Iceberg


<div className="connector-icon-grid">
<ConnectorIcon
icon={<img src="/img/build/connectors/icons/Logo-Iceberg.svg" alt="Apache Iceberg" />}
header="Apache Iceberg"
content="Read Iceberg tables directly from object storage through compatible query engines."
link="/developers/build/connectors/data-source/iceberg"
linkLabel="Learn more"
referenceLink="iceberg"
/>
</div>

## Other Data Connectors
129 changes: 129 additions & 0 deletions docs/docs/developers/build/connectors/data-source/iceberg.md
@@ -0,0 +1,129 @@
---
title: Apache Iceberg
description: Read Iceberg tables from object storage
sidebar_label: Apache Iceberg
sidebar_position: 27
---

## Overview

[Apache Iceberg](https://iceberg.apache.org/) is an open table format for large analytic datasets. Rill supports reading Iceberg tables directly from object storage through compatible query engine integrations. Today, this is powered by DuckDB's native [Iceberg extension](https://duckdb.org/docs/extensions/iceberg/overview.html).

:::note Direct file access only
Rill reads Iceberg tables by scanning the table's metadata and data files directly from object storage. Catalog-based access (e.g., through a Hive Metastore, AWS Glue, or REST catalog) is not currently supported.
:::
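
To illustrate what "direct file access" means, a typical Iceberg table directory that Rill scans looks roughly like the tree below. This is an illustrative layout only; exact file names vary by the engine that wrote the table:

```text
s3://my-bucket/path/to/iceberg_table/
├── metadata/
│   ├── v1.metadata.json        # table metadata: schema, partition spec, snapshots
│   ├── snap-....avro           # manifest list for a snapshot
│   └── ....avro                # manifest files listing data files
└── data/
    └── ....parquet             # the actual data files
```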

## Storage Backends

Iceberg tables can be read from any of the following storage backends:

| Backend | URI format | Authentication |
|---|---|---|
| Amazon S3 | `s3://bucket/path/to/table` | Requires an [S3 connector](/developers/build/connectors/data-source/s3) |
| Google Cloud Storage | `gs://bucket/path/to/table` | Requires a [GCS connector](/developers/build/connectors/data-source/gcs) |
| Azure Blob Storage | `azure://container/path/to/table` | Requires an [Azure connector](/developers/build/connectors/data-source/azure) |
| Local filesystem | `/path/to/table` | No authentication needed |

For cloud storage backends, you must first configure the corresponding storage connector with valid credentials. Rill uses these credentials to authenticate when reading the Iceberg table files.
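
For example, an S3 connector for this purpose might be defined along the following lines. This is a hedged sketch; consult the [S3 connector documentation](/developers/build/connectors/data-source/s3) for the authoritative field names:

```yaml
# connectors/s3.yaml — illustrative; field names follow the S3 connector reference
type: connector
driver: s3
aws_access_key_id: '{{ .env.AWS_ACCESS_KEY_ID }}'
aws_secret_access_key: '{{ .env.AWS_SECRET_ACCESS_KEY }}'
```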

## Using the UI

1. Click **Add Data** in your Rill project
2. Select **Apache Iceberg** as the data source type
3. Choose your storage backend (S3, GCS, Azure, or Local)
4. Enter the path to your Iceberg table directory
5. Optionally configure advanced parameters (allow moved paths, snapshot version)
6. Enter a model name and click **Create**

For cloud storage backends, the UI will prompt you to set up the corresponding storage connector if one doesn't already exist.

## Manual Configuration

Create a model that uses DuckDB's `iceberg_scan()` function to read the table.

### Reading from S3

Create `models/iceberg_data.yaml`:

```yaml
type: model
connector: duckdb
create_secrets_from_connectors: s3
materialize: true

sql: |
SELECT *
FROM iceberg_scan('s3://my-bucket/path/to/iceberg_table')
```

### Reading from GCS

```yaml
type: model
connector: duckdb
create_secrets_from_connectors: gcs
materialize: true

sql: |
SELECT *
FROM iceberg_scan('gs://my-bucket/path/to/iceberg_table')
```

### Reading from Azure

```yaml
type: model
connector: duckdb
create_secrets_from_connectors: azure
materialize: true

sql: |
SELECT *
FROM iceberg_scan('azure://my-container/path/to/iceberg_table')
```

### Reading from local filesystem

```yaml
type: model
connector: duckdb
materialize: true

sql: |
SELECT *
FROM iceberg_scan('/path/to/iceberg_table')
```
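
Before wiring a table into a model, it can help to sanity-check the path in a standalone DuckDB session. The Iceberg extension must be installed and loaded first; the path below is a placeholder:

```sql
-- Run in the DuckDB CLI; local table path shown for illustration
INSTALL iceberg;
LOAD iceberg;
SELECT COUNT(*) FROM iceberg_scan('/path/to/iceberg_table');
```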

## Optional Parameters

The `iceberg_scan()` function accepts additional parameters:

| Parameter | Type | Description |
|---|---|---|
| `allow_moved_paths` | boolean | Allow reading tables where data files have been moved from their original location. Defaults to `true` in the UI. |
| `version` | string | Read a specific Iceberg snapshot version instead of the latest. |

Example with optional parameters:

```sql
SELECT *
FROM iceberg_scan('s3://my-bucket/path/to/iceberg_table',
allow_moved_paths = true,
version = '2')
```

## Deploy to Rill Cloud

Since Iceberg tables are read through DuckDB using your existing storage connector credentials, deploying to Rill Cloud follows the same process as the underlying storage connector:

- **S3**: Follow the [S3 deployment guide](/developers/build/connectors/data-source/s3#deploy-to-rill-cloud)
- **GCS**: Follow the [GCS deployment guide](/developers/build/connectors/data-source/gcs#deploy-to-rill-cloud)
- **Azure**: Follow the [Azure deployment guide](/developers/build/connectors/data-source/azure#deploy-to-rill-cloud)

Ensure your storage connector credentials are configured in your Rill Cloud project before deploying.

## Limitations

- **Direct file access only**: Rill reads Iceberg metadata and data files directly from storage. Catalog integrations (Hive Metastore, AWS Glue, REST catalog) are not supported.
- **DuckDB engine**: Iceberg support is currently provided through DuckDB's Iceberg extension. Additional engine support (e.g., ClickHouse) is planned.
- **Read-only**: Rill reads from Iceberg tables but does not write to them.
83 changes: 83 additions & 0 deletions docs/docs/reference/project-files/connectors.md
@@ -39,6 +39,9 @@ Connector YAML files define how Rill connects to external data sources and OLAP
- [**Gemini**](#gemini) - Gemini connector for chat with your own API key
- [**Slack**](#slack) - Slack data

### _Table Formats_
- [**Iceberg**](#iceberg) - Apache Iceberg tables via DuckDB

### _Other_
- [**HTTPS**](#https) - Public files via HTTP/HTTPS
- [**Salesforce**](#salesforce) - Salesforce data
@@ -530,6 +533,86 @@ headers:
"Authorization": 'Bearer {{ .env.HTTPS_TOKEN }}' # HTTP headers to include in the request
```

## Iceberg

Apache Iceberg tables are read through DuckDB's `iceberg_scan()` function. Iceberg is not a standalone connector; instead, configure a model that uses DuckDB with `iceberg_scan()`. For cloud storage backends, a corresponding storage connector (S3, GCS, or Azure) must be configured with valid credentials. See the [Iceberg documentation](/developers/build/connectors/data-source/iceberg) for more details.


### `connector`

_[string]_ - Must be `duckdb`. Iceberg tables are read through DuckDB's native Iceberg extension. _(required)_

### `sql`

_[string]_ - SQL query using `iceberg_scan()` to read the Iceberg table. The function accepts the table path and optional parameters:
- `allow_moved_paths` (boolean): Allow reading tables where data files have been moved from their original location.
- `version` (string): Read a specific Iceberg snapshot version instead of the latest.


### `create_secrets_from_connectors`

_[string, array]_ - Storage connector name(s) to use for authentication when reading Iceberg tables from cloud storage (e.g., `s3`, `gcs`, `azure`).

### `materialize`

_[boolean]_ - Whether to materialize the model in the OLAP engine. Defaults to `true` for source models.

```yaml
# Example: Iceberg model reading from S3
type: model
connector: duckdb
create_secrets_from_connectors: s3
materialize: true
sql: |
SELECT *
FROM iceberg_scan('s3://my-bucket/path/to/iceberg_table')
```

```yaml
# Example: Iceberg model reading from GCS
type: model
connector: duckdb
create_secrets_from_connectors: gcs
materialize: true
sql: |
SELECT *
FROM iceberg_scan('gs://my-bucket/path/to/iceberg_table')
```

```yaml
# Example: Iceberg model reading from Azure
type: model
connector: duckdb
create_secrets_from_connectors: azure
materialize: true
sql: |
SELECT *
FROM iceberg_scan('azure://my-container/path/to/iceberg_table')
```

```yaml
# Example: Iceberg model reading from local filesystem
type: model
connector: duckdb
materialize: true
sql: |
SELECT *
FROM iceberg_scan('/path/to/iceberg_table')
```

```yaml
# Example: Iceberg model with optional parameters
type: model
connector: duckdb
create_secrets_from_connectors: s3
materialize: true
sql: |
SELECT *
FROM iceberg_scan('s3://my-bucket/path/to/iceberg_table',
allow_moved_paths = true,
version = '2')
```

## MotherDuck

### `driver`