Add REST catalog support in docs (#1) #4031

Status: Open. 3 commits to be merged into base: main
1 change: 1 addition & 0 deletions docs/integrations/index.mdx
@@ -244,6 +244,7 @@ We are actively compiling this list of ClickHouse integrations below, so it's no
|RabbitMQ|<Rabbitmqsvg alt="RabbitMQ logo" style={{width: '3rem', 'height': '3rem'}}/>|Data ingestion|Allows ClickHouse to connect [RabbitMQ](https://www.rabbitmq.com/).|[Documentation](/engines/table-engines/integrations/rabbitmq)|
|Redis|<Redissvg alt="Redis logo" style={{width: '3rem', 'height': '3rem'}}/>|Data ingestion|Allows ClickHouse to use [Redis](https://redis.io/) as a dictionary source.|[Documentation](/sql-reference/dictionaries/index.md#redis)|
|Redpanda|<Image img={redpanda} alt="Redpanda logo" size="logo"/>|Data ingestion|Redpanda is the streaming data platform for developers. It's API-compatible with Apache Kafka, but 10x faster, much easier to use, and more cost effective|[Blog](https://redpanda.com/blog/real-time-olap-database-clickhouse-redpanda)|
+ |REST Catalog||Data ingestion|Integration with REST Catalog specification for Iceberg tables, supporting multiple catalog providers including Tabular.io.|[Documentation](/use-cases/data-lake/rest-catalog)|
|Rust|<Image img={rust} size="logo" alt="Rust logo"/>|Language client|A typed client for ClickHouse|[Documentation](/integrations/language-clients/rust.md)|
|SQLite|<Sqlitesvg alt="Sqlite logo" style={{width: '3rem', 'height': '3rem'}}/>|Data ingestion|Allows to import and export data to SQLite and supports queries to SQLite tables directly from ClickHouse.|[Documentation](/engines/table-engines/integrations/sqlite)|
|Superset|<Supersetsvg alt="Superset logo" style={{width: '3rem'}}/>|Data visualization|Explore and visualize your ClickHouse data with Apache Superset.|[Documentation](/integrations/data-visualization/superset-and-clickhouse.md)|
5 changes: 3 additions & 2 deletions docs/use-cases/data_lake/index.md
@@ -4,12 +4,13 @@ pagination_prev: null
pagination_next: null
slug: /use-cases/data-lake
title: 'Data Lake'
- keywords: ['data lake', 'glue', 'unity']
+ keywords: ['data lake', 'glue', 'unity', 'rest']
---

- ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.).
+ ClickHouse supports integration with multiple catalogs (Unity, Glue, REST, Polaris, etc.).

| Page | Description |
|-----|-----|
| [Querying data in S3 using ClickHouse and the Glue Data Catalog](/use-cases/data-lake/glue-catalog) | Query your data in S3 buckets using ClickHouse and the Glue Data Catalog. |
| [Querying data in S3 using ClickHouse and the Unity Data Catalog](/use-cases/data-lake/unity-catalog) | Query your data using the Unity Catalog. |
+ | [Querying data in S3 using ClickHouse and the REST Catalog](/use-cases/data-lake/rest-catalog) | Query your data using the REST Catalog (Tabular.io). |
193 changes: 193 additions & 0 deletions docs/use-cases/data_lake/rest_catalog.md
@@ -0,0 +1,193 @@
---
slug: /use-cases/data-lake/rest-catalog
sidebar_label: 'REST Catalog'
title: 'REST Catalog'
pagination_prev: null
pagination_next: null
description: 'In this guide, we will walk you through the steps to query
your data in S3 buckets using ClickHouse and the REST Catalog.'
Comment on lines +7 to +8
Member

Let's update this description; no S3 buckets are involved in this guide.

keywords: ['REST', 'Tabular', 'Data Lake', 'Iceberg']
show_related_blogs: true
---

import ExperimentalBadge from '@theme/badges/ExperimentalBadge';

<ExperimentalBadge/>

:::note
Integration with the REST Catalog works with Iceberg tables only.
This integration supports both AWS S3 and other cloud storage providers.
:::

ClickHouse supports integration with multiple catalogs (Unity, Glue, REST, Polaris, etc.). This guide will walk you through the steps to query your data using ClickHouse and the [REST Catalog](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml/) specification.

The REST Catalog is a standardized API specification for Iceberg catalogs, supported by various platforms including:
- **Local development environments** (using docker-compose setups)
- **Managed services** like Tabular.io
- **Self-hosted** REST catalog implementations

:::note
As this feature is experimental, you will need to enable it using:
`SET allow_experimental_database_rest_catalog = 1;`
Member

Suggested change
- `SET allow_experimental_database_rest_catalog = 1;`
+ `SET allow_experimental_database_iceberg = 1;`

:::

## Local Development Setup {#local-development-setup}

For local development and testing, you can use a containerized REST catalog setup. This approach is ideal for learning, prototyping, and development environments.

### Prerequisites {#local-prerequisites}

1. **Docker and Docker Compose**: Ensure Docker is installed and running
2. **Sample Setup**: You can use various docker-compose setups (see Alternative Docker Images below)

### Setting up Local REST Catalog {#setting-up-local-rest-catalog}

You can use various containerized REST catalog implementations such as **[Databricks docker-spark-iceberg](https://github.com/databricks/docker-spark-iceberg/blob/main/docker-compose.yml?ref=blog.min.io)** which provides a complete Spark + Iceberg + REST catalog environment with docker-compose, making it ideal for testing Iceberg integrations.

You'll need to add ClickHouse as a dependency in your docker-compose setup:
Member

Let's add some short setup steps here, something like:

Create a new folder in which to run the example, then create a file `docker-compose.yml` with the configuration from Databricks docker-spark-iceberg.

Next, create a file `docker-compose.override.yml` and place the following ClickHouse container configuration into it:

(After the code block we can say to run docker compose up)
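
The setup steps described in this comment could be sketched as the shell session below. It is a minimal sketch under stated assumptions: the folder name `rest-catalog-demo` is hypothetical, the base `docker-compose.yml` still has to be copied in by hand from the docker-spark-iceberg repository, the override assumes that the base file declares the `iceberg_net` network, and the volume mounts from the guide are left out for brevity. Docker Compose merges `docker-compose.override.yml` into `docker-compose.yml` automatically when both files sit in the same folder.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical folder name; any empty working directory would do.
workdir="rest-catalog-demo"
mkdir -p "$workdir"

# The base docker-compose.yml is not generated here: copy it from the
# docker-spark-iceberg repository referenced in the guide.
if [ ! -f "$workdir/docker-compose.yml" ]; then
  echo "reminder: add $workdir/docker-compose.yml before starting the stack" >&2
fi

# ClickHouse service for the override file. Assumes the base file
# declares the iceberg_net network that this service joins.
cat > "$workdir/docker-compose.override.yml" <<'EOF'
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.5.6
    container_name: clickhouse
    user: '0:0'
    ports:
      - "8123:8123"
      - "9002:9000"
    networks:
      - iceberg_net
    environment:
      - CLICKHOUSE_DB=default
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_DO_NOT_CHOWN=1
      - CLICKHOUSE_PASSWORD=
EOF

echo "wrote $workdir/docker-compose.override.yml"
# Once both files are in place, start everything from inside the folder:
#   cd rest-catalog-demo && docker compose up -d
```

Keeping the ClickHouse service in a separate override file leaves the upstream compose file unmodified, which should make it easier to pick up later changes to it.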


```yaml
clickhouse:
  image: clickhouse/clickhouse-server:main
  container_name: clickhouse
  user: '0:0'  # Ensures root permissions
  networks:
    iceberg_net:
Comment on lines +54 to +55
Member

Suggested change
- networks:
-   iceberg_net:

With this line I get an error. Works without it.

  ports:
    - "8123:8123"
    - "9002:9000"
  volumes:
    - ./clickhouse:/var/lib/clickhouse
    - ./clickhouse/data_import:/var/lib/clickhouse/data_import  # Mount dataset folder
  networks:
    - iceberg_net
  environment:
    - CLICKHOUSE_DB=default
    - CLICKHOUSE_USER=default
    - CLICKHOUSE_DO_NOT_CHOWN=1
    - CLICKHOUSE_PASSWORD=
Comment on lines +50 to +68
Member

Suggested change
- clickhouse:
-   image: clickhouse/clickhouse-server:main
-   container_name: clickhouse
-   user: '0:0'  # Ensures root permissions
-   networks:
-     iceberg_net:
-   ports:
-     - "8123:8123"
-     - "9002:9000"
-   volumes:
-     - ./clickhouse:/var/lib/clickhouse
-     - ./clickhouse/data_import:/var/lib/clickhouse/data_import  # Mount dataset folder
-   networks:
-     - iceberg_net
-   environment:
-     - CLICKHOUSE_DB=default
-     - CLICKHOUSE_USER=default
-     - CLICKHOUSE_DO_NOT_CHOWN=1
-     - CLICKHOUSE_PASSWORD=
+ clickhouse:
+   image: clickhouse/clickhouse-server:25.5.6
+   container_name: clickhouse
+   user: '0:0'  # Ensures root permissions
+   networks:
+     iceberg_net:
+   ports:
+     - "8123:8123"
+     - "9002:9000"
+   volumes:
+     - ./clickhouse:/var/lib/clickhouse
+     - ./clickhouse/data_import:/var/lib/clickhouse/data_import  # Mount dataset folder
+   networks:
+     - iceberg_net
+   environment:
+     - CLICKHOUSE_DB=default
+     - CLICKHOUSE_USER=default
+     - CLICKHOUSE_DO_NOT_CHOWN=1
+     - CLICKHOUSE_PASSWORD=

```

### Connecting to Local REST Catalog {#connecting-to-local-rest-catalog}

Connect to your ClickHouse container:

```bash
docker exec -it clickhouse clickhouse-client
```

Then create the database connection to the REST catalog:

```sql
CREATE DATABASE demo
ENGINE = DataLakeCatalog('http://rest:8181/v1', 'admin', 'password')
SETTINGS
    catalog_type = 'rest',
    storage_endpoint = 'http://minio:9000/lakehouse',
    warehouse = 'demo'
Comment on lines +82 to +87
Member

Suggested change
- CREATE DATABASE demo
- ENGINE = DataLakeCatalog('http://rest:8181/v1', 'admin', 'password')
- SETTINGS
-     catalog_type = 'rest',
-     storage_endpoint = 'http://minio:9000/lakehouse',
-     warehouse = 'demo'
+ SET allow_experimental_database_iceberg = 1;
+ CREATE DATABASE demo
+ ENGINE = DataLakeCatalog('http://rest:8181/v1', 'admin', 'password')
+ SETTINGS
+     catalog_type = 'rest',
+     storage_endpoint = 'http://minio:9000/lakehouse',
+     warehouse = 'demo'

```

## Querying REST catalog tables using ClickHouse {#querying-rest-catalog-tables-using-clickhouse}

Now that the connection is in place, you can start querying via the REST catalog. For example:

```sql
USE demo;

SHOW TABLES;
```

```sql title="Response"
┌─name──────────┐
│ default.taxis │
└───────────────┘
Comment on lines +101 to +103
Member

I unfortunately don't get this when I try to run the steps. I'm getting back:

SHOW TABLES IN demo

Query id: 4411372a-a71c-44e9-b27b-146af2048670

Ok.

0 rows in set. Elapsed: 0.047 sec.

demo is however created:

SHOW DATABASES

Query id: 70f26176-08cd-4e5e-b788-44ce1adf10eb

   ┌─name───────────────┐
1. │ INFORMATION_SCHEMA │
2. │ default            │
3. │ demo               │
4. │ information_schema │
5. │ system             │
   └────────────────────┘

Can you confirm you were able to get this working locally?

```
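
When `SHOW TABLES` comes back empty, as reported in the review comment above, one thing worth ruling out is that the catalog itself holds no tables yet. The sketch below builds the list-namespaces and list-tables URLs defined by the Iceberg REST catalog specification and, by default, only prints the `curl` commands. It assumes the catalog is published on `localhost:8181` with no catalog prefix and that the namespace is `default`; `CATALOG_URL`, `NAMESPACE`, and `RUN_CHECK` are hypothetical variable names introduced for this example.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed defaults: catalog reachable from the host, no prefix, namespace "default".
catalog_url="${CATALOG_URL:-http://localhost:8181/v1}"
namespace="${NAMESPACE:-default}"

# URL that lists all namespaces in the catalog.
namespaces_url() {
  printf '%s/namespaces\n' "$catalog_url"
}

# URL that lists the tables inside one namespace.
tables_url() {
  printf '%s/namespaces/%s/tables\n' "$catalog_url" "$1"
}

if [ "${RUN_CHECK:-0}" = "1" ]; then
  # Live check: exits non-zero if the catalog is unreachable or returns an error.
  curl -fsS "$(namespaces_url)"
  curl -fsS "$(tables_url "$namespace")"
else
  # Dry run: show what would be requested.
  echo "curl -fsS $(namespaces_url)"
  echo "curl -fsS $(tables_url "$namespace")"
fi
```

If the live check succeeds but the second response lists no table identifiers, the empty result in ClickHouse would most likely reflect an empty catalog rather than a connection problem.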

To query a table:

```sql
SELECT count(*) FROM `default.taxis`;
```

```sql title="Response"
┌─count()─┐
│ 2171187 │
└─────────┘
```

:::note Backticks required
Backticks are required because ClickHouse doesn't support more than one namespace.
:::

To inspect the table DDL:

```sql
SHOW CREATE TABLE `default.taxis`;
```

```sql title="Response"
┌─statement─────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE demo.`default.taxis` │
│ ( │
│ `VendorID` Nullable(Int64), │
│ `tpep_pickup_datetime` Nullable(DateTime64(6)), │
│ `tpep_dropoff_datetime` Nullable(DateTime64(6)), │
│ `passenger_count` Nullable(Float64), │
│ `trip_distance` Nullable(Float64), │
│ `RatecodeID` Nullable(Float64), │
│ `store_and_fwd_flag` Nullable(String), │
│ `PULocationID` Nullable(Int64), │
│ `DOLocationID` Nullable(Int64), │
│ `payment_type` Nullable(Int64), │
│ `fare_amount` Nullable(Float64), │
│ `extra` Nullable(Float64), │
│ `mta_tax` Nullable(Float64), │
│ `tip_amount` Nullable(Float64), │
│ `tolls_amount` Nullable(Float64), │
│ `improvement_surcharge` Nullable(Float64), │
│ `total_amount` Nullable(Float64), │
│ `congestion_surcharge` Nullable(Float64), │
│ `airport_fee` Nullable(Float64) │
│ ) │
│ ENGINE = Iceberg('http://minio:9000/lakehouse/warehouse/default/taxis/', 'admin', '[HIDDEN]') │
└───────────────────────────────────────────────────────────────────────────────────────────────┘
```

## Loading data from your Data Lake into ClickHouse {#loading-data-from-your-data-lake-into-clickhouse}

If you need to load data from the REST catalog into ClickHouse, start by creating a local ClickHouse table:

```sql
CREATE TABLE taxis
(
    `VendorID` Int64,
    `tpep_pickup_datetime` DateTime64(6),
    `tpep_dropoff_datetime` DateTime64(6),
    `passenger_count` Float64,
    `trip_distance` Float64,
    `RatecodeID` Float64,
    `store_and_fwd_flag` String,
    `PULocationID` Int64,
    `DOLocationID` Int64,
    `payment_type` Int64,
    `fare_amount` Float64,
    `extra` Float64,
    `mta_tax` Float64,
    `tip_amount` Float64,
    `tolls_amount` Float64,
    `improvement_surcharge` Float64,
    `total_amount` Float64,
    `congestion_surcharge` Float64,
    `airport_fee` Float64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(tpep_pickup_datetime)
ORDER BY (VendorID, tpep_pickup_datetime, PULocationID, DOLocationID);
```

Then load the data from your REST catalog table via an `INSERT INTO SELECT`:

```sql
INSERT INTO taxis
SELECT * FROM demo.`default.taxis`;
```
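
After the load, a quick sanity check is to compare row counts between the catalog table and the local copy. The sketch below is one way to do that from the host. It assumes the container is named `clickhouse` as in the compose configuration and that the local table was created in the `default` database; `run_query` and `DRY_RUN` are hypothetical names for this example, and with the default `DRY_RUN=1` the script only prints the queries instead of contacting Docker.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Single quotes keep the backticks literal, so the shell does not treat
# them as command substitution.
source_query='SELECT count() FROM demo.`default.taxis`'
target_query='SELECT count() FROM default.taxis'

# Run one query through clickhouse-client in the container, or just print it.
run_query() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'would run: %s\n' "$1"
  else
    docker exec clickhouse clickhouse-client --query "$1"
  fi
}

run_query "$source_query"
run_query "$target_query"
```

Running it with `DRY_RUN=0` should print two numbers; matching counts suggest the `INSERT INTO SELECT` copied every row.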
3 changes: 2 additions & 1 deletion sidebars.js
@@ -167,7 +167,8 @@ const sidebars = {
link: { type: "doc", id: "use-cases/data_lake/index" },
items: [
"use-cases/data_lake/glue_catalog",
- "use-cases/data_lake/unity_catalog"
+ "use-cases/data_lake/unity_catalog",
+ "use-cases/data_lake/rest_catalog"
]
}
]