-
Notifications
You must be signed in to change notification settings - Fork 141
fix(ibis): add support for ClickHouse wrapper types (Nullable, Array, LowCardinality) #1359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(ibis): add support for ClickHouse wrapper types (Nullable, Array, LowCardinality) #1359
Conversation
WalkthroughEnhanced ClickHouse metadata handling by introducing an inner-type extraction helper to recursively unwrap nested type wrappers (Nullable, Array, LowCardinality), extended type mapping to include "datetime64" as TIMESTAMP, and updated column parsing to derive nullability from type structure. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ibis-server/app/model/metadata/clickhouse.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/app/model/metadata/clickhouse.py (2)
ibis-server/app/model/metadata/dto.py (2)
RustWrenEngineColumnType(13-58)Column(61-67)ibis-server/app/model/metadata/athena.py (1)
_transform_column_type(140-160)
🔇 Additional comments (2)
ibis-server/app/model/metadata/clickhouse.py (2)
34-34: Good addition: DateTime64 support.Mapping "datetime64" to TIMESTAMP is correct for ClickHouse.
124-127: Using an inner-type extractor is the right approach.This is fine once the extractor is correctly defined and always returns a string. Ensure the helper fix below is applied first.
| is_nullable = 'nullable(' in row["data_type"].lower() | ||
| unique_tables[schema_table].columns.append( | ||
| Column( | ||
| name=row["column_name"], | ||
| type=self._transform_column_type(row["data_type"]), | ||
| notNull=False, | ||
| notNull=not is_nullable, | ||
| description=row["column_comment"], | ||
| properties=None, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nullability detection is incorrect for Array(Nullable(...)); only top-level Nullable should count.
Sub-string check flags arrays-of-nullable elements as nullable columns. Handle wrapper chain: Nullable(...), LowCardinality(Nullable(...)) → nullable; Array(Nullable(...)) → not nullable at column level.
Apply this minimal change here:
- is_nullable = 'nullable(' in row["data_type"].lower()
+ is_nullable = self._is_column_nullable(row["data_type"])Add this helper inside ClickHouseMetadata (see additional snippet below).
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In ibis-server/app/model/metadata/clickhouse.py around lines 93-101, the current
substring check flags types like Array(Nullable(...)) as nullable; add a helper
method on ClickHouseMetadata (e.g., is_top_level_nullable(data_type: str) ->
bool) that lowercases and trims the type and returns True only when the
outermost wrapper is Nullable(...) or when the outermost wrapper is
LowCardinality(...) whose immediate inner wrapper is Nullable(...); do not treat
Nullable(...) inside Array(...) as top-level nullable. Replace the current
'nullable = "nullable(" in row["data_type"].lower()' logic to call this helper
when computing notNull.
| def _extract_inner_type(self, data_type: str) -> str: | ||
| """Extract the inner type from ClickHouse type definitions. | ||
| This handles types wrapped in Nullable(...), Array(...), etc. | ||
| Args: | ||
| data_type: The ClickHouse data type string | ||
| Returns: | ||
| The extracted inner type string | ||
| """ | ||
|
|
||
| if '(' in data_type and data_type.endswith(')'): | ||
| paren_start = data_type.find('(') | ||
| type_name = data_type[:paren_start].lower() | ||
| inner = data_type[paren_start + 1:-1] | ||
|
|
||
| if type_name in ['nullable', 'array', 'lowcardinality']: | ||
| return self._extract_inner_type(inner) | ||
| else: | ||
| return type_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: _extract_inner_type is defined at module scope and can return None → runtime crash.
- Defined outside the class but called as self._extract_inner_type(...) → AttributeError.
- No default return when there are no parentheses → inner_type becomes None and
.lower()at Line 126 crashes.
Replace this block with a method inside ClickHouseMetadata and guarantee a string return:
- def _extract_inner_type(self, data_type: str) -> str:
- """Extract the inner type from ClickHouse type definitions.
-
- This handles types wrapped in Nullable(...), Array(...), etc.
-
- Args:
- data_type: The ClickHouse data type string
-
- Returns:
- The extracted inner type string
- """
-
- if '(' in data_type and data_type.endswith(')'):
- paren_start = data_type.find('(')
- type_name = data_type[:paren_start].lower()
- inner = data_type[paren_start + 1:-1]
-
- if type_name in ['nullable', 'array', 'lowcardinality']:
- return self._extract_inner_type(inner)
- else:
- return type_name
+ def _extract_inner_type(self, data_type: str) -> str:
+ """Extract base type from ClickHouse wrappers like Nullable(...), Array(...), LowCardinality(...)."""
+ s = (data_type or "").strip()
+ while True:
+ if "(" in s and s.endswith(")"):
+ paren_start = s.find("(")
+ type_name = s[:paren_start].strip().lower()
+ inner = s[paren_start + 1 : -1].strip()
+ if type_name in ("nullable", "array", "lowcardinality"):
+ s = inner
+ continue
+ # Non-wrapper with params, e.g., Decimal(10,2) or FixedString(16)
+ return type_name
+ # No params/wrappers; return as-is (caller lowercases)
+ return sAdditionally, add this helper inside the same class to correctly compute column nullability:
def _is_column_nullable(self, data_type: str) -> bool:
s = (data_type or "").strip()
while True:
s_l = s.lower()
if s_l.startswith("nullable(") and s.endswith(")"):
return True
if s_l.startswith("lowcardinality(") and s.endswith(")"):
# unwrap and continue; LowCardinality(Nullable(T)) => nullable
s = s[s.find("(") + 1 : -1].strip()
continue
if s_l.startswith("array(") and s.endswith(")"):
# Array(Nullable(T)) does not make the column nullable
return False
return False🤖 Prompt for AI Agents
In ibis-server/app/model/metadata/clickhouse.py around lines 138 to 158, move
_extract_inner_type into the ClickHouseMetadata class (so it can be called as
self._extract_inner_type) and make it always return a string: when the input
contains parentheses unwrap permitted wrappers (nullable, array, lowcardinality)
recursively and return the inner type name lowercased; when there are no
parentheses simply return data_type.strip().lower() (never None). Also add the
_is_column_nullable helper as an instance method in the same class exactly as
described in the comment to correctly determine nullability (handle Nullable,
LowCardinality wrapping and Array semantics), and use these methods where inner
type and nullability are computed.
|
Thanks @rishabh1815769 for working on this. There is a ruff check failure. You can fix it using our formatting command.
|
Summary
Fixes type casting issues for ClickHouse wrapper types such as Nullable(), Array(), and LowCardinality() by extracting the inner type before mapping to Rust Wren Engine column types.
Problem
Previously, ClickHouse columns with wrapper types like Nullable(String), Array(Int32) etc were not being properly recognized and would be marked as UNKNOWN type, causing issues with type inference and query execution.
Summary by CodeRabbit
New Features
Bug Fixes