feat: Comprehensive ClickHouse dialect enhancements for SQLFluff parity #1860
base: main
Branch updated from 0aaf059 to e023446.
Parametric views don't work yet; I will work on this later.
This should be fixed now, though some nuances remain. Comparison operators in CREATE VIEW were broken, so I changed the comparison operator lexer to treat both characters as a single token. There also seems to be a bug in `common_with` in depth_map.rs, which breaks spacing settings when elements are inside CREATE VIEW.
Hey @leiyangyou, thanks for this awesome contribution. Sorry it's taken me a while to look at it, but I'm looking forward to trying to get this merged over the weekend. It's quite large, so I may take some of your commits bit by bit and get those in first.
Hey @benfdking, one thing I realized is that it's best to parse binary operators as a single token. At the moment they aren't, and this frequently causes issues. (I did make some fixes for ClickHouse only.)
Sorry it's taken me a while to get to this. I can't merge this in one go, so I have been looking at it bit by bit. Where are the fixtures from? I'm looking at one change at a time in #1920.
I think I broke a couple of things. Let me rebase and fix them, and I'll probably create a new PR based on that.
The fixtures were generated by Claude Code, as was most of the code. I'm going to rework all the commits (fixing format style issues, and adding tests where they are missing, which is the case for most things).
Branch updated from 782a3fd to 02dfb59.
I've force-pushed to the branch and reworked the commits so that most things are covered by tests, fixes are squashed, and `cargo fmt` has been run on each commit.
Branch updated from 02dfb59 to 0402fd2.
I've fixed the tests that were broken under `make rust_test` to the best of my ability, but the Jinja templater-based tests fail to run in my local environment and I couldn't get them to work. I set up a local venv with Python 3.9; after `maturin develop` I can import sqruff from the venv's Python, but `cargo test -p sqruff-lib --all-features --test templaters` still fails with `PyErr { type: <class 'ModuleNotFoundError'>, value: ModuleNotFoundError("No module named 'sqruff'"), traceback: None }`.
Branch updated from f06bb7b to a864b00.
I needed `fix_even_unparsable` to work for my workflow, because sqruff was mangling files that were not parsable. It does, however, break some tests; for now I've changed the config for those tests to `fix_even_unparsable = True`. One thing I realized, though, is that Jinja templates are not going to be parsable anyway, so how do we fix Jinja template files? (I think SQLFluff renders the template, then tries to parse the result, and in fix mode remembers the placeholders and substitutes them back after fixing.)
…OPTIMIZE TABLE support

Add comprehensive support for advanced ClickHouse features:
- Fix EXCEPT clause parsing conflict between SET operations and wildcard EXCEPT
- Add REPLACE clause support for wildcard expressions
- Add PREWHERE clause support for optimized filtering
- Add AggregateFunction data type support
- Add OPTIMIZE TABLE statement support
- Add extensive test fixtures covering all new features

Technical changes:
- Use LookaheadExclude to resolve EXCEPT parsing ambiguity
- Extend WildcardExpressionSegment with EXCEPT and REPLACE clauses
- Add PrewhereClauseSegment to SELECT statements
- Add OptimizeTableStatementSegment with full syntax support
- Add AggregateFunction to DatatypeSegment definitions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
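The EXCEPT ambiguity comes down to a short lookahead after the wildcard. The Rust sketch below illustrates the idea on a plain token slice; `AfterWildcard` and `classify` are hypothetical names, and this is not how sqruff's `LookaheadExclude` grammar element is implemented.

```rust
/// What follows a `*` in a select list (hypothetical classification).
#[derive(Debug, PartialEq)]
enum AfterWildcard {
    /// `* EXCEPT (col, ...)`: columns excluded from the wildcard.
    WildcardExcept,
    /// `... EXCEPT SELECT ...`: a set operation between two queries.
    SetExcept,
    /// Anything that does not start with EXCEPT.
    Other,
}

/// Classify the tokens that follow a wildcard using a short lookahead.
fn classify(tokens: &[&str]) -> AfterWildcard {
    // SQL keywords are case-insensitive.
    let is = |i: usize, kw: &str| tokens.get(i).map_or(false, |t| t.eq_ignore_ascii_case(kw));
    if !is(0, "EXCEPT") {
        return AfterWildcard::Other;
    }
    // `EXCEPT SELECT` or `EXCEPT (SELECT` starts a second query.
    if is(1, "SELECT") || (is(1, "(") && is(2, "SELECT")) {
        AfterWildcard::SetExcept
    } else {
        AfterWildcard::WildcardExcept
    }
}

fn main() {
    let tokens = ["EXCEPT", "(", "id", ")", "FROM", "t"];
    println!("{:?}", classify(&tokens));
}
```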
Core Features Added:
- Support for double-quoted identifiers in ClickHouse
- INDEX definitions with parameterized types (bloom_filter, set, minmax, ngrambf_v1, tokenbf_v1, hypothesis)
- PROJECTION clauses with specialized SELECT segments (ProjectionSelectSegment)
- Advanced data types: IPv4, IPv6, Nested, and parameterized types like AggregateFunction, SimpleAggregateFunction
- ORDER BY with bracketed expressions containing DESC/ASC (e.g., ORDER BY (id, name DESC))

Parser Improvements:
- Add IndexTypeIdentifier syntax kind for semantic correctness of index types
- Extract ORDER BY item sequence as a reusable component to avoid duplication
- Support both regular and bracketed ORDER BY expressions with sort directions
- Add specialized parsers for all 6 index types with proper syntax highlighting

Keywords Added:
- BLOOM_FILTER, HYPOTHESIS (for index types)
- IPV4, IPV6 (for data types)
- PROJECTION (for table projections)

Tests Added:
- double_quoted_identifiers: Test double-quoted identifier support
- advanced_data_types: Test IPv4, IPv6, Nested, and complex parameterized types
- create_table_index_projection: Test INDEX and PROJECTION definitions
- simple_index: Test basic index syntax
- bloom_filter_parameters: Test parameterized bloom_filter indexes
- index_types_extended: Test all 6 index types (bloom_filter, set, minmax, ngrambf_v1, tokenbf_v1, hypothesis)
- order_by_bracketed: Test ORDER BY with bracketed expressions containing sort directions

This commit significantly enhances ClickHouse dialect support for modern table features, including advanced indexing, projections, and complex data types used in analytical workloads.
Branch updated from a864b00 to 8e759ad.
Implements support for ClickHouse parametric expressions with {param:Type} syntax, which are used for prepared statements, query parameters, and parametric views.

Parser Changes:
- Add ParametricExpressionSegment to parse {param:Type} syntax
- Extend LiteralGrammar to include parametric expressions
- Add SyntaxKind::ParametricExpression for proper AST representation

Features Supported:
- Simple types: {param:String}, {param:UInt64}, {param:Date}
- Complex types: {param:Array(String)}, {param:Map(String, String)}
- Nullable types: {param:Nullable(Float64)}, {param:Nullable(Decimal(10,2))}
- Specialized types: {param:DateTime64(3)}, {param:IPv4}, {param:LowCardinality(String)}
- Enum types: {param:Enum('A', 'B', 'C')}
- Parametric view creation with WHERE clauses using parameters
- Parametric view calling with named parameters

Tests Added:
- parametric_expressions: Comprehensive test of all parameter types in queries
- parametric_views: Parametric view creation and calling syntax

This enables ClickHouse users to use parameterized queries for better performance and security through prepared statements and dynamic query composition.
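As an illustration of what recognizing a `{param:Type}` placeholder involves, here is a minimal Rust sketch that assumes the placeholder has already been isolated as a single string. `QueryParam` and `parse_param` are hypothetical and unrelated to sqruff's actual ParametricExpressionSegment; the main subtlety it shows is that the type part may nest parentheses.

```rust
/// A ClickHouse query-parameter placeholder such as `{id:UInt64}`.
#[derive(Debug, PartialEq)]
struct QueryParam<'a> {
    name: &'a str,
    data_type: &'a str,
}

/// Parse `{name:Type}`; returns None if the text is not a well-formed placeholder.
fn parse_param(src: &str) -> Option<QueryParam<'_>> {
    let inner = src.trim().strip_prefix('{')?.strip_suffix('}')?;
    // Split at the first colon: the name part never contains one.
    let (name, data_type) = inner.split_once(':')?;
    let (name, data_type) = (name.trim(), data_type.trim());
    let valid_name = !name.is_empty()
        && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if !valid_name || data_type.is_empty() {
        return None;
    }
    // The type may nest parentheses, e.g. Nullable(Decimal(10,2)); require them
    // to balance. Parentheses inside string literals are not special-cased.
    let mut depth: i32 = 0;
    for c in data_type.chars() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth < 0 {
                    return None;
                }
            }
            _ => {}
        }
    }
    (depth == 0).then_some(QueryParam { name, data_type })
}

fn main() {
    println!("{:?}", parse_param("{m:Map(String, String)}"));
}
```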
Implements support for ClickHouse higher-order functions that use two sets of parentheses and lambda expressions.

Parser Changes:
- Replace FunctionSegment to support optional second parentheses
- Support pattern: function_name(parameters)(arguments)
- Handle lambda expressions with arrow operator (->)

Features Supported:
- Higher-order quantile functions: quantileExact(0.5)(column)
- Array functions with lambdas: arraySort(x -> -x)(values), arrayMap(x -> x * 2)(numbers)
- Conditional aggregate functions: quantileExactArrayIf(0.95)(array, condition)
- Backward compatibility with regular functions

Tests Added:
- higher_order_functions: Comprehensive test of all higher-order function patterns, including lambda expressions and double-parentheses syntax

This enables ClickHouse users to use advanced aggregate and array manipulation functions that are essential for analytical queries.
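One note on terminology: in ClickHouse's own documentation the `name(params)(args)` form belongs to parametric aggregate functions such as `quantileExact(0.5)(column)`, while array higher-order functions take the lambda as their first argument, e.g. `arrayMap(x -> x * 2, numbers)`. The bracket-splitting step can be sketched in Rust as below. `Call`, `matching_paren` and `split_call` are hypothetical names, and the sketch only separates the two bracket groups; it does not parse the arguments, lambdas included.

```rust
/// Result of splitting `name(params)(args)` or `name(args)`.
#[derive(Debug, PartialEq)]
struct Call<'a> {
    name: &'a str,
    /// Present only for the two-parenthesis form.
    params: Option<&'a str>,
    args: &'a str,
}

/// Byte index of the ')' matching the '(' at `open`.
fn matching_paren(s: &str, open: usize) -> Option<usize> {
    let mut depth = 0usize;
    for (i, c) in s[open..].char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth == 0 {
                    return Some(open + i);
                }
            }
            _ => {}
        }
    }
    None
}

fn split_call(src: &str) -> Option<Call<'_>> {
    let open1 = src.find('(')?;
    let name = src[..open1].trim();
    let close1 = matching_paren(src, open1)?;
    let first = &src[open1 + 1..close1];
    let rest = &src[close1 + 1..];
    if rest.is_empty() {
        return Some(Call { name, params: None, args: first });
    }
    // Two-parenthesis form: the second group must follow immediately and end the input.
    if !rest.starts_with('(') {
        return None;
    }
    let open2 = close1 + 1;
    let close2 = matching_paren(src, open2)?;
    if close2 + 1 != src.len() {
        return None;
    }
    Some(Call { name, params: Some(first), args: &src[open2 + 1..close2] })
}

fn main() {
    println!("{:?}", split_call("quantileExact(0.5)(column)"));
}
```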
- Add QualifyClauseSegment to handle QUALIFY expressions
- Include QUALIFY in SelectStatementSegment as an optional clause
- Remove duplicate parametric function test fixtures
- Maintain compatibility with existing SELECT statements

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ic view support
- Add compound comparison operators (>=, <=, !=, <>) as single lexer tokens to prevent splitting during formatting
- Add corresponding grammar segments for compound operators
- Add parametric expression configuration for spacing control
- Resolves comparison operator splitting issue where >= became > =

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
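Why matcher order matters can be shown with a toy lexer in Rust in which the first matching operator wins. The operator tables and `tokenize` are illustrative assumptions, not sqruff's lexer matcher definitions; the same ordering concern applies to matching `||` before `|`.

```rust
/// Multi-character operators listed before their one-character prefixes.
const ORDERED: &[&str] = &[">=", "<=", "!=", "<>", ">", "<", "="];
/// Single characters first: `>=` is split into `>` and `=`.
const SINGLE_FIRST: &[&str] = &[">", "<", "=", "!", ">=", "<=", "!=", "<>"];

/// Toy lexer: the first operator in `operators` that matches wins.
fn tokenize<'a>(src: &'a str, operators: &[&str]) -> Vec<&'a str> {
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < src.len() {
        let rest = &src[i..];
        let c = rest.chars().next().unwrap();
        if c.is_whitespace() {
            i += c.len_utf8();
            continue;
        }
        if let Some(op) = operators.iter().find(|op| rest.starts_with(**op)) {
            tokens.push(&src[i..i + op.len()]);
            i += op.len();
            continue;
        }
        // Otherwise take a run of word characters; any other single
        // character becomes its own token so the loop always advances.
        let word_len = rest
            .find(|ch: char| !(ch.is_alphanumeric() || ch == '_'))
            .unwrap_or(rest.len());
        let len = word_len.max(c.len_utf8());
        tokens.push(&src[i..i + len]);
        i += len;
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize("a >= 1", ORDERED));
    println!("{:?}", tokenize("a >= 1", SINGLE_FIRST));
}
```

A formatter that later inserts spacing between tokens can only produce `> =` in the second case, which matches the symptom described in the commit.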
The common_with() function in depth_map.rs had a bug: it used take(common_depth) to return the first N elements from the stack, instead of returning the elements that were actually in the intersection. This caused spacing constraints (like space_within = touch for parametric expressions) to fail in CREATE VIEW contexts, because the wrong common ancestor hash was being used for the constraint lookup.

The fix changes the implementation to filter and return only the elements that are actually in the common hash intersection, preserving their order from the original stack. This resolves spacing issues for parametric expressions and other elements within CREATE VIEW statements.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
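To make the difference concrete, here is a Rust sketch with ancestor stacks modeled as plain `u64` hash slices. `common_with_take` and `common_with_filter` are hypothetical free functions, not sqruff's `DepthInfo` API. The two agree whenever the shared hashes form a prefix of the stack and diverge otherwise, which is the situation the commit message describes.

```rust
use std::collections::HashSet;

/// The behavior described as buggy: take the first N entries of `stack`,
/// where N is the number of hashes shared with `other`.
fn common_with_take(stack: &[u64], other: &[u64]) -> Vec<u64> {
    let mine: HashSet<u64> = stack.iter().copied().collect();
    let theirs: HashSet<u64> = other.iter().copied().collect();
    let common_depth = mine.intersection(&theirs).count();
    stack.iter().copied().take(common_depth).collect()
}

/// The fixed behavior: keep exactly the shared hashes, in stack order.
fn common_with_filter(stack: &[u64], other: &[u64]) -> Vec<u64> {
    let theirs: HashSet<u64> = other.iter().copied().collect();
    stack.iter().copied().filter(|h| theirs.contains(h)).collect()
}

fn main() {
    let stack = [1u64, 2, 3, 4];
    let other = [1u64, 3, 5];
    // Shared hashes are {1, 3}, which are not a prefix of `stack`.
    println!("take:   {:?}", common_with_take(&stack, &other));
    println!("filter: {:?}", common_with_filter(&stack, &other));
}
```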
The LT05 rule now properly ignores SyntaxKind::BlockComment, in addition to Comment and InlineComment, when the ignore_comment_lines configuration option is enabled. This ensures consistent behavior across all comment types.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
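A Rust sketch of the intended check, using a hypothetical `Kind` enum in place of sqruff's `SyntaxKind`: a line is exempt from the length check only when `ignore_comment_lines` is enabled and the line contains nothing but whitespace and comments of any of the three kinds.

```rust
/// Hypothetical subset of syntax kinds; sqruff's real `SyntaxKind` has many more variants.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Comment,
    InlineComment,
    BlockComment,
    Keyword,
    Whitespace,
}

/// All three comment kinds are treated alike.
fn is_comment(kind: Kind) -> bool {
    matches!(kind, Kind::Comment | Kind::InlineComment | Kind::BlockComment)
}

/// A "comment line" has at least one comment and otherwise only whitespace.
fn is_comment_line(kinds: &[Kind]) -> bool {
    let mut saw_comment = false;
    for &kind in kinds {
        match kind {
            Kind::Whitespace => {}
            k if is_comment(k) => saw_comment = true,
            _ => return false,
        }
    }
    saw_comment
}

/// Whether the line-length check should run for a line.
fn should_check_length(kinds: &[Kind], ignore_comment_lines: bool) -> bool {
    !(ignore_comment_lines && is_comment_line(kinds))
}

fn main() {
    let line = [Kind::Whitespace, Kind::BlockComment];
    println!("check length: {}", should_check_length(&line, true));
}
```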
The DEDUPLICATE clause in ClickHouse OPTIMIZE TABLE statements can be used both standalone (OPTIMIZE TABLE t DEDUPLICATE) and with a BY clause (OPTIMIZE TABLE t DEDUPLICATE BY col1, col2). This commit fixes the grammar to make the BY clause optional within DEDUPLICATE, supporting both syntax variants. It also includes code formatting improvements in the ClickHouse dialect.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
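The optional `BY` part can be sketched in Rust as a three-way result. `Deduplicate` and `parse_deduplicate` are hypothetical, work on plain text rather than sqruff's grammar, ignore other OPTIMIZE clauses such as FINAL or PARTITION, and only accept a comma-separated column list after `BY` (ClickHouse accepts other column expressions there as well).

```rust
#[derive(Debug, PartialEq)]
enum Deduplicate {
    /// No DEDUPLICATE clause.
    Absent,
    /// `DEDUPLICATE` on its own.
    AllColumns,
    /// `DEDUPLICATE BY col1, col2`.
    By(Vec<String>),
}

/// Parse the text after `OPTIMIZE TABLE <name>`; None means a syntax error.
fn parse_deduplicate(tail: &str) -> Option<Deduplicate> {
    let tail = tail.trim();
    if tail.is_empty() {
        return Some(Deduplicate::Absent);
    }
    let (kw, rest) = tail.split_once(char::is_whitespace).unwrap_or((tail, ""));
    if !kw.eq_ignore_ascii_case("DEDUPLICATE") {
        return None;
    }
    let rest = rest.trim();
    if rest.is_empty() {
        return Some(Deduplicate::AllColumns);
    }
    // BY is optional, but when present it needs at least one column.
    let (by, cols) = rest.split_once(char::is_whitespace)?;
    if !by.eq_ignore_ascii_case("BY") {
        return None;
    }
    let cols: Vec<String> = cols.split(',').map(|c| c.trim().to_string()).collect();
    if cols.iter().any(|c| c.is_empty()) {
        return None;
    }
    Some(Deduplicate::By(cols))
}

fn main() {
    println!("{:?}", parse_deduplicate("DEDUPLICATE BY col1, col2"));
}
```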
…alls

CV05 should not trigger on `= NULL` when it is used as a parameter assignment in ClickHouse parametric function or view calls. These are valid syntax for passing named parameters, not equality comparisons.
Add proper implementation of the fix_even_unparsable configuration option to prevent sqruff from applying fixes to files with unparsable sections.
- Defaults to False for safety (matches config file default)
- When False: skip fixing files with unparsable sections entirely
- When True: allow unsafe fixes that may corrupt unparsable SQL (not recommended)
- Check happens before any fix attempts to avoid corruption

This prevents the dangerous behavior where sqruff would silently corrupt SQL syntax in unparsable files (e.g. breaking || operators into | |).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
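The gate can be sketched in Rust as a pure decision made before any fix is attempted. `FixConfig`, `FixDecision` and `decide_fix` are hypothetical; only the option name `fix_even_unparsable` and its false default come from the commit.

```rust
/// Hypothetical config holder; `Default` gives `fix_even_unparsable = false`.
#[derive(Debug, Default)]
struct FixConfig {
    fix_even_unparsable: bool,
}

#[derive(Debug, PartialEq)]
enum FixDecision {
    /// Run the fixing loop and write the result.
    Apply,
    /// Leave the file untouched because it has unparsable sections.
    SkipUnparsable,
}

/// Decided up front, so no partial fix can touch an unparsable file.
fn decide_fix(has_unparsable_sections: bool, config: &FixConfig) -> FixDecision {
    if has_unparsable_sections && !config.fix_even_unparsable {
        FixDecision::SkipUnparsable
    } else {
        FixDecision::Apply
    }
}

fn main() {
    let config = FixConfig::default();
    println!("{:?}", decide_fix(true, &config));
}
```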
Resolves an issue where ClickHouse string concatenation operators (||) were being incorrectly tokenized as two separate vertical bar tokens instead of a single binary operator. This caused linting errors about spacing between pipes, as well as formatting issues.

Changes:
- Add concat_operator lexer matcher for || before the vertical_bar matcher
- Add ConcatOperatorSegment for the single token
- Replace ConcatSegment to use the single token instead of a pipe sequence

Fixes vertical_bar tokenization without affecting other dialects.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds proper parsing for window frame clauses using INTERVAL expressions like `RANGE BETWEEN INTERVAL 28 DAY PRECEDING AND INTERVAL 1 DAY PRECEDING`. This is a common pattern in ClickHouse for time-series analytics.
- Add custom FrameExtentGrammar for ClickHouse supporting IntervalExpressionSegment
- Replace FrameClauseSegment to use the new frame extent grammar
- Add comprehensive test cases for various interval units (DAY, MONTH, YEAR)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Implements the ternary operator (condition ? true_expr : false_expr) for the ClickHouse SQL dialect with the following characteristics:
- Lower precedence than AND/OR operators
- Does NOT support nested ternaries without parentheses (e.g., a ? b : c ? d : e will fail to parse)
- Requires explicit parentheses for nesting: a ? b : (c ? d : e)
- Matches ClickHouse's actual behavior

This implementation avoids recursion issues by not supporting unparenthesized nested ternaries, which aligns with how ClickHouse itself handles these expressions.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
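The precedence rules listed above can be illustrated with a toy recursive-descent parser in Rust. This is a sketch under simplifying assumptions (identifiers only, no comparison operators, hypothetical `Parser` and `parse` names), not sqruff's grammar. It returns S-expressions so the grouping is visible, and it rejects `a ? b : c ? d : e` because the else branch only accepts a non-ternary expression.

```rust
/// Toy grammar (hypothetical):
///   ternary  := or_expr [ "?" or_expr ":" or_expr ]
///   or_expr  := and_expr { "OR" and_expr }
///   and_expr := atom { "AND" atom }
///   atom     := identifier | "(" ternary ")"
type PResult = Result<String, String>;

struct Parser {
    tokens: Vec<String>,
    pos: usize,
}

impl Parser {
    fn new(src: &str) -> Self {
        // Pad punctuation with spaces, then split on whitespace.
        let mut spaced = String::new();
        for c in src.chars() {
            if "()?:".contains(c) {
                spaced.push(' ');
                spaced.push(c);
                spaced.push(' ');
            } else {
                spaced.push(c);
            }
        }
        let tokens = spaced.split_whitespace().map(str::to_string).collect();
        Parser { tokens, pos: 0 }
    }
    fn peek(&self) -> Option<&str> {
        self.tokens.get(self.pos).map(String::as_str)
    }
    fn bump(&mut self) -> Option<String> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }
    fn expect(&mut self, want: &str) -> Result<(), String> {
        match self.bump() {
            Some(t) if t == want => Ok(()),
            other => Err(format!("expected `{want}`, found {other:?}")),
        }
    }
    fn ternary(&mut self) -> PResult {
        let cond = self.or_expr()?;
        if self.peek() != Some("?") {
            return Ok(cond);
        }
        self.pos += 1;
        let then = self.or_expr()?;
        self.expect(":")?;
        // The else branch is an or_expr, not a ternary, so an unparenthesized
        // `c ? d : e` leaves a trailing `?` that `parse` rejects.
        let otherwise = self.or_expr()?;
        Ok(format!("(? {cond} {then} {otherwise})"))
    }
    /// Left-associative chain `sub { kw sub }`.
    fn chain(&mut self, kw: &str, sub: fn(&mut Self) -> PResult) -> PResult {
        let mut lhs = sub(self)?;
        while self.peek().map_or(false, |t| t.eq_ignore_ascii_case(kw)) {
            self.pos += 1;
            let rhs = sub(self)?;
            lhs = format!("({kw} {lhs} {rhs})");
        }
        Ok(lhs)
    }
    fn or_expr(&mut self) -> PResult {
        self.chain("OR", Self::and_expr)
    }
    fn and_expr(&mut self) -> PResult {
        self.chain("AND", Self::atom)
    }
    fn atom(&mut self) -> PResult {
        match self.bump() {
            Some(t) if t == "(" => {
                let inner = self.ternary()?;
                self.expect(")")?;
                Ok(inner)
            }
            Some(t) if !"()?:".contains(t.as_str()) => Ok(t),
            other => Err(format!("unexpected token {other:?}")),
        }
    }
}

fn parse(src: &str) -> PResult {
    let mut parser = Parser::new(src);
    let ast = parser.ternary()?;
    match parser.peek() {
        None => Ok(ast),
        Some(t) => Err(format!("unexpected trailing token `{t}`")),
    }
}

fn main() {
    println!("{:?}", parse("x OR y ? 1 : 0"));
}
```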
Branch updated from 8e759ad to 6061075.
Two main issues were resolved:
1. Fix JoinLikeClauseGrammar by removing an incorrect AliasExpressionSegment that was consuming the WHERE clause as an alias instead of terminating the FROM clause
2. Refine ClickHouse AliasExpressionSegment keyword exclusions:
   - Add WHERE, ORDER, GROUP, HAVING, LIMIT, UNION, INTERSECT, EXCEPT exclusions to prevent these critical SQL keywords from being parsed as aliases
   - Remove overly restrictive exclusions (LATERAL, WINDOW, KEYS, WITH, QUALIFY, OFFSET) that prevented common column names from being used as aliases

Add comprehensive test cases for ARRAY JOIN with WHERE clause:
- Simple case with explicit alias
- Complex nested functions case (original reported issue)
- Case without explicit alias (edge case)

Fixes parsing of queries like: `SELECT toDateTime64(start5 + i * step_sec, 6) AS ts, 1 AS join_key FROM bounds ARRAY JOIN range(0, greatest(intDiv(toUInt32((end5 - start5)), step_sec) + 1, 0)) AS i WHERE ts <= now64(6)`
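A Rust sketch of the alias decision after an expression such as an ARRAY JOIN target. `ALIAS_EXCLUSIONS`, `can_be_alias` and `trailing_alias` are hypothetical; the list holds only the keywords this commit adds and is not the dialect's full exclusion list.

```rust
/// Only the keywords named in the commit; not the dialect's full list.
const ALIAS_EXCLUSIONS: [&str; 8] = [
    "WHERE", "ORDER", "GROUP", "HAVING", "LIMIT", "UNION", "INTERSECT", "EXCEPT",
];

fn can_be_alias(word: &str) -> bool {
    !ALIAS_EXCLUSIONS.iter().any(|kw| kw.eq_ignore_ascii_case(word))
}

/// Given the tokens after an expression, return its alias, if any.
fn trailing_alias<'a>(tokens: &[&'a str]) -> Option<&'a str> {
    match tokens {
        // Explicit alias: `expr AS name`.
        [as_kw, name, ..] if as_kw.eq_ignore_ascii_case("AS") => Some(*name),
        // Implicit alias: `expr name`, unless `name` is an excluded keyword.
        [word, ..] if !word.eq_ignore_ascii_case("AS") && can_be_alias(word) => Some(*word),
        _ => None,
    }
}

fn main() {
    // `... ARRAY JOIN range(...) WHERE ts <= now64(6)`: WHERE must end the clause.
    println!("{:?}", trailing_alias(&["WHERE", "ts", "<=", "now64(6)"]));
}
```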
Branch updated from 352ca8d to 9c4fd73.
At the moment, we really try to use the fixtures from SQLFluff. If you could also do this incrementally, that would be far more helpful: small PRs that each implement a single dialect feature are much easier to review one by one than 5,000 lines of code. We are definitely trying to play catch-up with SQLFluff, and we prefer to just copy/translate their code to get us up to speed.
OK, I will find some time to split them into separate PRs. For ClickHouse, though, SQLFluff's support is somewhat poor, and for most of the features I added I doubt we will find fixtures in SQLFluff. I was initially working on SQLFluff and tried to get some PRs accepted, but they still haven't been reviewed after two months. I also find SQLFluff too slow, which is how I bumped into sqruff. My workflow is really just to give valid ClickHouse SQL to Claude Code, ask it to fix things and generate the fixtures, and run the tests to see if anything is broken. If I can't find SQLFluff fixtures, will splitting this into smaller PRs still help?
The code changes were really minimal, but there are lots of fixtures. What I did make sure of is that nothing is unparsable and no existing fixtures are broken.
Interesting! Ok, thanks for that handy context.
If you get started, I should have the new folders ready by the end of the day.
Summary
This PR implements comprehensive ClickHouse dialect enhancements to achieve significant feature parity with SQLFluff, delivered through a systematic wave-based approach.
🚀 Key Features Added
Wave 1: Lexer Features
- Double-quoted identifiers: `"quoted_column"` and `"quoted table"` syntax
- `->` (lambda arrow) functionality

Wave 2: CREATE TABLE Enhancements
- `INDEX idx_name expression TYPE minmax GRANULARITY 1`
- `PROJECTION proj_name (SELECT ...)`

Wave 3: JOIN Syntax Verification

Wave 4: Advanced Data Types
- `SimpleAggregateFunction(max, Float64)`

Wave 5: Parametric Views (NEW)
- `{param_name:DataType}` syntax
- `{param:Enum('val1', 'val2')}`, `{param:Nullable(DateTime64)}`
- `CREATE VIEW name AS SELECT ... WHERE col = {param:Type}`
- `SELECT * FROM view(param={param:Type})`
- `INTERVAL {param:UInt32} MINUTE`
- `toStartOfInterval({param:DateTime64}, ...)`

Wave 6: Higher-Order Functions (NEW)
- `quantileExact(0.5)(column)`, `quantileExactArrayIf(0.95)(array, condition)`
- `arraySort(x -> -x)(values)`, `arrayMap(x -> x * 2)(numbers)`
- `count(*)`, `sum(amount)` still work

Wave 7: QUALIFY Clause (NEW)
- `SELECT ... FROM ... QUALIFY row_number() OVER (...) = 1`
- `QUALIFY rank <= 10 AND revenue > 1000`
- `QUALIFY rank BETWEEN 1 AND 5 OR score > 95`

Wave 8: Testing & Validation
🔍 Examples
Higher-order functions:
QUALIFY clause:
Double-quoted identifiers:
INDEX definitions:
Parametric views:
Advanced data types:
Test Plan
Impact
This brings sqruff's ClickHouse dialect significantly closer to SQLFluff's comprehensive implementation while maintaining high parsing accuracy and performance. Enterprise users now have access to:
The new functionality is particularly valuable for:
🤖 Generated with Claude Code