-
Notifications
You must be signed in to change notification settings - Fork 7
chore(parser): remove old parser #81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #81 +/- ##
==========================================
- Coverage 71.39% 71.07% -0.32%
==========================================
Files 200 198 -2
Lines 18191 17660 -531
==========================================
- Hits 12987 12552 -435
+ Misses 4489 4399 -90
+ Partials 715 709 -6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
📝 WalkthroughWalkthroughRemoves the legacy field-tokenizer/parser and its public APIs, migrates all parsing call sites to SeqQL, and refactors SeqQL parsing to separate initial parse from NOT propagation. Deletes many old parser tests and benchmarks, prunes or updates SeqQL tests (adding fuzzing and range cases), updates integration tests and test helpers, and simplifies gRPC search to always use ParseSeqQL. Several AST string/dump helpers and token-building infrastructure are removed. Changes
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related issues
Suggested reviewers
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
tests/integration_tests/single_test.go (1)
574-579
: Potential loop variable capture in parallel subtestsIf SingleEnvs returns pointers, the loop variable cfg may be captured by reference across parallel subtests. Shadow cfg within the loop to avoid heisenbugs.
for _, cfg := range suites.SingleEnvs() { - t.Run(cfg.Name, func(t *testing.T) { + cfg := cfg // capture per-iteration + t.Run(cfg.Name, func(t *testing.T) { t.Parallel() suite.Run(t, NewSingleTestSuite(cfg)) }) }
🧹 Nitpick comments (6)
parser/ast_test.go (1)
8-27
: Don’t leave flaky/commented tests; use t.Skip with context or remove themReplace the commented-out test with a minimal t.Skip stub (with an issue link) or delete the dead code to keep the suite clean. Also, when re-enabling, avoid nondeterminism (e.g., seed math/rand) to keep tests stable.
Example stub:
+func TestBuildingTree(t *testing.T) { + t.Skip("TODO(moflotas): investigate failure; tracked in #76") + // original assertions go here when fixed +}parser/seqql_filter.go (1)
103-124
: Avoid panic path in parser; return an error insteadPanicking on unexpected index types is risky in user-facing parsing. Prefer a typed error so callers can handle it gracefully.
Proposed change:
switch t { case seq.TokenizerTypeKeyword, seq.TokenizerTypePath: // ... return &ASTNode{Value: &Literal{Field: fieldName, Terms: terms}}, nil case seq.TokenizerTypeText: // ... return buildAndTree(tokens), nil default: - panic(fmt.Errorf("BUG: unexpected index type: %d", t)) + return nil, fmt.Errorf("unsupported index type %d for field %q", t, fieldName) }storeapi/grpc_search.go (2)
192-195
: Inline return of AST root.Minor: avoid the temporary variable to keep it tight.
Apply this diff:
- ast := seqql.Root - - return ast, nil + return seqql.Root, nil
32-42
: Tracing attribute bug: “to” uses req.From instead of req.To.This skews trace metadata and can mislead analysis.
Apply this diff:
- span.AddAttributes(trace.Int64Attribute("to", req.From)) + span.AddAttributes(trace.Int64Attribute("to", req.To))parser/seqql_filter_test.go (2)
408-415
: Use runes in getPerm to avoid breaking multi-byte sequences.Templates are ASCII today, but using []rune makes the helper future-proof and robust with Unicode.
Apply this diff:
-func getPerm(p []int, s string) string { - res := []byte(s) - for i, v := range p { - res[i], res[i+v] = res[i+v], res[i] - } - return string(res) -} +func getPerm(p []int, s string) string { + res := []rune(s) + for i, v := range p { + res[i], res[i+v] = res[i+v], res[i] + } + return string(res) +}
444-460
: Commented-out stress test: consider a build tag or t.Skip.Keeping dead code reduces clarity. If you want it around, guard it behind a dedicated build tag (e.g., //go:build stress) or convert to a test that calls t.Skip with context.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (12)
frac/processor/eval_test.go
(1 hunks)parser/ast_test.go
(1 hunks)parser/bench_test.go
(0 hunks)parser/parser_test.go
(0 hunks)parser/process_test.go
(0 hunks)parser/query_parser.go
(0 hunks)parser/seqql_filter.go
(1 hunks)parser/seqql_filter_test.go
(5 hunks)parser/token_parser.go
(0 hunks)storeapi/grpc_search.go
(2 hunks)tests/integration_tests/integration_test.go
(2 hunks)tests/integration_tests/single_test.go
(4 hunks)
💤 Files with no reviewable changes (5)
- parser/parser_test.go
- parser/bench_test.go
- parser/process_test.go
- parser/query_parser.go
- parser/token_parser.go
🧰 Additional context used
🧬 Code Graph Analysis (4)
parser/seqql_filter.go (2)
seq/tokenizer.go (6)
TokenizerType
(20-20)TokenAll
(10-10)TokenizerTypeKeyword
(24-24)TokenExists
(11-11)TokenIndex
(12-12)TokenizerTypeNoop
(23-23)seq/mapping.go (1)
Mapping
(89-89)
frac/processor/eval_test.go (1)
parser/seqql.go (1)
ParseSeqQL
(28-58)
storeapi/grpc_search.go (4)
storeapi/grpc_v1.go (1)
GrpcV1
(83-98)parser/ast_node.go (1)
ASTNode
(8-11)seq/tokenizer.go (1)
TokenAll
(10-10)parser/seqql.go (1)
ParseSeqQL
(28-58)
tests/integration_tests/single_test.go (1)
tests/suites/single.go (1)
AllFracEnvs
(94-98)
🔇 Additional comments (15)
parser/seqql_filter.go (2)
16-21
: Built-in tokenizer mapping looks goodTreating all, exists, and _index as keyword fields is reasonable and keeps special tokens predictable.
22-36
: Confirm: Defaulting to Keyword when mapping is nil is intentionalindexType returns Keyword when userMapping is nil; parseSeqQLFieldFilter then allows any field (unless explicitly Noop). This changes error behavior vs. “field is not indexed” and effectively enables queries on arbitrary fields when no mapping is provided.
- If this is meant to align with “Index all fields” behavior, consider tying it to config (e.g., IndexAllFields) rather than mapping == nil.
- If not intended, return Noop here and let the caller surface “not indexed”.
Please confirm intended semantics.
Also applies to: 46-49
frac/processor/eval_test.go (1)
76-79
: SeqQL migration in tests: LGTMSwitching to ParseSeqQL and feeding buildEvalTree with query.Root is correct and consistent with the new API.
Also applies to: 86-89
tests/integration_tests/single_test.go (4)
144-164
: Nice table-driven rewrite for basic search casesThe consolidation improves readability and maintenance without changing semantics.
244-269
: NOT-query cases table: LGTMGood coverage across negations and combinations; concise and clear.
471-493
: Wildcard symbol cases table: LGTMThe explicit quoting/escaping scenarios are well covered.
529-543
: Index-all-fields assertions: LGTMResetting mapping and enabling IndexAllFields with restart is clear; expectations align with SeqQL behavior.
tests/integration_tests/integration_test.go (2)
1402-1412
: Range-query table is clear and correctThe bracket/parenthesis semantics and wildcard upper bounds are well represented; counts match the generated dataset.
1710-1721
: Path search: quoting and patterns look correctQuoting path literals and exercising segment/dot/wildcard variants is aligned with SeqQL path semantics.
storeapi/grpc_search.go (3)
102-104
: Call-site update to SeqQL-only parse looks correct.Removing ctx from parseQuery and invoking the SeqQL path directly is consistent with the PR objective and keeps tracing around the call intact.
183-195
: Single SeqQL-based parsing path: good simplification.This consolidates logic and standardizes error handling to InvalidArgument, aligning with the migration.
183-191
: Empty-query fallback (all: ) is safe — no action required.*Quick check: seq.TokenAll == "all" and parser treats it as a built-in field. parseQuery sets seq.TokenAll + ":*", indexType consults builtinMapping (which contains TokenAll) so ParseSeqQL accepts all even when the user mapping is nil or missing that field.
Relevant locations:
- storeapi/grpc_search.go — parseQuery sets fallback: seq.TokenAll + ":*" (around line 185)
- seq/tokenizer.go — const TokenAll = "all" (line ~10)
- parser/seqql_filter.go — builtinMapping contains seq.TokenAll and indexType() falls back to it (lines ~16–32)
- parser/token_literal.go — DumpSeqQL prints "" for all: (lines ~25–27)
- parser/seqql.go — top-level "*" is parsed into Literal with Field seq.TokenAll (around lines ~354–358)
parser/seqql_filter_test.go (3)
194-206
: Nice expansion of range filter coverage.Added cases cover wildcards, quoted/unquoted forms, whitespace, and mixed constructs. This materially improves confidence in range parsing.
471-472
: Benchmark cleanup LGTM.Using
_ = query.Root
avoids unused-variable churn while keeping the benchmark semantics.
484-485
: Long benchmark cleanup LGTM.Same as above; keeps the benchmark focused.
func nextPerm(p []int) { | ||
for i := len(p) - 1; i >= 0; i-- { | ||
if i == 0 || p[i] < len(p)-i-1 { | ||
p[i]++ | ||
return | ||
} | ||
p[i] = 0 | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Factorial blow-up risk: nextPerm-driven fuzz enumerates n! permutations.
With templates up to length 10–11 (e.g., "AND OR NOT"), the loop runs millions of iterations per template. This will significantly slow or time out CI.
Consider sampling instead of exhaustive enumeration. For example, replace the factorial loop with bounded random shuffles:
// Replace the exhaustive permutation loop with bounded random samples.
const maxSamplesPerTemplate = 5000
for _, template := range templates {
if len(template) >= 12 {
t.Skipf("template %q is too long for fuzz sampling", template)
}
r := rand.New(rand.NewSource(1)) // deterministic for CI
b := []rune(template)
for i := 0; i < maxSamplesPerTemplate; i++ {
r.Shuffle(len(b), func(i, j int) { b[i], b[j] = b[j], b[i] })
s := string(b)
_, err := ParseSeqQL(s, nil)
require.Errorf(t, err, "query: %s", s)
}
}
If you prefer to retain nextPerm/getPerm, add a hard cap on iterations and break after N checks.
func BenchmarkParsing(b *testing.B) { | ||
str := `service: "some service" AND level:1` | ||
for i := 0; i < b.N; i++ { | ||
exp, _ = ParseQuery(str, seq.TestMapping) | ||
} | ||
} | ||
|
||
func BenchmarkParsingLong(b *testing.B) { | ||
str := `((NOT ((((m:19 OR m:20) OR m:18) AND m:16) OR ((NOT (m:25 OR m:26)) AND m:12))) OR (((NOT m:29) AND m:22) OR (((m:31 OR m:32) AND m:14) OR (m:27 AND m:28))))` | ||
for i := 0; i < b.N; i++ { | ||
exp, _ = ParseQuery(str, seq.TestMapping) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those benchmarks are already present in seqql_filter_test.go
} | ||
} | ||
|
||
func TestAggregationFilter(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get the point of this test, since it tests the function which only used in tests, so I removed it. But double-check me, please
} | ||
|
||
// error messages are actually readable | ||
func TestParserErr(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already present in seqql_filter_test.go
return string(res) | ||
} | ||
|
||
func TestParserFuzz(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already present in seqql_filter_test.go
} | ||
} | ||
|
||
func TestParseRange(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this test, since it seems redundant to me. Other tests in pattern
package cover Range
filter functionality
} | ||
} | ||
|
||
func TestTokenization(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already present in seqql_filter_test.go
} | ||
} | ||
|
||
func TestTokenizationCaseSensitive(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already present in seqql_filter_test.go
} | ||
} | ||
|
||
func TestExistsCaseSensitive(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already present in seqql_filter_test.go
assert.Equal(t, "*", r.To.Data) | ||
} | ||
|
||
func TestPropagateNot(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already present in seqql_filter_test.go
} | ||
} | ||
|
||
func TestWildcardText(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already present in seqql_filter_test.go
c157901
to
6dfe651
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
parser/ast_test.go (1)
11-30
: Use t.Log instead of fmt.Println; confirm bypassing NOT propagation is intentional.Printing to stdout is noisy during tests. Also, using the internal parse() skips NOT propagation by design—verify that’s what you want to assert here.
- fmt.Println(query.SeqQLString()) + t.Log(query.SeqQLString())parser/seqql.go (1)
28-53
: Avoid panicking in parser; return an error instead.Panics in parsing paths can crash callers; better to bubble up an error. This keeps ParseSeqQL safe for production paths.
- if !lex.IsEnd() { - panic(fmt.Errorf("BUG: lexer is not end: %+v", lex)) - } + if !lex.IsEnd() { + return SeqQLQuery{}, fmt.Errorf("unexpected trailing tokens: %+v", lex) + }parser/seqql_filter_test.go (1)
444-457
: Stress test: deterministic RNG recommended; confirm parse() usage.
- Using global rand in a parallel test can be non-deterministic. Consider a local rand.New with a fixed seed (would require threading it into addOperator).
- parse() (not ParseSeqQL) here makes sense if you need pre‑propagation dumps—confirm that intent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
frac/processor/eval_test.go
(3 hunks)parser/ast_node.go
(0 hunks)parser/ast_test.go
(1 hunks)parser/seqql.go
(2 hunks)parser/seqql_filter_test.go
(5 hunks)parser/term_builder.go
(0 hunks)parser/token_ip_range.go
(0 hunks)parser/token_literal.go
(0 hunks)parser/token_range.go
(0 hunks)
💤 Files with no reviewable changes (5)
- parser/ast_node.go
- parser/token_range.go
- parser/token_ip_range.go
- parser/term_builder.go
- parser/token_literal.go
🚧 Files skipped from review as they are similar to previous changes (1)
- frac/processor/eval_test.go
🧰 Additional context used
🧬 Code graph analysis (2)
parser/seqql.go (1)
seq/mapping.go (1)
Mapping
(89-89)
parser/seqql_filter_test.go (1)
parser/ast_node.go (1)
ASTNode
(8-11)
🔇 Additional comments (5)
parser/seqql.go (1)
55-69
: Refactor looks good; verify propagateNot contract.The split between parse() and NOT propagation in ParseSeqQL is clear. Please confirm propagateNot(root) returns (newRoot, needsWrap) as assumed, so wrapping with newNotNode only when needed is correct for all cases.
parser/seqql_filter_test.go (4)
194-207
: Nice additions to range-filter coverage.Good mix of quoted/unquoted, wildcards, spacing, and boolean combos.
398-415
: Enumerating permutations scales as n!; cap or sample to avoid timeouts.This helper powers a factorial loop in TestSeqQLParserFuzz. Please switch to bounded sampling or hard-cap iterations as previously suggested.
471-471
: LGTM: avoid unused var in benchmark.Assigning to blank identifier is the right fix.
485-485
: LGTM: same here.Benchmark fix is correct.
# Conflicts: # storeapi/grpc_search.go
6dfe651
to
e0ee9f0
Compare
assert.Equal(t, 1, len(act.Children[1].Children[1].Children)) | ||
assert.Equal(t, "c:c", act.Children[1].Children[1].Children[0].Value.(*Literal).String()) | ||
assert.Equal(t, 0, len(act.Children[1].Children[1].Children[0].Children)) | ||
act := query.Root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@che jfyi
here, if I would use
ParseSeqQL(`a:a OR b:b AND NOT c:c`, nil)
It propagates not
differently then the old parser. I would check it myself, but you might be interested
Fixes #76
Summary by CodeRabbit
New Features
Changes
Refactor
Tests