Commit fbe1ae6

2010YOUY01 and martin-g authored
feat(small): Support <slt:ignore> marker in sqllogictest for non-deterministic expected parts (apache#18857)
## Which issue does this PR close?

Part of apache#17612

## Rationale for this change

`sqllogictest`s are generally easier to maintain than Rust tests. However, they cannot test `EXPLAIN ANALYZE` results, because those results include parts that change between runs. For example, in datafusion-cli the `elapsed_compute` measurement changes from run to run:

```
> EXPLAIN ANALYZE SELECT * FROM generate_series(100);
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100, batch_size=8192], metrics=[output_rows=101, elapsed_compute=74.042µs, output_bytes=64.0 KB, output_batches=1] |
| | |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.006 seconds.
```

We can add a special marker to `sqllogictest` to skip those non-deterministic parts.

## What changes are included in this PR?

- Changed the `sqllogictest` validator to recognize the `<slt:ignore>` marker
- doc
- slt test

## Are these changes tested?

## Are there any user-facing changes?

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
1 parent 6ed22bb commit fbe1ae6

4 files changed: +159 -0 lines changed

datafusion/sqllogictest/README.md

Lines changed: 11 additions & 0 deletions
@@ -142,6 +142,17 @@ select substr('Andrew Lamb', 1, 6), '|'
Andrew |
```

## Cookbook: Ignoring volatile output

Sometimes parts of a result change every run (timestamps, counters, etc.). To keep the rest of the snapshot checked in, replace those fragments with the `<slt:ignore>` marker inside the expected block. During validation the marker acts like a wildcard, so only the surrounding text must match.

```text
query TT
EXPLAIN ANALYZE SELECT * FROM generate_series(100);
----
Plan with Metrics LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100, batch_size=8192], metrics=[output_rows=101, elapsed_compute=<slt:ignore>, output_bytes=<slt:ignore>]
```

# Reference

## Running tests: Validation Mode
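
The wildcard behaviour described in the cookbook entry can be sketched in isolation. The snippet below is a minimal, standalone approximation of fragment-based matching on plain strings. `matches_with_ignore` is a hypothetical helper name used only for illustration, and it skips the normalization step that the real validator applies before comparing. It is only meant for expected strings that contain the marker.

```rust
/// Hypothetical standalone helper (not part of the commit): returns true when
/// `actual` matches `expected`, treating every `<slt:ignore>` as a wildcard.
fn matches_with_ignore(expected: &str, actual: &str) -> bool {
    const IGNORE_MARKER: &str = "<slt:ignore>";
    // Byte offset in `actual` up to which text has already been matched.
    let mut pos = 0;
    for (i, frag) in expected.split(IGNORE_MARKER).enumerate() {
        // Empty fragments come from leading, trailing, or adjacent markers.
        if frag.is_empty() {
            continue;
        }
        match actual[pos..].find(frag) {
            // The first fragment must sit at the very start of the output.
            Some(idx) if i == 0 && idx != 0 => return false,
            // Otherwise, skip whatever the wildcard absorbed and move on.
            Some(idx) => pos += idx + frag.len(),
            None => return false,
        }
    }
    true
}

#[test]
fn volatile_metrics_are_skipped() {
    let expected =
        "metrics=[output_rows=101, elapsed_compute=<slt:ignore>, output_bytes=<slt:ignore>]";
    let actual = "metrics=[output_rows=101, elapsed_compute=74.042µs, output_bytes=64.0 KB, output_batches=1]";
    assert!(matches_with_ignore(expected, actual));
}
```

Under these assumptions, the `metrics=[...]` fragment from the README example matches an output whose `elapsed_compute` and `output_bytes` values vary, while a different `output_rows` value would still be rejected because the first literal fragment no longer appears.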

datafusion/sqllogictest/src/util.rs

Lines changed: 47 additions & 0 deletions
@@ -82,13 +82,43 @@ pub fn df_value_validator(
    actual: &[Vec<String>],
    expected: &[String],
) -> bool {
    // Support ignore marker <slt:ignore> to skip volatile parts of output.
    const IGNORE_MARKER: &str = "<slt:ignore>";
    let contains_ignore_marker = expected.iter().any(|line| line.contains(IGNORE_MARKER));

    let normalized_expected = expected.iter().map(normalizer).collect::<Vec<_>>();
    let normalized_actual = actual
        .iter()
        .map(|strs| strs.iter().join(" "))
        .map(|str| str.trim_end().to_string())
        .collect_vec();

    // If ignore marker present, perform fragment-based matching on the full snapshot.
    if contains_ignore_marker {
        let expected_snapshot = normalized_expected.join("\n");
        let actual_snapshot = normalized_actual.join("\n");
        let fragments: Vec<&str> = expected_snapshot.split(IGNORE_MARKER).collect();
        let mut pos = 0;
        for (i, frag) in fragments.iter().enumerate() {
            if frag.is_empty() {
                continue;
            }
            if let Some(idx) = actual_snapshot[pos..].find(frag) {
                // Edge case: The following example is expected to fail
                // Actual - 'foo bar baz'
                // Expected - 'bar <slt:ignore>'
                if (i == 0) && (idx != 0) {
                    return false;
                }

                pos += idx + frag.len();
            } else {
                return false;
            }
        }
        return true;
    }

    if log_enabled!(Warn) && normalized_actual != normalized_expected {
        warn!("df validation failed. actual vs expected:");
        for i in 0..normalized_actual.len() {
@@ -110,3 +140,20 @@ pub fn df_value_validator(
pub fn is_spark_path(relative_path: &Path) -> bool {
    relative_path.starts_with("spark/")
}

#[cfg(test)]
mod tests {
    use super::*;

    // Validation should fail for the below case:
    // Actual - 'foo bar baz'
    // Expected - 'bar <slt:ignore>'
    #[test]
    fn ignore_marker_does_not_skip_leading_text() {
        // Actual snapshot contains unexpected prefix before the expected fragment.
        let actual = vec![vec!["foo bar baz".to_string()]];
        let expected = vec!["bar <slt:ignore>".to_string()];

        assert!(!df_value_validator(value_normalizer, &actual, &expected));
    }
}
}
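
The unit test above pins down one edge case (unexpected leading text). A few further consequences of the fragment loop may be worth keeping in mind when writing expectations. The sketch below restates that loop as a hypothetical standalone function on plain strings, so `matches_with_ignore` and the test names are illustrative assumptions rather than code from the commit, and the normalization step is left out.

```rust
const IGNORE_MARKER: &str = "<slt:ignore>";

/// Hypothetical restatement of the fragment loop in this diff, operating on
/// already-joined snapshots; not part of the commit.
fn matches_with_ignore(expected: &str, actual: &str) -> bool {
    let mut pos = 0;
    for (i, frag) in expected.split(IGNORE_MARKER).enumerate() {
        if frag.is_empty() {
            continue;
        }
        if let Some(idx) = actual[pos..].find(frag) {
            // Only the first fragment is anchored to the start.
            if i == 0 && idx != 0 {
                return false;
            }
            pos += idx + frag.len();
        } else {
            return false;
        }
    }
    true
}

#[test]
fn marker_may_match_nothing() {
    // The wildcard can absorb an empty string.
    assert!(matches_with_ignore("DataFusion<slt:ignore>", "DataFusion"));
}

#[test]
fn marker_may_span_rows() {
    // Rows are joined with '\n' first, so one marker row can absorb rows 1 and 2.
    assert!(matches_with_ignore("0\n<slt:ignore>\n3", "0\n1\n2\n3"));
}

#[test]
fn trailing_text_is_not_anchored() {
    // Nothing checks that the last fragment ends the output, so " baz" is accepted.
    assert!(matches_with_ignore("<slt:ignore>bar", "foo bar baz"));
}
```

If this reading of the loop is right, an expectation that must reject extra trailing output would need the trailing text spelled out literally in the final fragment, since the loop returns `true` as soon as every fragment has been found in order.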
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

statement ok
set datafusion.explain.analyze_level = summary;

query TT
EXPLAIN ANALYZE SELECT * FROM generate_series(100);
----
Plan with Metrics LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=100, batch_size=8192], metrics=[output_rows=101, elapsed_compute=<slt:ignore>, output_bytes=<slt:ignore>]

statement ok
reset datafusion.explain.analyze_level;
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# =================================
# Test sqllogictest runner features
# =================================

# --------------------------
# Test `<slt:ignore>` marker
# --------------------------
query T
select 'DataFusion'
----
<slt:ignore>

query T
select 'DataFusion'
----
Data<slt:ignore>

query T
select 'DataFusion'
----
<slt:ignore>Fusion

query T
select 'Apache DataFusion';
----
<slt:ignore>Data<slt:ignore>

query T
select 'DataFusion'
----
DataFusion<slt:ignore>

query T
select 'DataFusion'
----
<slt:ignore>DataFusion

query T
select 'DataFusion'
----
<slt:ignore>DataFusion<slt:ignore>

query I
select * from generate_series(3);
----
0
1
<slt:ignore>
3

query I
select * from generate_series(3);
----
<slt:ignore>
1
<slt:ignore>
<slt:ignore>
