@hnrklssn hnrklssn commented Sep 9, 2025

This adds awareness of the `split-file` tool to `diff_test_updater`.
Now tests that diff an output file with a file created using
`split-file` will have the corresponding slice in the original test
updated, rather than the temporary file created by `split-file`.
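
As a rough illustration of the slice update described above, the sketch below replaces the body of one `split-file` part inside the test file's lines. `replace_slice` and `SEPARATOR` are hypothetical names, and the separator pattern is a simplification based on the `#---` and `//---` markers used in this PR's test inputs, not the tool's full rule.

```python
import re

# Hypothetical helper, not part of lit: a separator is a comment prefix
# ("#" or "//" here) immediately followed by "--- <name>".
SEPARATOR = re.compile(r"^(?:#|//)--- (?P<name>.+?)\s*$")


def replace_slice(test_lines, part_name, new_lines):
    """Return test_lines with the body of the `part_name` slice replaced."""
    start = None
    for i, line in enumerate(test_lines):
        m = SEPARATOR.match(line)
        if m and m.group("name") == part_name:
            start = i
            break
    if start is None:
        return None  # no such slice in this test file

    # The slice ends at the next separator line, or at end of file.
    end = len(test_lines)
    for j in range(start + 1, len(test_lines)):
        if SEPARATOR.match(test_lines[j]):
            end = j
            break

    return test_lines[: start + 1] + new_lines + test_lines[end:]


test = [
    "# RUN: split-file %s %t\n",
    "# RUN: diff %t/test3.expected %t/out.txt\n",
    "\n",
    "#--- test2.expected\n",
    "#--- test3.expected\n",
    "BAR\n",
    "#--- test4.expected\n",
    "filler\n",
]
updated = replace_slice(test, "test3.expected", ["FOO\n"])
print("".join(updated), end="")
```

Only the lines between the matching separator and the next one change, so unrelated slices and the RUN lines are left as they were.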
@llvmbot llvmbot commented Sep 9, 2025

@llvm/pr-subscribers-testing-tools

Author: Henrik G. Olsson (hnrklssn)

Full diff: https://github.com/llvm/llvm-project/pull/157765.diff

19 Files Affected:

  • (modified) llvm/utils/lit/lit/DiffUpdater.py (+107-9)
  • (modified) llvm/utils/lit/lit/TestRunner.py (+1-1)
  • (modified) llvm/utils/lit/tests/Inputs/diff-test-update/.gitignore (+7)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file-populated.in (+17)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file.in (+13)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file.out (+14)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-no-expected.in (+6)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-no-expected.out (+6)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-populated.in (+7)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file.in (+5)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file.out (+6)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/split-both.test (+11)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/split-c-comments.in (+6)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/split-c-comments.out (+6)
  • (added) llvm/utils/lit/tests/Inputs/diff-test-update/unrelated-split.test (+11)
  • (modified) llvm/utils/lit/tests/Inputs/pass-test-update/should_not_run.py (+1-1)
  • (modified) llvm/utils/lit/tests/diff-test-update.py (+18-1)
  • (modified) llvm/utils/lit/tests/pass-test-update.py (+1-1)
  • (modified) llvm/utils/update_any_test_checks.py (+1-1)
diff --git a/llvm/utils/lit/lit/DiffUpdater.py b/llvm/utils/lit/lit/DiffUpdater.py
index de0001a94f0ba..9a5a264693e1e 100644
--- a/llvm/utils/lit/lit/DiffUpdater.py
+++ b/llvm/utils/lit/lit/DiffUpdater.py
@@ -1,37 +1,135 @@
 import shutil
+import os
 
 """
 This file provides the `diff_test_updater` function, which is invoked on failed RUN lines when lit is executed with --update-tests.
 It checks whether the failed command is `diff` and, if so, uses heuristics to determine which file is the checked-in reference file and which file is output from the test case.
 The heuristics are currently as follows:
+    - if exactly one file originates from the `split-file` command, that file is the reference file and the other is the output file
     - if exactly one file ends with ".expected" (common pattern in LLVM), that file is the reference file and the other is the output file
     - if exactly one file path contains ".tmp" (e.g. because it contains the expansion of "%t"), that file is the reference file and the other is the output file
 If the command matches one of these patterns the output file content is copied to the reference file to make the test pass.
+If the reference file originated in `split-file`, the output file content is instead copied to the corresponding slice of the test file.
 Otherwise the test is ignored.
 
 Possible improvements:
     - Support stdin patterns like "my_binary %s | diff expected.txt"
-    - Scan RUN lines to see if a file is the source of output from a previous command.
+    - Scan RUN lines to see if a file is the source of output from a previous command (other than `split-file`).
       If it is then it is not a reference file that can be copied to, regardless of name, since the test will overwrite it anyways.
     - Only update the parts that need updating (based on the diff output). Could help avoid noisy updates when e.g. whitespace changes are ignored.
 """
 
 
-def get_source_and_target(a, b):
+class NormalFileTarget:
+    def __init__(self, target):
+        self.target = target
+
+    def copyFrom(self, source):
+        shutil.copy(source, self.target)
+
+    def __str__(self):
+        return self.target
+
+
+class SplitFileTarget:
+    def __init__(self, slice_start_idx, test_path, lines):
+        self.slice_start_idx = slice_start_idx
+        self.test_path = test_path
+        self.lines = lines
+
+    def copyFrom(self, source):
+        lines_before = self.lines[: self.slice_start_idx + 1]
+        self.lines = self.lines[self.slice_start_idx + 1 :]
+        slice_end_idx = None
+        for i, l in enumerate(self.lines):
+            if SplitFileTarget._get_split_line_path(l) != None:
+                slice_end_idx = i
+                break
+        if slice_end_idx is not None:
+            lines_after = self.lines[slice_end_idx:]
+        else:
+            lines_after = []
+        with open(source, "r") as f:
+            new_lines = lines_before + f.readlines() + lines_after
+        with open(self.test_path, "w") as f:
+            for l in new_lines:
+                f.write(l)
+
+    def __str__(self):
+        return f"slice in {self.test_path}"
+
+    @staticmethod
+    def get_target_dir(commands, test_path):
+        for cmd in commands:
+            split = cmd.split(" ")
+            if "split-file" not in split:
+                continue
+            start_idx = split.index("split-file")
+            split = split[start_idx:]
+            if len(split) < 3:
+                continue
+            if split[1].strip() != test_path:
+                continue
+            return split[2].strip()
+        return None
+
+    @staticmethod
+    def create(path, commands, test_path, target_dir):
+        filename = path.replace(target_dir, "")
+        if filename.startswith(os.sep):
+            filename = filename[len(os.sep) :]
+        with open(test_path, "r") as f:
+            lines = f.readlines()
+        for i, l in enumerate(lines):
+            p = SplitFileTarget._get_split_line_path(l)
+            if p == filename:
+                idx = i
+                break
+        else:
+            return None
+        return SplitFileTarget(idx, test_path, lines)
+
+    @staticmethod
+    def _get_split_line_path(l):
+        if len(l) < 6:
+            return None
+        if l.startswith("//"):
+            l = l[2:]
+        else:
+            l = l[1:]
+        if l.startswith("--- "):
+            l = l[4:]
+        else:
+            return None
+        return l.rstrip()
+
+
+def get_source_and_target(a, b, test_path, commands):
     """
     Try to figure out which file is the test output and which is the reference.
     """
+    split_target_dir = SplitFileTarget.get_target_dir(commands, test_path)
+    if split_target_dir:
+        a_target = SplitFileTarget.create(a, commands, test_path, split_target_dir)
+        b_target = SplitFileTarget.create(b, commands, test_path, split_target_dir)
+        if a_target and b_target:
+            return None
+        if a_target:
+            return b, a_target
+        if b_target:
+            return a, b_target
+
     expected_suffix = ".expected"
     if a.endswith(expected_suffix) and not b.endswith(expected_suffix):
-        return b, a
+        return b, NormalFileTarget(a)
     if b.endswith(expected_suffix) and not a.endswith(expected_suffix):
-        return a, b
+        return a, NormalFileTarget(b)
 
     tmp_substr = ".tmp"
     if tmp_substr in a and not tmp_substr in b:
-        return a, b
+        return a, NormalFileTarget(b)
     if tmp_substr in b and not tmp_substr in a:
-        return b, a
+        return b, NormalFileTarget(a)
 
     return None
 
@@ -40,16 +138,16 @@ def filter_flags(args):
     return [arg for arg in args if not arg.startswith("-")]
 
 
-def diff_test_updater(result, test):
+def diff_test_updater(result, test, commands):
     args = filter_flags(result.command.args)
     if len(args) != 3:
         return None
     [cmd, a, b] = args
     if cmd != "diff":
         return None
-    res = get_source_and_target(a, b)
+    res = get_source_and_target(a, b, test.getFilePath(), commands)
     if not res:
         return f"update-diff-test: could not deduce source and target from {a} and {b}"
     source, target = res
-    shutil.copy(source, target)
+    target.copyFrom(source)
     return f"update-diff-test: copied {source} to {target}"
diff --git a/llvm/utils/lit/lit/TestRunner.py b/llvm/utils/lit/lit/TestRunner.py
index 69ca80008e2f9..a4e84d285ae6e 100644
--- a/llvm/utils/lit/lit/TestRunner.py
+++ b/llvm/utils/lit/lit/TestRunner.py
@@ -1247,7 +1247,7 @@ def executeScriptInternal(
         ):
             for test_updater in litConfig.test_updaters:
                 try:
-                    update_output = test_updater(result, test)
+                    update_output = test_updater(result, test, commands)
                 except Exception as e:
                     output = out
                     output += err
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/.gitignore b/llvm/utils/lit/tests/Inputs/diff-test-update/.gitignore
index dd373bf9e0c66..8ef5350b132c2 100644
--- a/llvm/utils/lit/tests/Inputs/diff-test-update/.gitignore
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/.gitignore
@@ -1,2 +1,9 @@
 ; diff-tmp-dir.test clobbers this file
 empty.txt
+; these test cases are clobbered when run, so they're recreated each time
+single-split-file.test
+single-split-file-populated.test
+multiple-split-file.test
+multiple-split-file-populated.test
+single-split-file-no-expected.test
+split-c-comments.test
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file-populated.in b/llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file-populated.in
new file mode 100644
index 0000000000000..e218ed6a0c6ea
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file-populated.in
@@ -0,0 +1,17 @@
+# RUN: split-file %s %t
+# RUN: cp %S/1.in %t/out.txt
+# RUN: diff %t/test3.expected %t/out.txt
+
+#--- test1.expected
+unrelated
+#--- test2.expected
+#--- test3.expected
+BAR
+
+BAZ
+
+#--- test4.expected
+filler
+#--- test5.expected
+
+
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file.in b/llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file.in
new file mode 100644
index 0000000000000..c47db99912c24
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file.in
@@ -0,0 +1,13 @@
+# RUN: split-file %s %t
+# RUN: cp %S/1.in %t/out.txt
+# RUN: diff %t/test3.expected %t/out.txt
+
+#--- test1.expected
+unrelated
+#--- test2.expected
+#--- test3.expected
+#--- test4.expected
+filler
+#--- test5.expected
+
+
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file.out b/llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file.out
new file mode 100644
index 0000000000000..c1d2782d3c2d4
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/multiple-split-file.out
@@ -0,0 +1,14 @@
+# RUN: split-file %s %t
+# RUN: cp %S/1.in %t/out.txt
+# RUN: diff %t/test3.expected %t/out.txt
+
+#--- test1.expected
+unrelated
+#--- test2.expected
+#--- test3.expected
+FOO
+#--- test4.expected
+filler
+#--- test5.expected
+
+
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-no-expected.in b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-no-expected.in
new file mode 100644
index 0000000000000..510dc7afba16b
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-no-expected.in
@@ -0,0 +1,6 @@
+# RUN: split-file %s %t
+# RUN: cp %S/1.in %t/out.txt
+# RUN: diff %t/test.txt %t/out.txt
+
+#--- test.txt
+
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-no-expected.out b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-no-expected.out
new file mode 100644
index 0000000000000..f52e3004aee15
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-no-expected.out
@@ -0,0 +1,6 @@
+# RUN: split-file %s %t
+# RUN: cp %S/1.in %t/out.txt
+# RUN: diff %t/test.txt %t/out.txt
+
+#--- test.txt
+FOO
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-populated.in b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-populated.in
new file mode 100644
index 0000000000000..63042cf9b86bc
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file-populated.in
@@ -0,0 +1,7 @@
+# RUN: split-file %s %t
+# RUN: cp %S/1.in %t/out.txt
+# RUN: diff %t/test.expected %t/out.txt
+
+#--- test.expected
+BAR
+
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file.in b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file.in
new file mode 100644
index 0000000000000..422ccf2ef6813
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file.in
@@ -0,0 +1,5 @@
+# RUN: split-file %s %t
+# RUN: cp %S/1.in %t/out.txt
+# RUN: diff %t/test.expected %t/out.txt
+
+#--- test.expected
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file.out b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file.out
new file mode 100644
index 0000000000000..5552ad328ec5c
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/single-split-file.out
@@ -0,0 +1,6 @@
+# RUN: split-file %s %t
+# RUN: cp %S/1.in %t/out.txt
+# RUN: diff %t/test.expected %t/out.txt
+
+#--- test.expected
+FOO
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/split-both.test b/llvm/utils/lit/tests/Inputs/diff-test-update/split-both.test
new file mode 100644
index 0000000000000..f564f446cc94b
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/split-both.test
@@ -0,0 +1,11 @@
+# RUN: split-file %s %t
+# RUN: diff %t/split-both.expected %t/split-both.out
+
+# ignore the fact that it's called ".expected"
+# when comparing two files originating in split-file
+
+#--- split-both.expected
+FOO
+#--- split-both.out
+BAR
+
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/split-c-comments.in b/llvm/utils/lit/tests/Inputs/diff-test-update/split-c-comments.in
new file mode 100644
index 0000000000000..3cda60118f5ba
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/split-c-comments.in
@@ -0,0 +1,6 @@
+// RUN: split-file %s %t
+// RUN: cp %S/1.in %t/out.txt
+// RUN: diff %t/test.txt %t/out.txt
+//
+//--- test.txt
+
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/split-c-comments.out b/llvm/utils/lit/tests/Inputs/diff-test-update/split-c-comments.out
new file mode 100644
index 0000000000000..5020804f198b1
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/split-c-comments.out
@@ -0,0 +1,6 @@
+// RUN: split-file %s %t
+// RUN: cp %S/1.in %t/out.txt
+// RUN: diff %t/test.txt %t/out.txt
+//
+//--- test.txt
+FOO
diff --git a/llvm/utils/lit/tests/Inputs/diff-test-update/unrelated-split.test b/llvm/utils/lit/tests/Inputs/diff-test-update/unrelated-split.test
new file mode 100644
index 0000000000000..b04eff36721de
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/diff-test-update/unrelated-split.test
@@ -0,0 +1,11 @@
+# the fact that this test runs split-file is unrelated
+# to the diffed files
+
+# RUN: mkdir %t
+# RUN: split-file %s %t
+# RUN: cp %S/1.in %t/unrelated-split.expected
+# RUN: cp %S/2.in %t/unrelated-split.txt
+# RUN: diff %t/unrelated-split.expected %t/unrelated-split.txt
+
+#--- distraction.txt
+
diff --git a/llvm/utils/lit/tests/Inputs/pass-test-update/should_not_run.py b/llvm/utils/lit/tests/Inputs/pass-test-update/should_not_run.py
index 0fda62c832f08..5b39d208a2ed6 100644
--- a/llvm/utils/lit/tests/Inputs/pass-test-update/should_not_run.py
+++ b/llvm/utils/lit/tests/Inputs/pass-test-update/should_not_run.py
@@ -1,2 +1,2 @@
-def should_not_run(foo, bar):
+def should_not_run(foo, bar, baz):
     raise Exception("this test updater should only run on failure")
diff --git a/llvm/utils/lit/tests/diff-test-update.py b/llvm/utils/lit/tests/diff-test-update.py
index c37d0dccc727c..5f4e98e285625 100644
--- a/llvm/utils/lit/tests/diff-test-update.py
+++ b/llvm/utils/lit/tests/diff-test-update.py
@@ -1,10 +1,27 @@
+# RUN: cp %S/Inputs/diff-test-update/single-split-file.in %S/Inputs/diff-test-update/single-split-file.test
+# RUN: cp %S/Inputs/diff-test-update/single-split-file-populated.in %S/Inputs/diff-test-update/single-split-file-populated.test
+# RUN: cp %S/Inputs/diff-test-update/multiple-split-file.in %S/Inputs/diff-test-update/multiple-split-file.test
+# RUN: cp %S/Inputs/diff-test-update/multiple-split-file-populated.in %S/Inputs/diff-test-update/multiple-split-file-populated.test
+# RUN: cp %S/Inputs/diff-test-update/single-split-file-no-expected.in %S/Inputs/diff-test-update/single-split-file-no-expected.test
+# RUN: cp %S/Inputs/diff-test-update/split-c-comments.in %S/Inputs/diff-test-update/split-c-comments.test
+
 # RUN: not %{lit} --update-tests -v %S/Inputs/diff-test-update | FileCheck %s
 
+# RUN: diff %S/Inputs/diff-test-update/single-split-file.out %S/Inputs/diff-test-update/single-split-file.test
+# RUN: diff %S/Inputs/diff-test-update/single-split-file.out %S/Inputs/diff-test-update/single-split-file-populated.test
+# RUN: diff %S/Inputs/diff-test-update/multiple-split-file.out %S/Inputs/diff-test-update/multiple-split-file.test
+# RUN: diff %S/Inputs/diff-test-update/multiple-split-file.out %S/Inputs/diff-test-update/multiple-split-file-populated.test
+# RUN: diff %S/Inputs/diff-test-update/single-split-file-no-expected.out %S/Inputs/diff-test-update/single-split-file-no-expected.test
+# RUN: diff %S/Inputs/diff-test-update/split-c-comments.out %S/Inputs/diff-test-update/split-c-comments.test
+
+
 # CHECK: # update-diff-test: could not deduce source and target from {{.*}}1.in and {{.*}}2.in
 # CHECK: # update-diff-test: could not deduce source and target from {{.*}}1.txt and {{.*}}2.txt
 # CHECK: # update-diff-test: copied {{.*}}my-file.txt to {{.*}}my-file.expected
 # CHECK: # update-diff-test: copied {{.*}}1.txt to {{.*}}empty.txt
 # CHECK: # update-diff-test: copied {{.*}}diff-tmp.test.tmp.txt to {{.*}}diff-t-out.txt
+# CHECK: # update-diff-test: could not deduce source and target from {{.*}}split-both.expected and {{.*}}split-both.out
+# CHECK: # update-diff-test: copied {{.*}}unrelated-split.txt to {{.*}}unrelated-split.expected
 
 
-# CHECK: Failed: 5 (100.00%)
+# CHECK: Failed: 13 (100.00%)
diff --git a/llvm/utils/lit/tests/pass-test-update.py b/llvm/utils/lit/tests/pass-test-update.py
index 00a4025be660e..2e9f1be2bccab 100644
--- a/llvm/utils/lit/tests/pass-test-update.py
+++ b/llvm/utils/lit/tests/pass-test-update.py
@@ -12,7 +12,7 @@
 # CHECK: Exception occurred in test updater:
 # CHECK: Traceback (most recent call last):
 # CHECK:   File {{.*}}, line {{.*}}, in {{.*}}
-# CHECK:     update_output = test_updater(result, test)
+# CHECK:     update_output = test_updater(result, test, commands)
 # CHECK:   File "{{.*}}{{/|\\}}should_not_run.py", line {{.*}}, in should_not_run
 # CHECK:     raise Exception("this test updater should only run on failure")
 # CHECK: Exception: this test updater should only run on failure
diff --git a/llvm/utils/update_any_test_checks.py b/llvm/utils/update_any_test_checks.py
index 76fe336593929..ec277f140a34f 100755
--- a/llvm/utils/update_any_test_checks.py
+++ b/llvm/utils/update_any_test_checks.py
@@ -63,7 +63,7 @@ def expand_listfile_args(arg_list):
     return exp_arg_list
 
 
-def utc_lit_plugin(result, test):
+def utc_lit_plugin(result, test, commands):
     testname = test.getFilePath()
     if not testname:
         return None
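
The diff changes the call site in `TestRunner.py` to `test_updater(result, test, commands)`, so every updater callback now takes a third parameter. Below is a minimal sketch of a callback with that shape. `echo_updater` is hypothetical, and the `SimpleNamespace` objects are stand-ins that model only the attributes used here (`result.command.args` and `test.getFilePath()`, both of which appear in the diff); they are not lit's real classes.

```python
from types import SimpleNamespace


def echo_updater(result, test, commands):
    """Hypothetical updater: reports what it saw and changes nothing."""
    args = result.command.args
    if not args or args[0] != "diff":
        return None  # not a command this updater handles
    return (
        f"echo-updater: {test.getFilePath()} failed on diff "
        f"({len(commands)} commands in script)"
    )


# Stand-ins for the objects lit passes in (assumption: only these
# attributes are needed for the illustration).
result = SimpleNamespace(
    command=SimpleNamespace(args=["diff", "a.expected", "a.txt"])
)
test = SimpleNamespace(getFilePath=lambda: "example.test")
commands = ["split-file example.test out", "diff a.expected a.txt"]

message = echo_updater(result, test, commands)
print(message)
```

Updaters written against the old two-argument signature would raise a `TypeError` when called with three arguments, which is presumably why `should_not_run` and `utc_lit_plugin` are updated in the same change.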

When `diff_test_updater` parsed the commands of a failed test case in
search of a `split-file` command, it incorrectly missed cases such
as `split-file /some/path "/some/other path"`, because it naïvely split
the command string on spaces, dividing the last path into two
substrings, `"/some/other` and `path"`. With `shlex`, the command line
is now lexed according to POSIX shell rules.
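
The difference between the two approaches can be seen on the example command from the message above. The helper `find_split_file_dir` is a hypothetical stand-in loosely modeled on `get_target_dir` from the diff, not the merged code.

```python
import shlex

cmd = 'split-file /some/path "/some/other path"'

naive = cmd.split(" ")    # breaks the quoted path at the inner space
lexed = shlex.split(cmd)  # honors the quotes and strips them

print(naive)  # ['split-file', '/some/path', '"/some/other', 'path"']
print(lexed)  # ['split-file', '/some/path', '/some/other path']


def find_split_file_dir(command, test_path):
    """Hypothetical: return the split-file output dir if it splits test_path."""
    tokens = shlex.split(command)
    if "split-file" not in tokens:
        return None
    args = tokens[tokens.index("split-file") + 1 :]
    if len(args) < 2 or args[0] != test_path:
        return None
    return args[1]


print(find_split_file_dir(cmd, "/some/path"))  # /some/other path
```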
            if p == filename:
                idx = i
                break
        else:
for-else is so cool. love it.
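
For readers unfamiliar with the construct, here is a small sketch of `for`/`else` in the same shape as the reviewed code; `find_index` is a hypothetical name used only for illustration.

```python
def find_index(items, wanted):
    for i, item in enumerate(items):
        if item == wanted:
            idx = i
            break
    else:
        # Runs only when the loop finished without hitting `break`,
        # i.e. when nothing matched.
        return None
    return idx


print(find_index(["a.txt", "b.txt"], "b.txt"))  # 1
print(find_index(["a.txt", "b.txt"], "c.txt"))  # None
```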

@hnrklssn hnrklssn merged commit 9eb17cc into llvm:main Sep 11, 2025
9 checks passed
hnrklssn added a commit to hnrklssn/llvm-project that referenced this pull request Sep 11, 2025
@hnrklssn
Member Author

Addressing some Windows test failures in #158160

jtstogel added a commit to jtstogel/llvm-project that referenced this pull request Sep 12, 2025
Sterling-Augustine pushed a commit that referenced this pull request Sep 12, 2025
…file (#158170)

#157765 added tests that depend on the split-file utility, which breaks
the Bazel test target.
hnrklssn added a commit to hnrklssn/llvm-project that referenced this pull request Sep 12, 2025
kateinoigakukun pushed a commit to kateinoigakukun/llvm-project that referenced this pull request Sep 17, 2025
3 participants