[ci] New script to generate test reports as Buildkite Annotations #113447

Merged: 2 commits into llvm:main, Nov 12, 2024

Conversation

DavidSpickett
Collaborator

@DavidSpickett DavidSpickett commented Oct 23, 2024

The CI builds now send the results of every lit run to a unique file.
This means we can read them all to make a combined report for all
tests.

This report will be shown as an "annotation" in the build results:
https://buildkite.com/docs/agent/v3/cli-annotate#creating-an-annotation

Here is an example: https://buildkite.com/llvm-project/github-pull-requests/builds/112660
(make sure it is showing "All" instead of "Failures")

This is an alternative to using the existing Buildkite plugin:
https://github.com/buildkite-plugins/junit-annotate-buildkite-plugin

As the plugin:

  • Is specific to Buildkite, and we may move away from Buildkite.
  • Requires Docker, unless we were to fork it ourselves.
  • Does not let you customise the report format unless, again,
    we make our own fork.

Annotations use GitHub's flavour of Markdown, so the main code in the
script generates that text. There is an extra "style" argument generated
to make the formatting nicer in Buildkite.

"context" is the name of the annotation that will be created. By using
different context names for Linux and Windows results we get 2 separate
annotations.

The script also handles calling the buildkite-agent. This makes passing
extra arguments to the agent easier, rather than piping the output of
this script into the agent.

In the future we can remove the agent part of it and simply use
the report content, either printed to stdout or as a comment on
the GitHub PR.
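
For illustration, the shell scripts in this PR end up invoking it like this (a Python sketch of the equivalent call; the real invocations are the bash lines in .ci/monolithic-linux.sh and .ci/monolithic-windows.sh below):

```python
import glob
import subprocess

# Title becomes the report heading; context names the annotation, so the
# Linux and Windows runs each get their own.
subprocess.check_call(
    [
        "python3",
        ".ci/generate_test_report.py",
        ":linux: Linux x64 Test Results",  # title
        "linux-x64-test-results",          # context
        *glob.glob("build/test-results*.xml"),
    ]
)
```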


github-actions bot commented Oct 23, 2024

✅ With the latest revision this PR passed the Python code formatter.

@DavidSpickett DavidSpickett force-pushed the ci-report-script branch 7 times, most recently from e0b999b to 5bdfd87 Compare October 24, 2024 08:27
@DavidSpickett DavidSpickett changed the title Ci report script testing! [ci] New script to generate test reports as Buildkite Annotations Oct 24, 2024
@DavidSpickett
Collaborator Author

Notes to reviewers:

  • This is based on [ci] Write test results to unique file names #113160.
  • The first commit is that previous PR.
  • The second commit is the new script.
  • The third commit is some example changes so I can show you what the reports look like; they will be removed if this is approved.

This is an alternative to forking the existing Buildkite plugin. I do not have examples of those results because it can't run without Docker, but I do have WIP changes to the scripting to get that far (#113290) if you want to compare.

There are a couple of other things we could do:

  • Print the report to stdout as well so it is part of the logfile for download. However, it will contain a bunch of Markdown formatting.
  • Or, write the report to test_results.md and make it an artifact of the job. The annotation can then say "you can download this report from the artifacts...".

I think the current state of this PR is a big improvement as it is, and I can commit more time to improving it as time goes on, including if this needs to be ported to GitHub infrastructure later.

@DavidSpickett DavidSpickett marked this pull request as ready for review October 24, 2024 09:40
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Oct 24, 2024
@llvmbot
Member

llvmbot commented Oct 24, 2024

@llvm/pr-subscribers-testing-tools
@llvm/pr-subscribers-github-workflow
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: David Spickett (DavidSpickett)


Full diff: https://github.com/llvm/llvm-project/pull/113447.diff

7 Files Affected:

  • (modified) .ci/generate-buildkite-pipeline-premerge (+2-2)
  • (added) .ci/generate_test_report.py (+328)
  • (modified) .ci/monolithic-linux.sh (+15-6)
  • (modified) .ci/monolithic-windows.sh (+7-3)
  • (modified) clang/README.md (+2)
  • (modified) clang/test/Driver/cl-pch.c (+1-1)
  • (modified) llvm/README.txt (+2)
diff --git a/.ci/generate-buildkite-pipeline-premerge b/.ci/generate-buildkite-pipeline-premerge
index 7676ff716c4185..e52133751f09b1 100755
--- a/.ci/generate-buildkite-pipeline-premerge
+++ b/.ci/generate-buildkite-pipeline-premerge
@@ -272,7 +272,7 @@ if [[ "${linux_projects}" != "" ]]; then
   artifact_paths:
   - 'artifacts/**/*'
   - '*_result.json'
-  - 'build/test-results.xml'
+  - 'build/test-results*.xml'
   agents: ${LINUX_AGENTS}
   retry:
     automatic:
@@ -295,7 +295,7 @@ if [[ "${windows_projects}" != "" ]]; then
   artifact_paths:
   - 'artifacts/**/*'
   - '*_result.json'
-  - 'build/test-results.xml'
+  - 'build/test-results*.xml'
   agents: ${WINDOWS_AGENTS}
   retry:
     automatic:
diff --git a/.ci/generate_test_report.py b/.ci/generate_test_report.py
new file mode 100644
index 00000000000000..f2ae116ace99af
--- /dev/null
+++ b/.ci/generate_test_report.py
@@ -0,0 +1,328 @@
+# Script to parse many JUnit XML result files and send a report to the buildkite
+# agent as an annotation.
+#
+# To run the unittests:
+# python3 -m unittest discover -p generate_test_report.py
+
+import argparse
+import unittest
+from io import StringIO
+from junitparser import JUnitXml, Failure
+from textwrap import dedent
+from subprocess import check_call
+
+
+def junit_from_xml(xml):
+    return JUnitXml.fromfile(StringIO(xml))
+
+
+class TestReports(unittest.TestCase):
+    def test_title_only(self):
+        self.assertEqual(_generate_report("Foo", []), ("", None))
+
+    def test_no_tests_in_testsuite(self):
+        self.assertEqual(
+            _generate_report(
+                "Foo",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="0.00">
+          <testsuite name="Empty" tests="0" failures="0" skipped="0" time="0.00">
+          </testsuite>
+          </testsuites>"""
+                        )
+                    )
+                ],
+            ),
+            ("", None),
+        )
+
+    def test_no_failures(self):
+        self.assertEqual(
+            _generate_report(
+                "Foo",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="0.00">
+          <testsuite name="Passed" tests="1" failures="0" skipped="0" time="0.00">
+          <testcase classname="Bar/test_1" name="test_1" time="0.00"/>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    )
+                ],
+            ),
+            (
+                dedent(
+                    """\
+              # Foo
+
+              * 1 test passed"""
+                ),
+                "success",
+            ),
+        )
+
+    def test_report_single_file_single_testsuite(self):
+        self.assertEqual(
+            _generate_report(
+                "Foo",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="8.89">
+          <testsuite name="Bar" tests="4" failures="2" skipped="1" time="410.63">
+          <testcase classname="Bar/test_1" name="test_1" time="0.02"/>
+          <testcase classname="Bar/test_2" name="test_2" time="0.02">
+            <skipped message="Reason"/>
+          </testcase>
+          <testcase classname="Bar/test_3" name="test_3" time="0.02">
+            <failure><![CDATA[Output goes here]]></failure>
+          </testcase>
+          <testcase classname="Bar/test_4" name="test_4" time="0.02">
+            <failure><![CDATA[Other output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    )
+                ],
+            ),
+            (
+                dedent(
+                    """\
+          # Foo
+
+          * 1 test passed
+          * 1 test skipped
+          * 2 tests failed
+
+          ## Failed tests
+          (click to see output)
+
+          ### Bar
+          <details>
+          <summary>Bar/test_3/test_3</summary>
+
+          ```
+          Output goes here
+          ```
+          </details>
+          <details>
+          <summary>Bar/test_4/test_4</summary>
+
+          ```
+          Other output goes here
+          ```
+          </details>"""
+                ),
+                "error",
+            ),
+        )
+
+    MULTI_SUITE_OUTPUT = (
+        dedent(
+            """\
+        # ABC and DEF
+
+        * 1 test passed
+        * 1 test skipped
+        * 2 tests failed
+
+        ## Failed tests
+        (click to see output)
+
+        ### ABC
+        <details>
+        <summary>ABC/test_2/test_2</summary>
+
+        ```
+        ABC/test_2 output goes here
+        ```
+        </details>
+
+        ### DEF
+        <details>
+        <summary>DEF/test_2/test_2</summary>
+
+        ```
+        DEF/test_2 output goes here
+        ```
+        </details>"""
+        ),
+        "error",
+    )
+
+    def test_report_single_file_multiple_testsuites(self):
+        self.assertEqual(
+            _generate_report(
+                "ABC and DEF",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="8.89">
+          <testsuite name="ABC" tests="2" failures="1" skipped="0" time="410.63">
+          <testcase classname="ABC/test_1" name="test_1" time="0.02"/>
+          <testcase classname="ABC/test_2" name="test_2" time="0.02">
+            <failure><![CDATA[ABC/test_2 output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          <testsuite name="DEF" tests="2" failures="1" skipped="1" time="410.63">
+          <testcase classname="DEF/test_1" name="test_1" time="0.02">
+            <skipped message="reason"/>
+          </testcase>
+          <testcase classname="DEF/test_2" name="test_2" time="0.02">
+            <failure><![CDATA[DEF/test_2 output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    )
+                ],
+            ),
+            self.MULTI_SUITE_OUTPUT,
+        )
+
+    def test_report_multiple_files_multiple_testsuites(self):
+        self.assertEqual(
+            _generate_report(
+                "ABC and DEF",
+                [
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="8.89">
+          <testsuite name="ABC" tests="2" failures="1" skipped="0" time="410.63">
+          <testcase classname="ABC/test_1" name="test_1" time="0.02"/>
+          <testcase classname="ABC/test_2" name="test_2" time="0.02">
+            <failure><![CDATA[ABC/test_2 output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    ),
+                    junit_from_xml(
+                        dedent(
+                            """\
+          <?xml version="1.0" encoding="UTF-8"?>
+          <testsuites time="8.89">
+          <testsuite name="DEF" tests="2" failures="1" skipped="1" time="410.63">
+          <testcase classname="DEF/test_1" name="test_1" time="0.02">
+            <skipped message="reason"/>
+          </testcase>
+          <testcase classname="DEF/test_2" name="test_2" time="0.02">
+            <failure><![CDATA[DEF/test_2 output goes here]]></failure>
+          </testcase>
+          </testsuite>
+          </testsuites>"""
+                        )
+                    ),
+                ],
+            ),
+            self.MULTI_SUITE_OUTPUT,
+        )
+
+
+def _generate_report(title, junit_objects):
+    style = None
+
+    if not junit_objects:
+        return ("", style)
+
+    failures = {}
+    tests_run = 0
+    tests_skipped = 0
+    tests_failed = 0
+
+    for results in junit_objects:
+        for testsuite in results:
+            tests_run += testsuite.tests
+            tests_skipped += testsuite.skipped
+            tests_failed += testsuite.failures
+
+            for test in testsuite:
+                if (
+                    not test.is_passed
+                    and test.result
+                    and isinstance(test.result[0], Failure)
+                ):
+                    if failures.get(testsuite.name) is None:
+                        failures[testsuite.name] = []
+                    failures[testsuite.name].append(
+                        (test.classname + "/" + test.name, test.result[0].text)
+                    )
+
+    if not tests_run:
+        return ("", style)
+
+    style = "error" if tests_failed else "success"
+    report = [f"# {title}", ""]
+
+    tests_passed = tests_run - tests_skipped - tests_failed
+
+    def plural(num_tests):
+        return "test" if num_tests == 1 else "tests"
+
+    if tests_passed:
+        report.append(f"* {tests_passed} {plural(tests_passed)} passed")
+    if tests_skipped:
+        report.append(f"* {tests_skipped} {plural(tests_skipped)} skipped")
+    if tests_failed:
+        report.append(f"* {tests_failed} {plural(tests_failed)} failed")
+
+    if failures:
+        report.extend(["", "## Failed tests", "(click to see output)"])
+        for testsuite_name, suite_failures in failures.items():
+            report.extend(["", f"### {testsuite_name}"])
+            for name, output in suite_failures:
+                report.extend(
+                    [
+                        "<details>",
+                        f"<summary>{name}</summary>",
+                        "",
+                        "```",
+                        output,
+                        "```",
+                        "</details>",
+                    ]
+                )
+
+    return "\n".join(report), style
+
+
+def generate_report(title, junit_files):
+    return _generate_report(title, [JUnitXml.fromfile(p) for p in junit_files])
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "title", help="Title of the test report, without Markdown formatting."
+    )
+    parser.add_argument("context", help="Annotation context to write to.")
+    parser.add_argument("junit_files", help="Paths to JUnit report files.", nargs="*")
+    args = parser.parse_args()
+
+    report, style = generate_report(args.title, args.junit_files)
+    check_call(
+        [
+            "buildkite-agent",
+            "annotate",
+            "--context",
+            args.context,
+            "--style",
+            style,
+            report,
+        ]
+    )
diff --git a/.ci/monolithic-linux.sh b/.ci/monolithic-linux.sh
index b78dc59432b65c..4c4c1c756ba98f 100755
--- a/.ci/monolithic-linux.sh
+++ b/.ci/monolithic-linux.sh
@@ -28,18 +28,24 @@ if [[ -n "${CLEAR_CACHE:-}" ]]; then
   ccache --clear
 fi
 
-function show-stats {
+function at-exit {
+  python3 "${MONOREPO_ROOT}"/.ci/generate_test_report.py ":linux: Linux x64 Test Results" \
+    "linux-x64-test-results" "${BUILD_DIR}"/test-results*.xml
+
   mkdir -p artifacts
   ccache --print-stats > artifacts/ccache_stats.txt
 }
-trap show-stats EXIT
+trap at-exit EXIT
 
 projects="${1}"
 targets="${2}"
 
+lit_args="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --use-unique-output-file-name --timeout=1200 --time-tests"
+
 echo "--- cmake"
 pip install -q -r "${MONOREPO_ROOT}"/mlir/python/requirements.txt
 pip install -q -r "${MONOREPO_ROOT}"/lldb/test/requirements.txt
+pip install -q junitparser==3.2.0
 cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
       -D LLVM_ENABLE_PROJECTS="${projects}" \
       -G Ninja \
@@ -47,7 +53,7 @@ cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
       -D LLVM_ENABLE_ASSERTIONS=ON \
       -D LLVM_BUILD_EXAMPLES=ON \
       -D COMPILER_RT_BUILD_LIBFUZZER=OFF \
-      -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --timeout=1200 --time-tests" \
+      -D LLVM_LIT_ARGS="${lit_args}" \
       -D LLVM_ENABLE_LLD=ON \
       -D CMAKE_CXX_FLAGS=-gmlt \
       -D LLVM_CCACHE_BUILD=ON \
@@ -87,7 +93,8 @@ if [[ "${runtimes}" != "" ]]; then
       -D CMAKE_BUILD_TYPE=RelWithDebInfo \
       -D CMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \
       -D LIBCXX_TEST_PARAMS="std=c++03" \
-      -D LIBCXXABI_TEST_PARAMS="std=c++03"
+      -D LIBCXXABI_TEST_PARAMS="std=c++03" \
+      -D LLVM_LIT_ARGS="${lit_args}"
 
   echo "--- ninja runtimes C++03"
 
@@ -104,7 +111,8 @@ if [[ "${runtimes}" != "" ]]; then
       -D CMAKE_BUILD_TYPE=RelWithDebInfo \
       -D CMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \
       -D LIBCXX_TEST_PARAMS="std=c++26" \
-      -D LIBCXXABI_TEST_PARAMS="std=c++26"
+      -D LIBCXXABI_TEST_PARAMS="std=c++26" \
+      -D LLVM_LIT_ARGS="${lit_args}"
 
   echo "--- ninja runtimes C++26"
 
@@ -121,7 +129,8 @@ if [[ "${runtimes}" != "" ]]; then
       -D CMAKE_BUILD_TYPE=RelWithDebInfo \
       -D CMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \
       -D LIBCXX_TEST_PARAMS="enable_modules=clang" \
-      -D LIBCXXABI_TEST_PARAMS="enable_modules=clang"
+      -D LIBCXXABI_TEST_PARAMS="enable_modules=clang" \
+      -D LLVM_LIT_ARGS="${lit_args}"
 
   echo "--- ninja runtimes clang modules"
   
diff --git a/.ci/monolithic-windows.sh b/.ci/monolithic-windows.sh
index 91e719c52d4363..303cfadfa917cc 100755
--- a/.ci/monolithic-windows.sh
+++ b/.ci/monolithic-windows.sh
@@ -27,16 +27,20 @@ if [[ -n "${CLEAR_CACHE:-}" ]]; then
 fi
 
 sccache --zero-stats
-function show-stats {
+function at-exit {
+  python "${MONOREPO_ROOT}"/.ci/generate_test_report.py ":windows: Windows x64 Test Results" \
+    "windows-x64-test-results" "${BUILD_DIR}"/test-results*.xml
+
   mkdir -p artifacts
   sccache --show-stats >> artifacts/sccache_stats.txt
 }
-trap show-stats EXIT
+trap at-exit EXIT
 
 projects="${1}"
 targets="${2}"
 
 echo "--- cmake"
+pip install junitparser==3.2.0
 pip install -q -r "${MONOREPO_ROOT}"/mlir/python/requirements.txt
 
 # The CMAKE_*_LINKER_FLAGS to disable the manifest come from research
@@ -53,7 +57,7 @@ cmake -S "${MONOREPO_ROOT}"/llvm -B "${BUILD_DIR}" \
       -D LLVM_ENABLE_ASSERTIONS=ON \
       -D LLVM_BUILD_EXAMPLES=ON \
       -D COMPILER_RT_BUILD_LIBFUZZER=OFF \
-      -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --timeout=1200 --time-tests" \
+      -D LLVM_LIT_ARGS="-v --xunit-xml-output ${BUILD_DIR}/test-results.xml --use-unique-output-file-name --timeout=1200 --time-tests" \
       -D COMPILER_RT_BUILD_ORC=OFF \
       -D CMAKE_C_COMPILER_LAUNCHER=sccache \
       -D CMAKE_CXX_COMPILER_LAUNCHER=sccache \
diff --git a/clang/README.md b/clang/README.md
index b98182d8a3f684..94b1e1a7a07433 100644
--- a/clang/README.md
+++ b/clang/README.md
@@ -23,3 +23,5 @@ If you're interested in more (including how to build Clang) it is best to read t
 * If you find a bug in Clang, please file it in the LLVM bug tracker:
   
     https://github.com/llvm/llvm-project/issues
+
+test change
\ No newline at end of file
diff --git a/clang/test/Driver/cl-pch.c b/clang/test/Driver/cl-pch.c
index 36d83a11242dc6..1a9527458ebf4f 100644
--- a/clang/test/Driver/cl-pch.c
+++ b/clang/test/Driver/cl-pch.c
@@ -25,7 +25,7 @@
 // CHECK-YCTP-SAME: -o
 // CHECK-YCTP-SAME: pchfile.pch
 // CHECK-YCTP-SAME: -x
-// CHECK-YCTP-SAME: c++-header
+// CHECK-YCTP-SAME: c++-header -- this will fail tests on windows!
 
 // Except if a later /TC changes it back.
 // RUN: %clang_cl -Werror /Yc%S/Inputs/pchfile.h /FI%S/Inputs/pchfile.h /c /Fo%t/pchfile.obj /Fp%t/pchfile.pch -v -- %s 2>&1 \
diff --git a/llvm/README.txt b/llvm/README.txt
index b9b71a3b6daff1..ba60b8ffdd072c 100644
--- a/llvm/README.txt
+++ b/llvm/README.txt
@@ -15,3 +15,5 @@ documentation setup.
 
 If you are writing a package for LLVM, see docs/Packaging.rst for our
 suggestions.
+
+test change
\ No newline at end of file
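
To try the report generator locally, something like this should work (a sketch; the unique result file name here is made up):

```python
# Run from the .ci/ directory so the module can be imported.
from generate_test_report import generate_report

report, style = generate_report("Local run", ["build/test-results.abc.xml"])
print(style)   # "success", "error", or None if no tests ran
print(report)  # GitHub-flavoured Markdown
```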

@DavidSpickett
Collaborator Author

DavidSpickett commented Oct 24, 2024

Example reports can be seen here - https://buildkite.com/llvm-project/github-pull-requests/builds/112660

(make sure to switch to "All" instead of "Failures")

@DavidSpickett DavidSpickett added github:workflow and removed clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Oct 24, 2024
@ldionne
Member

ldionne commented Oct 24, 2024

Trying to understand what this patch does: basically, it adds a summary when looking at the results of a BuildKite job?

Previously:
(screenshot: Buildkite build page before this patch)

After this patch:
(screenshot: Buildkite build page showing the new test-report annotation)

Basically, the sort of "plain text" report that gets added is the difference. Is that correct? Are there other differences (such as how test results get reported to Github actions)?

@DavidSpickett
Collaborator Author

DavidSpickett commented Oct 24, 2024

Basically, the sort of "plain text" report that gets added is the difference. Is that correct?

Yes, there is also a Linux report there that shows that it succeeded, but you have to select "All" on the "All | Failures" button. That is maybe redundant, but folks like seeing green boxes; I know I do.

This overall report is much more useful than seeing only the last check-whatever that was run at the end of the log, which might not be the one that failed.

Are there other differences (such as how test results get reported to Github actions)?

The result sent to GitHub is unchanged. For example, the result on this PR is a failure because one test failed on Windows, due to the deliberate failure I put in.

If we wanted to send the report as a comment on GitHub, we can certainly look into that, as mentioned here: https://discourse.llvm.org/t/using-plugins-in-buildkite-ci-that-require-docker/82701/6?u=davidspickett. The Buildkite-specific parts of the script are easy to remove.
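
As a sketch of that idea (hypothetical, not part of this PR; assumes the gh CLI is available):

```python
import glob
import subprocess

from generate_test_report import generate_report

report, _ = generate_report(
    "Linux x64 Test Results", glob.glob("build/test-results*.xml")
)
if report:
    # Post the Markdown report as a comment on the PR instead of a
    # Buildkite annotation.
    subprocess.check_call(["gh", "pr", "comment", "113447", "--body", report])
```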

@ldionne
Member

ldionne commented Oct 24, 2024

Ack, thanks for the confirmation.

If we wanted to send the report as a comment on GitHub, we can certainly look into that, as mentioned here: https://discourse.llvm.org/t/using-plugins-in-buildkite-ci-that-require-docker/82701/6?u=davidspickett. The Buildkite-specific parts of the script are easy to remove.

I'd rather not do that. Commenting on the PR should be done very rarely, especially for CI failures, since it clutters the PR and Github already has a builtin system for presenting CI failures.

Overall, I'm fine with this patch; however, I wonder how much effort it is worth putting into the BuildKite infrastructure given that we plan on moving as much as possible to GitHub Actions. With GitHub Actions, the .ci/generate-buildkite-pipeline-premerge script would probably go away entirely in favour of individual workflows or something along those lines, and then the issue of figuring out what failed would probably be a non-issue.

I am especially unsure about adding functionality to fundamental utilities like Lit in that context: if we end up not needing to output test results in individual files a few weeks/months from now, then there may not be any users of --use-unique-output-file-name anymore, and we'll have increased the complexity of Lit for a temporary benefit. Just my .02. I really appreciate efforts like this to make the CI infra easier to use; I just wonder if the specific approach taken by this patch series is in line with the near-future direction of the project w.r.t. CI.

Contributor

@boomanaiden154 boomanaiden154 left a comment

Two minor comments. Script otherwise seems reasonable enough to me.

@boomanaiden154
Contributor

I'd rather not do that. Commenting on the PR should be done very rarely, especially for CI failures, since it clutters the PR and Github already has a builtin system for presenting CI failures.

I think this is something that needs to be discussed further when we actually get there. Comments can still be useful in certain circumstances, and I definitely agree that adding one for every push would clutter things, but that's not the only way to do it.

Overall, I'm fine with this patch, however I wonder how much effort it is worth putting into the BuildKite infrastructure given that we plan on moving as much as possible to Github Actions.

The script is already written and works, and there still isn't a concrete timeline for when we are going to move things over to GitHub Actions. This solves a valid complaint I hear somewhat often. For libc++ at least, there are definitely platforms that will not be able to move over to GitHub precommit, where something like this might still be useful, from what I understand.

With Github actions, the .ci/generate-buildkite-pipeline-premerge script would probably go away entirely in favour of individual workflows or something along those lines, and then the issue of figuring out what failed would probably be a non-issue. I am especially unsure about adding functionality to fundamental utilities like Lit in that context: if we e.g. end up not needing to output test results in individual files a few weeks/months from now, then there may not be any users of --use-unique-output-file-name anymore and then we'll have increased the complexity of Lit for a temporary benefit.

Do we make any guarantees about supporting lit flags in the future? It's mostly an internal tool, and I would personally be in support of removing flags that are no longer used in tree, unless there are some extremely compelling downstream use cases and there is an appropriate maintenance story.

@ldionne
Member

ldionne commented Oct 24, 2024

The script is already written and works, and there still isn't a concrete timeline for when we are going to move things over to GitHub Actions. This solves a valid complaint I hear somewhat often. For libc++ at least, there are definitely platforms that will not be able to move over to GitHub precommit, where something like this might still be useful, from what I understand.

True, however for libc++ the benefits are smaller because we only run libc++-specific jobs in BuildKite. I've never had trouble figuring out what part of a job caused failures. Either way, like I said, I'm fine with the patch; it seems like an improvement.

Do we make any guarantees about supporting lit flags in the future? It's mostly an internal tool, and I would personally be in support of removing flags that are no longer used in tree, unless there are some extremely compelling downstream use cases and there is an appropriate maintenance story.

No, but someone has to remember and actually take the time to do the cleanup. I also 100% support the removal of unused options and I've done things like that in the past, but it doesn't happen unless someone decides to make it their task.

@DavidSpickett
Collaborator Author

FWIW, I didn't mean to derail this effort or create significant additional complexity by saying what I said above. My intention was only to point out the tradeoff of adding new functionality for a temporary benefit, which wasn't clear to me had been thought about in those terms.

It didn't derail it, in fact it was good you brought it up because I was too hasty landing that part. I could and should have left it approved and included it as part of larger PRs like this before making any decisions. I don't consider working on any of the alternatives to be wasted time, I needed to think through them anyway.

Not adding the lit option avoids any dependency on the GitHub Actions move, which would be when we could remove the option. We could take the risk there, but we have a lot of holidays coming up and who knows whether we'll get there before the 20 branch.

We configure in another directory, patch that lit, and configure with -DLLVM_EXTERNAL_LIT= (which I have yet to test)

So I think this is the best of the remaining methods to do this.

Or the one that just occurred to me, which is hiding the lit option: https://docs.python.org/3/library/argparse.html#help

(as usual, I'm not sure why I didn't think of this earlier)
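
For reference, this is the argparse mechanism that link describes (an illustration, not the actual lit patch):

```python
import argparse

parser = argparse.ArgumentParser()
# help=argparse.SUPPRESS keeps the option out of --help output, but it can
# still be passed on the command line.
parser.add_argument(
    "--use-unique-output-file-name",
    action="store_true",
    help=argparse.SUPPRESS,
)
args = parser.parse_args(["--use-unique-output-file-name"])
assert args.use_unique_output_file_name
```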

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' llvm-lit testing-tools labels Oct 29, 2024
@DavidSpickett
Collaborator Author

DavidSpickett commented Oct 29, 2024

I have updated this PR to include marking the option as hidden. If folks would be ok with that approach, I will break this up into individual parts for review and landing separately.

If not, I will work more on the lit wrapper approach.

@boomanaiden154 what do you think of those plans?

@DavidSpickett DavidSpickett removed clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' llvm-lit testing-tools labels Oct 29, 2024
@DavidSpickett
Collaborator Author

DavidSpickett commented Nov 4, 2024

I've split up the code:

  • Hide the lit option: #114812
  • Write results to unique file names: #113160
  • This PR, which adds the reporting script.

@DavidSpickett DavidSpickett merged commit e74a002 into llvm:main Nov 12, 2024
6 checks passed
@DavidSpickett DavidSpickett deleted the ci-report-script branch November 12, 2024 13:34
@DavidSpickett
Collaborator Author

DavidSpickett commented Nov 12, 2024

Apologies, I merged this without an approval; I got mixed up with the other 2 patches in the stack. But I think we had a decent consensus anyway? It's an easy revert if you feel that I'm jumping ahead too far.

The only outstanding comment on the reporting script was the testing of it, but I think we're ok with using unittest for now since it is temporary.

Again, I hope a future system makes all this unnecessary, so I look forward to deleting this and putting my efforts into that instead. Thanks for all the feedback; it was really good to have to think it all through thoroughly.

DavidSpickett added a commit that referenced this pull request Nov 12, 2024
…ions (#113447)"

This reverts commit e74a002.

As it is failing on Linux with "OSError: [Errno 7] Argument list too long: 'buildkite-agent'".
DavidSpickett added a commit to DavidSpickett/llvm-project that referenced this pull request Nov 13, 2024
…ions (llvm#113447)"

This reverts commit 8a1ca6c.

I have fixed 2 things:
* The report is now sent by stdin so we do not hit the limit on the size
  of command line arguments.
* The report is limited to 1MB in size and if we exceed that we fall back
  to listing only the totals with a note telling you to check the full log.
DavidSpickett added a commit that referenced this pull request Nov 13, 2024
…ions (#113447)"

This reverts commit 8a1ca6c.

I have fixed 2 things:
* The report is now sent by stdin so we do not hit the limit on the size
  of command line arguments.
* The report is limited to 1MB in size and if we exceed that we fall back
  to listing only the totals with a note telling you to check the full log.
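
A sketch of the stdin approach the message describes (assumed shape; see the reland commit for the real code):

```python
import subprocess

MAX_ANNOTATION_SIZE = 1024 * 1024  # the 1MB limit from the commit message

def annotate(report, context, style):
    # With no body argument, buildkite-agent reads the annotation from
    # stdin, which avoids "Argument list too long" for huge reports.
    subprocess.run(
        ["buildkite-agent", "annotate", "--context", context, "--style", style],
        input=report.encode(),
        check=True,
    )
```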
@mysterymath
Contributor

Since this relanded, I've been getting pre-commit breakages on a libc change:
https://buildkite.com/llvm-project/github-pull-requests/builds/120029#01933169-a0ae-443c-9de3-0827c512a223

I can't find anything in the logs that indicates that a build or test failure occurred. Does anything jump out to you as to why this might be happening?

It also looks like a coworker's libc change went through without issue:
https://buildkite.com/llvm-project/github-pull-requests/builds/119274#_

```
+ runtimes=
+ runtime_targets=
+ [[ '' != '' ]]
+ at-exit
+ mkdir -p artifacts
+ ccache --print-stats
+ shopt -s nullglob
+ python3 /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-5wwkv-1/llvm-project/github-pull-requests/.ci/generate_test_report.py ':linux: Linux x64 Test Results' linux-x64-test-results
Traceback (most recent call last):
  File "/var/lib/buildkite-agent/builds/linux-56-59b8f5d88-5wwkv-1/llvm-project/github-pull-requests/.ci/generate_test_report.py", line 406, in <module>
    p = subprocess.Popen(
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1796, in _execute_child
    self.pid = _posixsubprocess.fork_exec(
TypeError: expected str, bytes or os.PathLike object, not NoneType
🚨 Error: The command exited with status 1
user command error: exit status 1
```

@DavidSpickett
Collaborator Author

Fixed by 6a12b43.
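
The traceback above comes from style being None when no result files are found, so None ends up in the argv list. A guard of roughly this shape avoids it (an assumption about the fix, not the committed code):

```python
import subprocess

if report:  # style is None when there were no test results; skip annotating
    subprocess.run(
        ["buildkite-agent", "annotate", "--context", args.context,
         "--style", style],
        input=report.encode(),
        check=True,
    )
```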

@DavidSpickett
Collaborator Author

It also looks like a coworker's libc change went through without issue:

This build must have been done before the first land or while this was reverted; I don't see it attempting to use the script in the log.

@mysterymath
Contributor

mysterymath commented Nov 18, 2024

Thanks for the quick fix; my PR is green now!
