| 1 | +From f2c7e85b8f5019419de7fdc159b827fc52f10753 Mon Sep 17 00:00:00 2001 |
| 2 | +From: Wentao Zhang <wentaoz5@illinois.edu> |
| 3 | +Date: Wed, 11 Sep 2024 17:14:47 -0500 |
| 4 | +Subject: [PATCH v3 0/4] Enable measuring the kernel's Source-based Code Coverage and MC/DC with Clang |
| 5 | + |
| 6 | +This series adds support for building x86-64 kernels with Clang's Source- |
| 7 | +based Code Coverage[1] in order to facilitate Modified Condition/Decision |
| 8 | +Coverage (MC/DC)[2] that provably correlates to source code for all levels |
| 9 | +of compiler optimization. |
| 10 | + |
| 11 | +The newly added kernel/llvm-cov/ directory complements the existing gcov |
| 12 | +implementation. Gcov works at the object code level, which may better |
| 13 | +reflect actual execution. However, gcov lacks the information needed to |
| 14 | +correlate coverage measurements with source code locations when the |
| 15 | +compiler optimization level is non-zero (the default when building the |
| 16 | +kernel). In addition, gcov reports are occasionally ambiguous when |
| 17 | +compared against the developer's intent at the source code level. |
| 18 | + |
| 19 | +In the following gcov example from drivers/firmware/dmi_scan.c, an |
| 20 | +expression with four conditions is reported to have six branch outcomes, |
| 21 | +which is not ideally informative in many safety related use cases, such as |
| 22 | +automotive, medical, and aerospace. |
| 23 | + |
| 24 | + 5: 1068: if (s == e || *e != '/' || !month || month > 12) { |
| 25 | +branch 0 taken 5 (fallthrough) |
| 26 | +branch 1 taken 0 |
| 27 | +branch 2 taken 5 (fallthrough) |
| 28 | +branch 3 taken 0 |
| 29 | +branch 4 taken 0 (fallthrough) |
| 30 | +branch 5 taken 5 |
| 31 | + |
| 32 | +On the other hand, Clang's Source-based Code Coverage instruments at the |
| 33 | +compiler frontend which maintains an accurate mapping from coverage |
| 34 | +measurement to source code location. Coverage reports reflect exactly how |
| 35 | +the code is written regardless of optimization and can present advanced |
| 36 | +metrics like branch coverage and MC/DC in a clearer way. Coverage report |
| 37 | +for the same snippet by llvm-cov would look as follows: |
| 38 | + |
| 39 | + 1068| 5| if (s == e || *e != '/' || !month || month > 12) { |
| 40 | + ------------------ |
| 41 | + | Branch (1068:6): [True: 0, False: 5] |
| 42 | + | Branch (1068:16): [True: 0, False: 5] |
| 43 | + | Branch (1068:29): [True: 0, False: 5] |
| 44 | + | Branch (1068:39): [True: 0, False: 5] |
| 45 | + ------------------ |
| 46 | + |
| 47 | +Clang has added MC/DC support as of its 18.1.0 release. MC/DC is a fine- |
| 48 | +grained coverage metric required by many automotive and aviation industrial |
| 49 | +standards for certifying mission-critical software [3]. |
| 50 | + |
| 51 | +In the following example from arch/x86/events/probe.c, llvm-cov gives the |
| 52 | +MC/DC measurement for the compound logic decision at line 43. |
| 53 | + |
| 54 | + 43| 12| if (msr[bit].test && !msr[bit].test(bit, data)) |
| 55 | + ------------------ |
| 56 | + |---> MC/DC Decision Region (43:8) to (43:50) |
| 57 | + | |
| 58 | + | Number of Conditions: 2 |
| 59 | + | Condition C1 --> (43:8) |
| 60 | + | Condition C2 --> (43:25) |
| 61 | + | |
| 62 | + | Executed MC/DC Test Vectors: |
| 63 | + | |
| 64 | + | C1, C2 Result |
| 65 | + | 1 { T, F = F } |
| 66 | + | 2 { T, T = T } |
| 67 | + | |
| 68 | + | C1-Pair: not covered |
| 69 | + | C2-Pair: covered: (1,2) |
| 70 | + | MC/DC Coverage for Decision: 50.00% |
| 71 | + | |
| 72 | + ------------------ |
| 73 | + 44| 5| continue; |
| 74 | + |
| 75 | +As the results suggest, during the span of measurement, only condition C2 |
| 76 | +(!msr[bit].test(bit, data)) is covered. That means C2 was evaluated to both |
| 77 | +true and false, and in those test vectors C2 affected the decision outcome |
| 78 | +independently. Therefore MC/DC for this decision is 1 out of 2 (50.00%). |
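The pair search behind the "C1-Pair" and "C2-Pair" lines can be illustrated with a small sketch. This is a simplified model of unique-cause MC/DC and not llvm-cov's actual implementation: the function names `independence_pairs` and `mcdc_percent` are hypothetical, and `None` stands for a condition that was short-circuited and never evaluated. A condition counts as covered when two executed test vectors differ in that condition, agree (or are "don't care") on every other condition, and produce different decision results.

```python
from itertools import combinations


def independence_pairs(vectors):
    """Find one independence pair per condition, if any.

    vectors: list of (conditions, result), where conditions is a tuple of
    True / False / None (None = not evaluated due to short-circuiting).
    Returns {condition_index: (vector_no_a, vector_no_b) or None}, with
    vectors numbered from 1 as in the report above.
    """
    n = len(vectors[0][0])
    pairs = {i: None for i in range(n)}
    numbered = enumerate(vectors, start=1)
    for (ia, (a, ra)), (ib, (b, rb)) in combinations(numbered, 2):
        if ra == rb:
            continue  # the decision outcome must change within a pair
        for i in range(n):
            if pairs[i] is not None:
                continue  # already have a pair for this condition
            if a[i] is None or b[i] is None or a[i] == b[i]:
                continue  # condition i itself must flip
            # Every other condition must match or be a "don't care".
            others_match = all(
                a[j] is None or b[j] is None or a[j] == b[j]
                for j in range(n)
                if j != i
            )
            if others_match:
                pairs[i] = (ia, ib)
    return pairs


def mcdc_percent(pairs):
    """Percentage of conditions that have an independence pair."""
    covered = sum(p is not None for p in pairs.values())
    return 100.0 * covered / len(pairs)


# The two executed test vectors from the report: (C1, C2) and the result.
executed = [((True, False), False), ((True, True), True)]
pairs = independence_pairs(executed)
# pairs == {0: None, 1: (1, 2)}: only C2 has a pair, so coverage is 50%.
```

Under this model, also executing a vector in which C1 is false (so C2 is never evaluated), for example `((False, None), False)`, would give C1 an independence pair with vector 2 and raise the decision to 100%.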
| 79 | + |
| 80 | +To do a full kernel measurement, instrument the kernel with |
| 81 | +LLVM_COV_KERNEL_MCDC enabled, and optionally set a |
| 82 | +LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS value (the default is six). Run the |
| 83 | +test suites, and collect the raw profile data under |
| 84 | +/sys/kernel/debug/llvm-cov/profraw. Such raw profile data can be merged and |
| 85 | +indexed, and used for generating coverage reports in various formats. |
| 86 | + |
| 87 | + $ cp /sys/kernel/debug/llvm-cov/profraw vmlinux.profraw |
| 88 | + $ llvm-profdata merge vmlinux.profraw -o vmlinux.profdata |
| 89 | + $ llvm-cov show --show-mcdc --show-mcdc-summary \ |
| 90 | + --format=text --use-color=false -output-dir=coverage_reports \ |
| 91 | + -instr-profile vmlinux.profdata vmlinux |
| 92 | + |
| 93 | +The first two patches enable the llvm-cov infrastructure, where the first |
| 94 | +enables source based code coverage and the second adds MC/DC support. The |
| 95 | +next patch disables instrumentation for curve25519-x86_64.c for the same |
| 96 | +reason as gcov. The final patch enables coverage for x86-64. |
| 97 | + |
| 98 | +The choice to use a new Makefile variable LLVM_COV_PROFILE, instead of |
| 99 | +reusing GCOV_PROFILE, was deliberate. More work needs to be done to |
| 100 | +determine if it is appropriate to reuse the same flag. In addition, given |
| 101 | +the fundamentally different approaches to instrumentation and the resulting |
| 102 | +variation in coverage reports, there is a strong likelihood that coverage |
| 103 | +type will need to be managed separately. |
| 104 | + |
| 105 | +This work reuses code from a previous effort by Sami Tolvanen et al. [4]. |
| 106 | +Our aim is for source-based *code coverage* required for high assurance |
| 107 | +(MC/DC) while [4] focused more on performance optimization. |
| 108 | + |
| 109 | +This initial submission is restricted to x86-64. Support for other |
| 110 | +architectures would need a bit more Makefile & linker script modification. |
| 111 | +Informally we've confirmed that arm64 works and more are being tested. |
| 112 | + |
| 113 | +Note that Source-based Code Coverage is Clang-specific and isn't compatible |
| 114 | +with Clang's gcov support in kernel/gcov/. Currently, kernel/gcov/ is not |
| 115 | +able to measure MC/DC without modifying CFLAGS_GCOV and it would face the |
| 116 | +same issues in terms of source correlation as gcov in general does. |
| 117 | + |
| 118 | +A demo and results can be found in [5]. We will talk about this patch |
| 119 | +series in the Refereed Track at LPC 2024 [6]. |
| 120 | + |
| 121 | +Known Limitations: |
| 122 | + |
| 123 | +Kernel code with logical expressions exceeding |
| 124 | +LLVM_COV_KERNEL_MCDC_MAX_CONDITIONS will produce a compiler warning. |
| 125 | +Expressions with up to 47 conditions are found in the Linux kernel source |
| 126 | +tree (as of v6.11), but 46 seems to be the max value before the build fails |
| 127 | +due to kernel size. As of LLVM 19 the max number of conditions possible is |
| 128 | +32767. |
| 129 | + |
| 130 | +As of LLVM 19, certain expressions are still not covered, and will produce |
| 131 | +build warnings when they are encountered: |
| 132 | + |
| 133 | +"[...] if a boolean expression is embedded in the nest of another boolean |
| 134 | + expression but separated by a non-logical operator, this is also not |
| 135 | + supported. For example, in x = (a && b && c && func(d && f)), the d && f |
| 136 | + case starts a new boolean expression that is separated from the other |
| 137 | + conditions by the operator func(). When this is encountered, a warning |
| 138 | + will be generated and the boolean expression will not be |
| 139 | + instrumented." [7] |
| 140 | + |
| 141 | + |
| 142 | +[1] https://clang.llvm.org/docs/SourceBasedCodeCoverage.html |
| 143 | +[2] https://en.wikipedia.org/wiki/Modified_condition%2Fdecision_coverage |
| 144 | +[3] https://digital-library.theiet.org/content/journals/10.1049/sej.1994.0025 |
| 145 | +[4] https://lore.kernel.org/lkml/20210407211704.367039-1-morbo@google.com/ |
| 146 | +[5] https://github.com/xlab-uiuc/linux-mcdc |
| 147 | +[6] https://lpc.events/event/18/contributions/1718/ |
| 148 | +[7] https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#mc-dc-instrumentation |
| 149 | + |
| 150 | +--- |
| 151 | +v2 -> v3: |
| 152 | + |
| 153 | +* Rebased onto v6.11-rc7 from v6.11-rc6. |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +Wentao Zhang (4): |
| 158 | + llvm-cov: add Clang's Source-based Code Coverage support |
| 159 | + llvm-cov: add Clang's MC/DC support |
| 160 | + x86: disable llvm-cov instrumentation |
| 161 | + x86: enable llvm-cov support |
| 162 | + |
| 163 | + Makefile | 9 ++ |
| 164 | + arch/Kconfig | 1 + |
| 165 | + arch/x86/Kconfig | 2 + |
| 166 | + arch/x86/crypto/Makefile | 3 +- |
| 167 | + arch/x86/kernel/vmlinux.lds.S | 2 + |
| 168 | + include/asm-generic/vmlinux.lds.h | 36 +++++ |
| 169 | + kernel/Makefile | 1 + |
| 170 | + kernel/llvm-cov/Kconfig | 100 ++++++++++++ |
| 171 | + kernel/llvm-cov/Makefile | 5 + |
| 172 | + kernel/llvm-cov/fs.c | 253 ++++++++++++++++++++++++++++++ |
| 173 | + kernel/llvm-cov/llvm-cov.h | 156 ++++++++++++++++++ |
| 174 | + scripts/Makefile.lib | 23 +++ |
| 175 | + scripts/mod/modpost.c | 2 + |
| 176 | + 13 files changed, 592 insertions(+), 1 deletion(-) |
| 177 | + create mode 100644 kernel/llvm-cov/Kconfig |
| 178 | + create mode 100644 kernel/llvm-cov/Makefile |
| 179 | + create mode 100644 kernel/llvm-cov/fs.c |
| 180 | + create mode 100644 kernel/llvm-cov/llvm-cov.h |
| 181 | + |
| 182 | +-- |
| 183 | +2.45.2 |
| 184 | + |