Skip to content

Introduce mcdc::TVIdxBuilder (LLVM side, NFC) #80676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Feb 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions llvm/include/llvm/ProfileData/Coverage/CoverageMapping.h
Original file line number Diff line number Diff line change
Expand Up @@ -550,6 +550,55 @@ struct MCDCRecord {
}
};

namespace mcdc {
/// Compute TestVector Indices "TVIdx" from the Conds graph.
///
/// Clang CodeGen handles the bitmap index based on TVIdx.
/// llvm-cov reconstructs conditions from TVIdx.
///
/// For each leaf "The final decision",
/// - TVIdx should be unique.
/// - TVIdx has the Width.
/// - The width represents the number of possible paths.
/// - The minimum width is 1 "deterministic".
/// - The order of leaves are sorted by Width DESC. It expects
/// latter TVIdx(s) (with Width=1) could be pruned and altered to
/// other simple branch conditions.
///
class TVIdxBuilder {
public:
struct MCDCNode {
int InCount = 0; /// Reference count; temporary use
int Width; /// Number of accumulated paths (>= 1)
ConditionIDs NextIDs;
};

#ifndef NDEBUG
/// This is no longer needed after the assignment.
/// It may be used in assert() for reconfirmation.
SmallVector<MCDCNode> SavedNodes;
#endif

/// Output: Index for TestVectors bitmap (These are not CondIDs)
SmallVector<std::array<int, 2>> Indices;

/// Output: The number of test vectors.
/// Error with HardMaxTVs if the number has exploded.
int NumTestVectors;

/// Hard limit of test vectors
static constexpr auto HardMaxTVs =
std::numeric_limits<decltype(NumTestVectors)>::max();

public:
/// Calculate and assign Indices
/// \param NextIDs The list of {FalseID, TrueID} indexed by ID
/// The first element [0] should be the root node.
/// \param Offset Offset of index to final decisions.
TVIdxBuilder(const SmallVectorImpl<ConditionIDs> &NextIDs, int Offset = 0);
};
} // namespace mcdc

/// A Counter mapping context is used to connect the counters, expressions
/// and the obtained counter values.
class CounterMappingContext {
Expand Down
153 changes: 141 additions & 12 deletions llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -223,9 +223,130 @@ Expected<int64_t> CounterMappingContext::evaluate(const Counter &C) const {
return LastPoppedValue;
}

mcdc::TVIdxBuilder::TVIdxBuilder(const SmallVectorImpl<ConditionIDs> &NextIDs,
int Offset)
: Indices(NextIDs.size()) {
// Construct Nodes and set up each InCount
auto N = NextIDs.size();
SmallVector<MCDCNode> Nodes(N);
for (unsigned ID = 0; ID < N; ++ID) {
for (unsigned C = 0; C < 2; ++C) {
#ifndef NDEBUG
Indices[ID][C] = INT_MIN;
#endif
auto NextID = NextIDs[ID][C];
Nodes[ID].NextIDs[C] = NextID;
if (NextID >= 0)
++Nodes[NextID].InCount;
}
}

// Sort key ordered by <-Width, Ord>
Copy link

@ZhuUx ZhuUx Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse me. Why end nodes of the decision need to be sorted in descending order of width?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So motivation of the sort is to prepare for truncating all conditions with width=1, is it?
If so, why not just remove them then with an O(n) method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you think they could be removed? It's the prerequisite to assign identical IDs between clang and llvm-cov.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, if we did not sort them, we can not reduce size of bitmaps effectively since there might be other conditions with width>1 getting large index.

I'm trying to remove the sort because I find it breaks some nature if the topological relationship of conditions is abnormal (e.g conditions from pattern matching in rust) and generates troubled index maps. I would investigate it more. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Could you introduce me the discussion, or paste the problematic cond tree anywhere?

Copy link

@ZhuUx ZhuUx Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☺️ Sorry I found I made a mistake. I previously encountered a example with decision tree:

flowchart TD
C0 -- T --> C1
C1 -- F --> F1
C1 -- T --> C3
C0 -- F --> C2
C2 -- T --> C3
C2 -- F --> F2
C3 -- F --> C4
C3 -- T --> T3
C4 -- T --> T4
C4 -- F --> F4
Loading

Normal boolean expressions can not generate such decision tree. In normal boolean decisions, either true next of a condition is a next (true or false) of its false next, or false next of a condition is a next of its true next.

I tried to disable the sort and fixed some cases. So I guessed something was broken by the sort. But that's wrong. The real problem is the way to update cond bitmap, which is specified for previous cond bitmap but is inappropriate for current. This implementation still works for pattern matching, thanks to your excellent efforts!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance, F1 and F2 have w=1, and others have w=2. Correct?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The index map is right. The key difference is at most one of C0 and C2 can be true, which not likes (C0 && C1) || C2. In previous I updated false bit of C2 when C0 is true since if C0 is true, C2 must be false. While at present we can not do this.

SmallVector<std::tuple<int, /// -Width
unsigned, /// Ord
int, /// ID
unsigned /// Cond (0 or 1)
>>
Decisions;

// Traverse Nodes to assign Idx
SmallVector<int> Q;
assert(Nodes[0].InCount == 0);
Nodes[0].Width = 1;
Q.push_back(0);

unsigned Ord = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a bit more commentary on the overall process of the algorithm (and MCDCTVIdxBuilder)? What are the expected inputs and the expected output, generally? I think it would help with the readability a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added comments and the unittest. Hope it helps!

while (!Q.empty()) {
auto IID = Q.begin();
int ID = *IID;
Q.erase(IID);
auto &Node = Nodes[ID];
assert(Node.Width > 0);

for (unsigned I = 0; I < 2; ++I) {
auto NextID = Node.NextIDs[I];
assert(NextID != 0 && "NextID should not point to the top");
if (NextID < 0) {
// Decision
Decisions.emplace_back(-Node.Width, Ord++, ID, I);
assert(Ord == Decisions.size());
continue;
}

// Inter Node
auto &NextNode = Nodes[NextID];
assert(NextNode.InCount > 0);

// Assign Idx
assert(Indices[ID][I] == INT_MIN);
Indices[ID][I] = NextNode.Width;
auto NextWidth = int64_t(NextNode.Width) + Node.Width;
if (NextWidth > HardMaxTVs) {
NumTestVectors = HardMaxTVs; // Overflow
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add debug output when there is an overflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess overflow is impossible in llvm-cov.
OTOH, Clang side will catch such a overflow and tell it with HardMaxTVs for warnings/errors. See MCDCCoverageBuilder in CoverageMappingGen.cpp, and mcdc-error-conditions.cpp.

I wonder it would be valuable to add debug message here.

return;
}
NextNode.Width = NextWidth;

// Ready if all incomings are processed.
// Or NextNode.Width hasn't been confirmed yet.
if (--NextNode.InCount == 0)
Q.push_back(NextID);
}
}

llvm::sort(Decisions);

// Assign TestVector Indices in Decision Nodes
int64_t CurIdx = 0;
for (auto [NegWidth, Ord, ID, C] : Decisions) {
int Width = -NegWidth;
assert(Nodes[ID].Width == Width);
assert(Nodes[ID].NextIDs[C] < 0);
assert(Indices[ID][C] == INT_MIN);
Indices[ID][C] = Offset + CurIdx;
CurIdx += Width;
if (CurIdx > HardMaxTVs) {
NumTestVectors = HardMaxTVs; // Overflow
return;
}
}

assert(CurIdx < HardMaxTVs);
NumTestVectors = CurIdx;

#ifndef NDEBUG
for (const auto &Idxs : Indices)
for (auto Idx : Idxs)
assert(Idx != INT_MIN);
SavedNodes = std::move(Nodes);
#endif
}

namespace {

class MCDCRecordProcessor {
/// Construct this->NextIDs with Branches for TVIdxBuilder to use it
/// before MCDCRecordProcessor().
class NextIDsBuilder {
protected:
SmallVector<mcdc::ConditionIDs> NextIDs;

public:
NextIDsBuilder(const ArrayRef<const CounterMappingRegion *> Branches)
: NextIDs(Branches.size()) {
#ifndef NDEBUG
DenseSet<mcdc::ConditionID> SeenIDs;
#endif
for (const auto *Branch : Branches) {
const auto &BranchParams = Branch->getBranchParams();
assert(BranchParams.ID >= 0 && "CondID isn't set");
assert(SeenIDs.insert(BranchParams.ID).second && "Duplicate CondID");
NextIDs[BranchParams.ID] = BranchParams.Conds;
}
assert(SeenIDs.size() == Branches.size());
}
};

class MCDCRecordProcessor : NextIDsBuilder, mcdc::TVIdxBuilder {
/// A bitmap representing the executed test vectors for a boolean expression.
/// Each index of the bitmap corresponds to a possible test vector. An index
/// with a bit value of '1' indicates that the corresponding Test Vector
Expand All @@ -243,9 +364,6 @@ class MCDCRecordProcessor {
/// Total number of conditions in the boolean expression.
unsigned NumConditions;

/// Mapping of a condition ID to its corresponding branch params.
llvm::DenseMap<unsigned, mcdc::ConditionIDs> CondsMap;

/// Vector used to track whether a condition is constant folded.
MCDCRecord::BoolVector Folded;

Expand All @@ -256,34 +374,43 @@ class MCDCRecordProcessor {
/// ExecutedTestVectorBitmap.
MCDCRecord::TestVectors ExecVectors;

#ifndef NDEBUG
DenseSet<unsigned> TVIdxs;
#endif

public:
MCDCRecordProcessor(const BitVector &Bitmap,
const CounterMappingRegion &Region,
ArrayRef<const CounterMappingRegion *> Branches)
: Bitmap(Bitmap), Region(Region),
DecisionParams(Region.getDecisionParams()), Branches(Branches),
NumConditions(DecisionParams.NumConditions),
: NextIDsBuilder(Branches), TVIdxBuilder(this->NextIDs), Bitmap(Bitmap),
Region(Region), DecisionParams(Region.getDecisionParams()),
Branches(Branches), NumConditions(DecisionParams.NumConditions),
Folded(NumConditions, false), IndependencePairs(NumConditions) {}

private:
// Walk the binary decision diagram and try assigning both false and true to
// each node. When a terminal node (ID == 0) is reached, fill in the value in
// the truth table.
void buildTestVector(MCDCRecord::TestVector &TV, mcdc::ConditionID ID,
unsigned Index) {
int TVIdx, unsigned Index) {
assert((Index & (1 << ID)) == 0);

for (auto MCDCCond : {MCDCRecord::MCDC_False, MCDCRecord::MCDC_True}) {
static_assert(MCDCRecord::MCDC_False == 0);
static_assert(MCDCRecord::MCDC_True == 1);
Index |= MCDCCond << ID;
TV[ID] = MCDCCond;
auto NextID = CondsMap[ID][MCDCCond];
auto NextID = NextIDs[ID][MCDCCond];
auto NextTVIdx = TVIdx + Indices[ID][MCDCCond];
assert(NextID == SavedNodes[ID].NextIDs[MCDCCond]);
if (NextID >= 0) {
buildTestVector(TV, NextID, Index);
buildTestVector(TV, NextID, NextTVIdx, Index);
continue;
}

assert(TVIdx < SavedNodes[ID].Width);
assert(TVIdxs.insert(NextTVIdx).second && "Duplicate TVIdx");

if (!Bitmap[DecisionParams.BitmapIdx * CHAR_BIT + Index])
continue;

Expand All @@ -304,9 +431,12 @@ class MCDCRecordProcessor {
void findExecutedTestVectors() {
// Walk the binary decision diagram to enumerate all possible test vectors.
// We start at the root node (ID == 0) with all values being DontCare.
// `TVIdx` starts with 0 and is in the traversal.
// `Index` encodes the bitmask of true values and is initially 0.
MCDCRecord::TestVector TV(NumConditions, MCDCRecord::MCDC_DontCare);
buildTestVector(TV, 0, 0);
buildTestVector(TV, 0, 0, 0);
assert(TVIdxs.size() == unsigned(NumTestVectors) &&
"TVIdxs wasn't fulfilled");
}

// Find an independence pair for each condition:
Expand Down Expand Up @@ -367,7 +497,6 @@ class MCDCRecordProcessor {
// from being measured.
for (const auto *B : Branches) {
const auto &BranchParams = B->getBranchParams();
CondsMap[BranchParams.ID] = BranchParams.Conds;
PosToID[I] = BranchParams.ID;
CondLoc[I] = B->startLoc();
Folded[I++] = (B->Count.isZero() && B->FalseCount.isZero());
Expand Down
56 changes: 56 additions & 0 deletions llvm/unittests/ProfileData/CoverageMappingTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
#include "llvm/Testing/Support/SupportHelpers.h"
#include "gtest/gtest.h"

#include <map>
#include <ostream>
#include <utility>

Expand Down Expand Up @@ -1074,4 +1075,59 @@ TEST(CoverageMappingTest, filename_compilation_dir) {
}
}

TEST(CoverageMappingTest, TVIdxBuilder) {
// ((n0 && n3) || (n2 && n4) || (n1 && n5))
static const std::array<mcdc::ConditionIDs, 6> Branches = {{
{2, 3},
{-1, 5},
{1, 4},
{2, -1},
{1, -1},
{-1, -1},
}};
int Offset = 1000;
auto TheBuilder = mcdc::TVIdxBuilder(
SmallVector<mcdc::ConditionIDs>(ArrayRef(Branches)), Offset);
EXPECT_TRUE(TheBuilder.NumTestVectors < TheBuilder.HardMaxTVs);
EXPECT_EQ(TheBuilder.Indices.size(), 6u);
EXPECT_EQ(TheBuilder.NumTestVectors, 15);

std::map<int, int> Decisions;
for (unsigned I = 0; I < TheBuilder.Indices.size(); ++I) {
struct Rec {
int Width;
std::array<int, 2> Indices;
};
static const std::array<Rec, 6> IndicesRefs = {{
{1, {0, 0}},
{4, {1000, 0}},
{2, {0, 0}},
{1, {1, 1014}},
{2, {2, 1012}},
{4, {1004, 1008}},
}};
EXPECT_EQ(TheBuilder.Indices[I], IndicesRefs[I].Indices);

#ifndef NDEBUG
const auto &Node = TheBuilder.SavedNodes[I];
EXPECT_EQ(Node.Width, IndicesRefs[I].Width);
for (int C = 0; C < 2; ++C) {
auto Index = TheBuilder.Indices[I][C];
if (Node.NextIDs[C] < 0)
EXPECT_TRUE(Decisions.insert({Index, Node.Width}).second);
}
#endif
}

#ifndef NDEBUG
int NextIdx = Offset;
for (const auto [Index, Width] : Decisions) {
EXPECT_EQ(Index, NextIdx);
NextIdx += Width;
}
// The sum of Width(s) is NumTVs.
EXPECT_EQ(NextIdx, Offset + TheBuilder.NumTestVectors);
#endif
}

} // end anonymous namespace