[utils][TableGen] Treat clause aliases equally with names #141763

kparzysz · 2025-05-28T13:31:00Z

The code in DirectiveEmitter that generates clause parsers sorted clause names to ensure that longer names were tried before shorter ones, in cases where a shorter name may be a prefix of a longer one. This matters in the strict Fortran source format, since whitespace is ignored there.

This sorting did not take into account clause aliases, which are just alternative names. These extra names were not protected in the same way, and were just appended immediately after the primary name.

This patch generates a list of pairs Record+Name, where a given record can appear multiple times with different names. Sort that list and use it to generate parsers for each record. What used to be

  ("fred" || "f") >> construct<SomeClause>{} ||
  "foo" << construct<OtherClause>{}

is now

  "fred" >> construct<SomeClause>{} ||
  "foo" >> construct<OtherClause>{} ||
  "f" >> construct<SomeClause>{}

The class "ClauseVal" actually represents a definition of an enumeration value, and in itself it is not bound to any clause. Rename it to EnumVal and add a comment clarifying how it's translated into an actual enum definition in the generated source code. There is no change in functionality.

There were 3 different functions in DirectiveEmitter.cpp doing essentially the same thing: taking a name separated with _ or whitepace, and converting it to the upper-camel case. Extract that into a single function that can handle different sets of separators.

The code in DirectiveEmitter that generates clause parsers sorted clause names to ensure that longer names were tried before shorter ones, in cases where a shorter name may be a prefix of a longer one. This matters in the strict Fortran source format, since whitespace is ignored there. This sorting did not take into account clause aliases, which are just alternative names. These extra names were not protected in the same way, and were just appended immediately after the primary name. This patch generates a list of pairs Record+Name, where a given record can appear multiple times with different names. Sort that list and use it to generate parsers for each record. What used to be ``` ("fred" || "f") >> construct<SomeClause>{} || "foo" << construct<OtherClause>{} ``` is now ``` "fred" >> construct<SomeClause>{} || "foo" >> construct<OtherClause>{} || "f" >> construct<SomeClause>{} ```

llvmbot · 2025-05-28T13:31:35Z

@llvm/pr-subscribers-tablegen

Author: Krzysztof Parzyszek (kparzysz)

Changes

The code in DirectiveEmitter that generates clause parsers sorted clause names to ensure that longer names were tried before shorter ones, in cases where a shorter name may be a prefix of a longer one. This matters in the strict Fortran source format, since whitespace is ignored there.

This sorting did not take into account clause aliases, which are just alternative names. These extra names were not protected in the same way, and were just appended immediately after the primary name.

This patch generates a list of pairs Record+Name, where a given record can appear multiple times with different names. Sort that list and use it to generate parsers for each record. What used to be

  ("fred" || "f") &gt;&gt; construct&lt;SomeClause&gt;{} ||
  "foo" &lt;&lt; construct&lt;OtherClause&gt;{}

is now

  "fred" &gt;&gt; construct&lt;SomeClause&gt;{} ||
  "foo" &gt;&gt; construct&lt;OtherClause&gt;{} ||
  "f" &gt;&gt; construct&lt;SomeClause&gt;{}

Full diff: https://github.com/llvm/llvm-project/pull/141763.diff

2 Files Affected:

(modified) llvm/test/TableGen/directive1.td (+3-1)
(modified) llvm/utils/TableGen/Basic/DirectiveEmitter.cpp (+39-36)

diff --git a/llvm/test/TableGen/directive1.td b/llvm/test/TableGen/directive1.td
index 74091edfa2a66..f756f54c03bfb 100644
--- a/llvm/test/TableGen/directive1.td
+++ b/llvm/test/TableGen/directive1.td
@@ -34,6 +34,7 @@ def TDLC_ClauseB : Clause<"clauseb"> {
 }
 
 def TDLC_ClauseC : Clause<"clausec"> {
+  let aliases = ["ccccccc"];
   let flangClass = "IntExpr";
   let isValueList = 1;
 }
@@ -260,7 +261,8 @@ def TDL_DirA : Directive<"dira"> {
 // IMPL-NEXT:  TYPE_PARSER(
 // IMPL-NEXT:    "clausec" >> construct<TdlClause>(construct<TdlClause::Clausec>(parenthesized(nonemptyList(Parser<IntExpr>{})))) || 
 // IMPL-NEXT:    "clauseb" >> construct<TdlClause>(construct<TdlClause::Clauseb>(maybe(parenthesized(Parser<IntExpr>{})))) ||
-// IMPL-NEXT:    "clausea" >> construct<TdlClause>(construct<TdlClause::Clausea>())
+// IMPL-NEXT:    "clausea" >> construct<TdlClause>(construct<TdlClause::Clausea>()) ||
+// IMPL-NEXT:    "ccccccc" >> construct<TdlClause>(construct<TdlClause::Clausec>(parenthesized(nonemptyList(Parser<IntExpr>{}))))
 // IMPL-NEXT:  )
 // IMPL-EMPTY:
 // IMPL-NEXT:  #endif // GEN_FLANG_CLAUSES_PARSER
diff --git a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp
index 9e79a83ed6e18..bd6c543e1741a 100644
--- a/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp
+++ b/llvm/utils/TableGen/Basic/DirectiveEmitter.cpp
@@ -608,7 +608,7 @@ static void emitLeafTable(const DirectiveLanguage &DirLang, raw_ostream &OS,
   std::vector<int> Ordering(Directives.size());
   std::iota(Ordering.begin(), Ordering.end(), 0);
 
-  sort(Ordering, [&](int A, int B) {
+  llvm::sort(Ordering, [&](int A, int B) {
     auto &LeavesA = LeafTable[A];
     auto &LeavesB = LeafTable[B];
     int DirA = LeavesA[0], DirB = LeavesB[0];
@@ -1113,59 +1113,63 @@ static void generateFlangClauseParserKindMap(const DirectiveLanguage &DirLang,
      << " Parser clause\");\n";
 }
 
-static bool compareClauseName(const Record *R1, const Record *R2) {
-  Clause C1(R1);
-  Clause C2(R2);
-  return (C1.getName() > C2.getName());
+using RecordWithText = std::pair<const Record *, StringRef>;
+
+static bool compareRecordText(const RecordWithText &A,
+                              const RecordWithText &B) {
+  return A.second > B.second;
+}
+
+static std::vector<RecordWithText>
+getSpellingTexts(ArrayRef<const Record *> Records) {
+  std::vector<RecordWithText> List;
+  for (const Record *R : Records) {
+    Clause C(R);
+    List.push_back(std::make_pair(R, C.getName()));
+    llvm::transform(C.getAliases(), std::back_inserter(List),
+                    [R](StringRef S) { return std::make_pair(R, S); });
+  }
+  return List;
 }
 
 // Generate the parser for the clauses.
 static void generateFlangClausesParser(const DirectiveLanguage &DirLang,
                                        raw_ostream &OS) {
   std::vector<const Record *> Clauses = DirLang.getClauses();
-  // Sort clauses in reverse alphabetical order so with clauses with same
-  // beginning, the longer option is tried before.
-  sort(Clauses, compareClauseName);
+  // Sort clauses in the reverse alphabetical order with respect to their
+  // names and aliases, so that longer names are tried before shorter ones.
+  std::vector<std::pair<const Record *, StringRef>> Names =
+      getSpellingTexts(Clauses);
+  llvm::sort(Names, compareRecordText);
   IfDefScope Scope("GEN_FLANG_CLAUSES_PARSER", OS);
   StringRef Base = DirLang.getFlangClauseBaseClass();
 
+  unsigned LastIndex = Names.size() - 1;
   OS << "\n";
-  unsigned Index = 0;
-  unsigned LastClauseIndex = Clauses.size() - 1;
   OS << "TYPE_PARSER(\n";
-  for (const Clause Clause : Clauses) {
-    const std::vector<StringRef> &Aliases = Clause.getAliases();
-    if (Aliases.empty()) {
-      OS << "  \"" << Clause.getName() << "\"";
-    } else {
-      OS << "  ("
-         << "\"" << Clause.getName() << "\"_tok";
-      for (StringRef Alias : Aliases) {
-        OS << " || \"" << Alias << "\"_tok";
-      }
-      OS << ")";
-    }
+  for (auto [Index, RecTxt] : llvm::enumerate(Names)) {
+    auto [R, N] = RecTxt;
+    Clause C(R);
 
-    StringRef FlangClass = Clause.getFlangClass();
-    OS << " >> construct<" << Base << ">(construct<" << Base
-       << "::" << Clause.getFormattedParserClassName() << ">(";
+    StringRef FlangClass = C.getFlangClass();
+    OS << "  \"" << N << "\" >> construct<" << Base << ">(construct<" << Base
+       << "::" << C.getFormattedParserClassName() << ">(";
     if (FlangClass.empty()) {
       OS << "))";
-      if (Index != LastClauseIndex)
+      if (Index != LastIndex)
         OS << " ||";
       OS << "\n";
-      ++Index;
       continue;
     }
 
-    if (Clause.isValueOptional())
+    if (C.isValueOptional())
       OS << "maybe(";
     OS << "parenthesized(";
-    if (Clause.isValueList())
+    if (C.isValueList())
       OS << "nonemptyList(";
 
-    if (!Clause.getPrefix().empty())
-      OS << "\"" << Clause.getPrefix() << ":\" >> ";
+    if (!C.getPrefix().empty())
+      OS << "\"" << C.getPrefix() << ":\" >> ";
 
     // The common Flang parser are used directly. Their name is identical to
     // the Flang class with first letter as lowercase. If the Flang class is
@@ -1181,19 +1185,18 @@ static void generateFlangClausesParser(const DirectiveLanguage &DirLang,
             .Case("ScalarLogicalExpr", "scalarLogicalExpr")
             .Default(("Parser<" + FlangClass + ">{}").toStringRef(Scratch));
     OS << Parser;
-    if (!Clause.getPrefix().empty() && Clause.isPrefixOptional())
+    if (!C.getPrefix().empty() && C.isPrefixOptional())
       OS << " || " << Parser;
-    if (Clause.isValueList()) // close nonemptyList(.
+    if (C.isValueList()) // close nonemptyList(.
       OS << ")";
     OS << ")"; // close parenthesized(.
 
-    if (Clause.isValueOptional()) // close maybe(.
+    if (C.isValueOptional()) // close maybe(.
       OS << ")";
     OS << "))";
-    if (Index != LastClauseIndex)
+    if (Index != LastIndex)
       OS << " ||";
     OS << "\n";
-    ++Index;
   }
   OS << ")\n";
 }

kparzysz · 2025-05-28T13:36:02Z

PR stack:

clementval

LGTM

tblah

LGTM, thanks!

kparzysz added 3 commits May 28, 2025 08:23

kparzysz requested review from clementval and tblah May 28, 2025 13:31

llvmbot added the tablegen label May 28, 2025

This was referenced May 28, 2025

[utils][TableGen] Clarify usage of ClauseVal, rename to EnumVal #141761

Merged

[utils][TableGen] Unify converting names to upper-camel case #141762

Merged

This was referenced May 28, 2025

[utils][TableGen] Implement clause aliases as alternative spellings #141765

Merged

[utils][TableGen] Handle versions on clause/directive spellings #141766

Closed

[OpenMP] Add directive spellings introduced in spec v6.0 #141772

Open

clementval approved these changes May 28, 2025

View reviewed changes

tblah approved these changes May 29, 2025

View reviewed changes

Base automatically changed from users/kparzysz/spr/t05-upper-camel to main June 5, 2025 12:34

Merge branch 'main' into users/kparzysz/spr/t06-alias-name

e8d6a25

kparzysz merged commit 463a2bd into main Jun 5, 2025
11 checks passed

kparzysz deleted the users/kparzysz/spr/t06-alias-name branch June 5, 2025 14:05

kparzysz mentioned this pull request Jun 5, 2025

[utils][TableGen] Handle versions on clause/directive spellings #143021

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[utils][TableGen] Treat clause aliases equally with names #141763

[utils][TableGen] Treat clause aliases equally with names #141763

Uh oh!

kparzysz commented May 28, 2025

Uh oh!

llvmbot commented May 28, 2025

Uh oh!

kparzysz commented May 28, 2025 •

edited

Loading

Uh oh!

clementval left a comment

Uh oh!

tblah left a comment

Uh oh!

Uh oh!

Uh oh!

[utils][TableGen] Treat clause aliases equally with names #141763

[utils][TableGen] Treat clause aliases equally with names #141763

Uh oh!

Conversation

kparzysz commented May 28, 2025

Uh oh!

llvmbot commented May 28, 2025

Uh oh!

kparzysz commented May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clementval left a comment

Choose a reason for hiding this comment

Uh oh!

tblah left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kparzysz commented May 28, 2025 •

edited

Loading