Commit af6acd7

[Clang][Comments] Support for parsing headers in Doxygen \par commands (#91100)
### Background

Doxygen's `\par` command ([link](https://www.doxygen.nl/manual/commands.html#cmdpar)) has an optional argument, which denotes the header of the paragraph started by a given `\par` command. In short, the paragraph command can be used with a heading, or without one. The code block below shows both forms and how the current version of LLVM/Clang parses this code:

```
$ cat test.cpp
/// \par User defined paragraph:
/// Contents of the paragraph.
///
/// \par
/// New paragraph under the same heading.
///
/// \par
/// A second paragraph.
class A {};

$ clang++ -cc1 -ast-dump -fcolor-diagnostics -std=c++20 test.cpp
`-CXXRecordDecl 0x1530f3a78 <test.cpp:11:1, col:10> col:7 class A definition
  |-FullComment 0x1530fea38 <line:2:4, line:9:23>
  | |-ParagraphComment 0x1530fe7e0 <line:2:4>
  | | `-TextComment 0x1530fe7b8 <col:4> Text=" "
  | |-BlockCommandComment 0x1530fe800 <col:5, line:3:30> Name="par"
  | | `-ParagraphComment 0x1530fe878 <line:2:9, line:3:30>
  | |   |-TextComment 0x1530fe828 <line:2:9, col:32> Text=" User defined paragraph:"
  | |   `-TextComment 0x1530fe848 <line:3:4, col:30> Text=" Contents of the paragraph."
  | |-ParagraphComment 0x1530fe8c0 <line:5:4>
  | | `-TextComment 0x1530fe898 <col:4> Text=" "
  | |-BlockCommandComment 0x1530fe8e0 <col:5, line:6:41> Name="par"
  | | `-ParagraphComment 0x1530fe930 <col:4, col:41>
  | |   `-TextComment 0x1530fe908 <col:4, col:41> Text=" New paragraph under the same heading."
  | |-ParagraphComment 0x1530fe978 <line:8:4>
  | | `-TextComment 0x1530fe950 <col:4> Text=" "
  | `-BlockCommandComment 0x1530fe998 <col:5, line:9:23> Name="par"
  |   `-ParagraphComment 0x1530fe9e8 <col:4, col:23>
  |     `-TextComment 0x1530fe9c0 <col:4, col:23> Text=" A second paragraph."
  `-CXXRecordDecl 0x1530f3bb0 <line:11:1, col:7> col:7 implicit class A
```

As we can see above, the optional paragraph heading (`"User defined paragraph"`) is not an argument of the `\par` `BlockCommandComment`, but instead a child `TextComment`. For documentation generators like [hdoc](https://hdoc.io/), it would be ideal if we could parse Doxygen documentation comments with these semantics in mind. Currently that's not possible.

### Change

This change parses `\par` commands the way Doxygen parses them, making the optional header available as an argument if it is present. In addition:

- AST unit tests are defined to test this functionality when an argument is present, isn't present, with additional spacing, etc.
- TableGen is updated with an `IsParCommand` to support this functionality
- `lit` tests are updated where needed
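The heading rule described above can be modelled outside of Clang. The following is a minimal standalone sketch, not the code from this commit: `parHeading` is a hypothetical helper that takes one comment line and returns the text after `\par` or `@par` on that line, or nothing when only whitespace follows the command. It assumes the command appears at most once per line, and for brevity it does not distinguish `\par` from longer commands such as `\param`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of Clang): returns the heading that follows
// "\par" or "@par" on a single comment line, or std::nullopt if there is no
// command or no heading text on that line.
std::optional<std::string> parHeading(std::string_view Line) {
  // Both command spellings are accepted.
  std::size_t At = Line.find("\\par");
  if (At == std::string_view::npos)
    At = Line.find("@par");
  if (At == std::string_view::npos)
    return std::nullopt;

  std::string_view Rest = Line.substr(At + 4);

  // The line terminator is never part of the heading.
  while (!Rest.empty() && (Rest.back() == '\n' || Rest.back() == '\r'))
    Rest.remove_suffix(1);

  // Whitespace between the command and the heading is skipped; trailing
  // spaces are kept, as in the unit tests of this commit.
  while (!Rest.empty() && (Rest.front() == ' ' || Rest.front() == '\t'))
    Rest.remove_prefix(1);

  if (Rest.empty())
    return std::nullopt; // A bare \par just starts a new paragraph.
  return std::string(Rest);
}
```

Under these assumptions, `parHeading("/// \\par User defined paragraph:")` yields the heading text, while `parHeading("/// @par \n")` yields no value, which mirrors the "argument present" and "argument absent" cases the commit message describes.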
1 parent 8e0cd73 commit af6acd7

8 files changed: +239 -11 lines changed
clang/docs/ReleaseNotes.rst

Lines changed: 9 additions & 0 deletions
@@ -150,6 +150,15 @@ here. Generic improvements to Clang as a whole or to its underlying
 infrastructure are described first, followed by language-specific
 sections with improvements to Clang's support for those languages.
 
+- The ``\par`` documentation comment command now supports an optional
+  argument, which denotes the header of the paragraph started by
+  an instance of the ``\par`` command comment. The implementation
+  of the argument handling matches its semantics
+  `in Doxygen <https://www.doxygen.nl/manual/commands.html#cmdpar>`_.
+  Namely, any text on the same line as the ``\par`` command will become
+  a header for the paragraph, and if there is no text then the command
+  will start a new paragraph.
+
 C++ Language Changes
 --------------------
 - C++17 support is now completed, with the enablement of the

clang/include/clang/AST/CommentCommandTraits.h

Lines changed: 4 additions & 0 deletions
@@ -88,6 +88,10 @@ struct CommandInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsHeaderfileCommand : 1;
 
+  /// True if this is a \\par command.
+  LLVM_PREFERRED_TYPE(bool)
+  unsigned IsParCommand : 1;
+
   /// True if we don't want to warn about this command being passed an empty
   /// paragraph. Meaningful only for block commands.
   LLVM_PREFERRED_TYPE(bool)
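To illustrate how a one-bit flag like `IsParCommand` sits next to the other command properties, here is a small sketch. `MiniCommandInfo` is a hypothetical, heavily trimmed stand-in for the real `CommandInfo` struct, with only a handful of fields; the field widths and the two sample rows are assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-in for the command table entry.
// Each boolean property occupies a single bit.
struct MiniCommandInfo {
  const char *Name;
  unsigned NumArgs : 4;
  unsigned IsBlockCommand : 1;
  unsigned IsHeaderfileCommand : 1;
  unsigned IsParCommand : 1;
  unsigned IsEmptyParagraphAllowed : 1;
};

// Rows are initialized positionally, so a new flag has to be inserted at the
// same position in every row as in the struct declaration.
const MiniCommandInfo Par = {"par", 1, 1, 0, 1, 0};
const MiniCommandInfo Note = {"note", 0, 1, 0, 0, 0};

// A parser can branch on the flag instead of comparing command names.
bool takesParHeading(const MiniCommandInfo &Info) {
  return Info.IsBlockCommand && Info.IsParCommand;
}
```

With these sample rows, `takesParHeading(Par)` is true and `takesParHeading(Note)` is false.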

clang/include/clang/AST/CommentCommands.td

Lines changed: 2 additions & 1 deletion
@@ -18,6 +18,7 @@ class Command<string name> {
   bit IsThrowsCommand = 0;
   bit IsDeprecatedCommand = 0;
   bit IsHeaderfileCommand = 0;
+  bit IsParCommand = 0;
 
   bit IsEmptyParagraphAllowed = 0;
 
@@ -156,7 +157,7 @@ def Date : BlockCommand<"date">;
 def Invariant : BlockCommand<"invariant">;
 def Li : BlockCommand<"li">;
 def Note : BlockCommand<"note">;
-def Par : BlockCommand<"par">;
+def Par : BlockCommand<"par"> { let IsParCommand = 1; let NumArgs = 1; }
 def Post : BlockCommand<"post">;
 def Pre : BlockCommand<"pre">;
 def Remark : BlockCommand<"remark">;

clang/include/clang/AST/CommentParser.h

Lines changed: 3 additions & 1 deletion
@@ -105,6 +105,9 @@ class Parser {
   ArrayRef<Comment::Argument>
   parseThrowCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs);
 
+  ArrayRef<Comment::Argument>
+  parseParCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs);
+
   BlockCommandComment *parseBlockCommand();
   InlineCommandComment *parseInlineCommand();
 
@@ -123,4 +126,3 @@ class Parser {
 } // end namespace clang
 
 #endif
-

clang/lib/AST/CommentParser.cpp

Lines changed: 78 additions & 0 deletions
@@ -222,6 +222,63 @@ class TextTokenRetokenizer {
     return true;
   }
 
+  // Check if this line starts with @par or \par
+  bool startsWithParCommand() {
+    unsigned Offset = 1;
+
+    // Skip all whitespace characters at the beginning.
+    // This needs to backtrack because Pos has already advanced past the
+    // actual \par or @par command by the time this function is called.
+    while (isWhitespace(*(Pos.BufferPtr - Offset)))
+      Offset++;
+
+    // Once we've reached the whitespace, backtrack and check if the previous
+    // four characters are \par or @par.
+    llvm::StringRef LineStart(Pos.BufferPtr - Offset - 3, 4);
+    return LineStart.starts_with("\\par") || LineStart.starts_with("@par");
+  }
+
+  /// Extract a par command argument-header.
+  bool lexParHeading(Token &Tok) {
+    if (isEnd())
+      return false;
+
+    Position SavedPos = Pos;
+
+    consumeWhitespace();
+    SmallString<32> WordText;
+    const char *WordBegin = Pos.BufferPtr;
+    SourceLocation Loc = getSourceLocation();
+
+    if (!startsWithParCommand())
+      return false;
+
+    // Read until the end of this token, which is effectively the end of the
+    // line. This gets us the content of the par header, if there is one.
+    while (!isEnd()) {
+      WordText.push_back(peek());
+      if (Pos.BufferPtr + 1 == Pos.BufferEnd) {
+        consumeChar();
+        break;
+      }
+      consumeChar();
+    }
+
+    unsigned Length = WordText.size();
+    if (Length == 0) {
+      Pos = SavedPos;
+      return false;
+    }
+
+    char *TextPtr = Allocator.Allocate<char>(Length + 1);
+
+    memcpy(TextPtr, WordText.c_str(), Length + 1);
+    StringRef Text = StringRef(TextPtr, Length);
+
+    formTokenWithChars(Tok, Loc, WordBegin, Length, Text);
+    return true;
+  }
+
   /// Extract a word -- sequence of non-whitespace characters.
   bool lexWord(Token &Tok) {
     if (isEnd())
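The backward scan in `startsWithParCommand` can be hard to picture from the pointer arithmetic alone. Below is a standalone sketch of the same idea over a `std::string_view`, with explicit bounds checks added. `precededByParCommand` and its index-based interface are assumptions made for this illustration; the committed code works on raw buffer pointers inside `TextTokenRetokenizer`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if the characters just before Pos,
// ignoring any whitespace directly in front of Pos, spell "\par" or "@par".
bool precededByParCommand(std::string_view Buffer, std::size_t Pos) {
  if (Pos > Buffer.size())
    return false;

  // Walk backwards over the whitespace that separates the command from Pos.
  std::size_t End = Pos;
  while (End > 0 && std::isspace(static_cast<unsigned char>(Buffer[End - 1])))
    --End;

  // A command needs four characters: the marker plus "par".
  if (End < 4)
    return false;

  std::string_view Command = Buffer.substr(End - 4, 4);
  return Command == "\\par" || Command == "@par";
}
```

For the buffer `/// \par Heading` with `Pos` at the `H`, the scan steps back over one space and finds `\par`. For `/// \param x` with `Pos` at the `x`, the four characters before the whitespace are `aram`, so the check fails.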
@@ -394,6 +451,24 @@ Parser::parseThrowCommandArgs(TextTokenRetokenizer &Retokenizer,
   return llvm::ArrayRef(Args, ParsedArgs);
 }
 
+ArrayRef<Comment::Argument>
+Parser::parseParCommandArgs(TextTokenRetokenizer &Retokenizer,
+                            unsigned NumArgs) {
+  assert(NumArgs > 0);
+  auto *Args = new (Allocator.Allocate<Comment::Argument>(NumArgs))
+      Comment::Argument[NumArgs];
+  unsigned ParsedArgs = 0;
+  Token Arg;
+
+  while (ParsedArgs < NumArgs && Retokenizer.lexParHeading(Arg)) {
+    Args[ParsedArgs] = Comment::Argument{
+        SourceRange(Arg.getLocation(), Arg.getEndLocation()), Arg.getText()};
+    ParsedArgs++;
+  }
+
+  return llvm::ArrayRef(Args, ParsedArgs);
+}
+
 BlockCommandComment *Parser::parseBlockCommand() {
   assert(Tok.is(tok::backslash_command) || Tok.is(tok::at_command));
 
@@ -449,6 +524,9 @@ BlockCommandComment *Parser::parseBlockCommand() {
     else if (Info->IsThrowsCommand)
       S.actOnBlockCommandArgs(
           BC, parseThrowCommandArgs(Retokenizer, Info->NumArgs));
+    else if (Info->IsParCommand)
+      S.actOnBlockCommandArgs(BC,
+                              parseParCommandArgs(Retokenizer, Info->NumArgs));
     else
       S.actOnBlockCommandArgs(BC, parseCommandArgs(Retokenizer, Info->NumArgs));
 

clang/test/Index/comment-misc-tags.m

Lines changed: 3 additions & 5 deletions
@@ -91,18 +91,16 @@ @interface IOCommandGate
 
 struct Test {int filler;};
 
-// CHECK: (CXComment_BlockCommand CommandName=[par]
+// CHECK: (CXComment_BlockCommand CommandName=[par] Arg[0]=User defined paragraph:
 // CHECK-NEXT: (CXComment_Paragraph
-// CHECK-NEXT: (CXComment_Text Text=[ User defined paragraph:] HasTrailingNewline)
 // CHECK-NEXT: (CXComment_Text Text=[ Contents of the paragraph.])))
 // CHECK: (CXComment_BlockCommand CommandName=[par]
 // CHECK-NEXT: (CXComment_Paragraph
-// CHECK-NEXT: (CXComment_Text Text=[ New paragraph under the same heading.])))
+// CHECK-NEXT: (CXComment_Text Text=[New paragraph under the same heading.])))
 // CHECK: (CXComment_BlockCommand CommandName=[note]
 // CHECK-NEXT: (CXComment_Paragraph
 // CHECK-NEXT: (CXComment_Text Text=[ This note consists of two paragraphs.] HasTrailingNewline)
 // CHECK-NEXT: (CXComment_Text Text=[ This is the first paragraph.])))
 // CHECK: (CXComment_BlockCommand CommandName=[par]
 // CHECK-NEXT: (CXComment_Paragraph
-// CHECK-NEXT: (CXComment_Text Text=[ And this is the second paragraph.])))
-
+// CHECK-NEXT: (CXComment_Text Text=[And this is the second paragraph.])))

clang/unittests/AST/CommentParser.cpp

Lines changed: 137 additions & 0 deletions
@@ -1639,6 +1639,143 @@ TEST_F(CommentParserTest, ThrowsCommandHasArg9) {
   }
 }
 
+TEST_F(CommentParserTest, ParCommandHasArg1) {
+  const char *Sources[] = {
+      "/// @par Paragraph header:", "/// @par Paragraph header:\n",
+      "/// @par Paragraph header:\r\n", "/// @par Paragraph header:\n\r",
+      "/** @par Paragraph header:*/",
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 0));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg2) {
+  const char *Sources[] = {
+      "/// @par Paragraph header: ", "/// @par Paragraph header: \n",
+      "/// @par Paragraph header: \r\n", "/// @par Paragraph header: \n\r",
+      "/** @par Paragraph header: */",
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 0));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header: ");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg3) {
+  const char *Sources[] = {
+      ("/// @par Paragraph header:\n"
+       "/// Paragraph body"),
+      ("/// @par Paragraph header:\r\n"
+       "/// Paragraph body"),
+      ("/// @par Paragraph header:\n\r"
+       "/// Paragraph body"),
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      TextComment *TC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 1));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+      ASSERT_TRUE(GetChildAt(PC, 0, TC));
+      ASSERT_TRUE(TC->getText() == " Paragraph body");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg4) {
+  const char *Sources[] = {
+      ("/// @par Paragraph header:\n"
+       "/// Paragraph body1\n"
+       "/// Paragraph body2"),
+      ("/// @par Paragraph header:\r\n"
+       "/// Paragraph body1\n"
+       "/// Paragraph body2"),
+      ("/// @par Paragraph header:\n\r"
+       "/// Paragraph body1\n"
+       "/// Paragraph body2"),
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      TextComment *TC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 2));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+      ASSERT_TRUE(GetChildAt(PC, 0, TC));
+      ASSERT_TRUE(TC->getText() == " Paragraph body1");
+      ASSERT_TRUE(GetChildAt(PC, 1, TC));
+      ASSERT_TRUE(TC->getText() == " Paragraph body2");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg5) {
+  const char *Sources[] = {
+      ("/// @par \n"
+       "/// Paragraphs with no text before newline have no heading"),
+      ("/// @par \r\n"
+       "/// Paragraphs with no text before newline have no heading"),
+      ("/// @par \n\r"
+       "/// Paragraphs with no text before newline have no heading"),
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      TextComment *TC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 1));
+      ASSERT_TRUE(BCC->getNumArgs() == 0);
+      ASSERT_TRUE(GetChildAt(PC, 0, TC));
+      ASSERT_TRUE(TC->getText() ==
+                  "Paragraphs with no text before newline have no heading");
+    }
+  }
+}
 
 } // unnamed namespace
 

clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp

Lines changed: 3 additions & 4 deletions
@@ -32,8 +32,7 @@ void clang::EmitClangCommentCommandInfo(RecordKeeper &Records,
     Record &Tag = *Tags[i];
     OS << " { "
        << "\"" << Tag.getValueAsString("Name") << "\", "
-       << "\"" << Tag.getValueAsString("EndCommandName") << "\", "
-       << i << ", "
+       << "\"" << Tag.getValueAsString("EndCommandName") << "\", " << i << ", "
        << Tag.getValueAsInt("NumArgs") << ", "
        << Tag.getValueAsBit("IsInlineCommand") << ", "
        << Tag.getValueAsBit("IsBlockCommand") << ", "
@@ -44,6 +43,7 @@ void clang::EmitClangCommentCommandInfo(RecordKeeper &Records,
        << Tag.getValueAsBit("IsThrowsCommand") << ", "
        << Tag.getValueAsBit("IsDeprecatedCommand") << ", "
        << Tag.getValueAsBit("IsHeaderfileCommand") << ", "
+       << Tag.getValueAsBit("IsParCommand") << ", "
        << Tag.getValueAsBit("IsEmptyParagraphAllowed") << ", "
        << Tag.getValueAsBit("IsVerbatimBlockCommand") << ", "
        << Tag.getValueAsBit("IsVerbatimBlockEndCommand") << ", "
@@ -52,8 +52,7 @@ void clang::EmitClangCommentCommandInfo(RecordKeeper &Records,
        << Tag.getValueAsBit("IsFunctionDeclarationCommand") << ", "
        << Tag.getValueAsBit("IsRecordLikeDetailCommand") << ", "
        << Tag.getValueAsBit("IsRecordLikeDeclarationCommand") << ", "
-       << /* IsUnknownCommand = */ "0"
-       << " }";
+       << /* IsUnknownCommand = */ "0" << " }";
     if (i + 1 != e)
       OS << ",";
     OS << "\n";
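The emitter writes one brace-enclosed row per command, and the new `IsParCommand` bit is printed between `IsHeaderfileCommand` and `IsEmptyParagraphAllowed`, the same position the field takes in the struct. A reduced sketch of that row generation follows. `MiniRecord` and `emitRow` are hypothetical and cover only a few of the real fields.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical, reduced record: only a few of the properties that the real
// TableGen records carry.
struct MiniRecord {
  std::string Name;
  int NumArgs;
  bool IsHeaderfileCommand;
  bool IsParCommand;
  bool IsEmptyParagraphAllowed;
};

// Emits one positional initializer row. The output order has to match the
// member order of the struct that the generated table initializes.
std::string emitRow(const MiniRecord &R) {
  std::ostringstream OS;
  OS << "  { "
     << "\"" << R.Name << "\", " << R.NumArgs << ", "
     << R.IsHeaderfileCommand << ", "
     << R.IsParCommand << ", "
     << R.IsEmptyParagraphAllowed << " }";
  return OS.str();
}
```

Because the rows are purely positional, leaving the new bit out of the emitter (or emitting it in a different place) would shift every later flag by one field, so the struct and the emitter are changed together in this commit.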
