-
Notifications
You must be signed in to change notification settings - Fork 14k
[Clang][Comments] Support for parsing headers in Doxygen \par commands #91100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Clang][Comments] Support for parsing headers in Doxygen \par commands #91100
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-clang Author: hdoc (hdoc) ChangesBackgroundDoxygen's In short, the paragraph command can be used with a heading, or without one. The code block below shows both forms and how the current version of LLVM/Clang parses this code:
As we can see above, the optional paragraph heading ( For documentation generators like hdoc, it would be ideal if we could parse Doxygen documentation comments with these semantics in mind. Currently that's not possible. ChangeThis change parses
Full diff: https://github.com/llvm/llvm-project/pull/91100.diff 7 Files Affected:
diff --git a/clang/include/clang/AST/CommentCommandTraits.h b/clang/include/clang/AST/CommentCommandTraits.h
index 0c3254d84eb000..78c484fff3aede 100644
--- a/clang/include/clang/AST/CommentCommandTraits.h
+++ b/clang/include/clang/AST/CommentCommandTraits.h
@@ -88,6 +88,10 @@ struct CommandInfo {
LLVM_PREFERRED_TYPE(bool)
unsigned IsHeaderfileCommand : 1;
+ /// True if this is a \\par command.
+ LLVM_PREFERRED_TYPE(bool)
+ unsigned IsParCommand : 1;
+
/// True if we don't want to warn about this command being passed an empty
/// paragraph. Meaningful only for block commands.
LLVM_PREFERRED_TYPE(bool)
diff --git a/clang/include/clang/AST/CommentCommands.td b/clang/include/clang/AST/CommentCommands.td
index e839031752cdd8..5fd687b0d8991f 100644
--- a/clang/include/clang/AST/CommentCommands.td
+++ b/clang/include/clang/AST/CommentCommands.td
@@ -18,6 +18,7 @@ class Command<string name> {
bit IsThrowsCommand = 0;
bit IsDeprecatedCommand = 0;
bit IsHeaderfileCommand = 0;
+ bit IsParCommand = 0;
bit IsEmptyParagraphAllowed = 0;
@@ -156,7 +157,7 @@ def Date : BlockCommand<"date">;
def Invariant : BlockCommand<"invariant">;
def Li : BlockCommand<"li">;
def Note : BlockCommand<"note">;
-def Par : BlockCommand<"par">;
+def Par : BlockCommand<"par"> { let IsParCommand = 1; let NumArgs = 1; }
def Post : BlockCommand<"post">;
def Pre : BlockCommand<"pre">;
def Remark : BlockCommand<"remark">;
diff --git a/clang/include/clang/AST/CommentParser.h b/clang/include/clang/AST/CommentParser.h
index e11e818b1af0a1..b5f1c6c19f0ce7 100644
--- a/clang/include/clang/AST/CommentParser.h
+++ b/clang/include/clang/AST/CommentParser.h
@@ -100,6 +100,9 @@ class Parser {
ArrayRef<Comment::Argument>
parseCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs);
+ ArrayRef<Comment::Argument>
+ parseParCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs);
+
BlockCommandComment *parseBlockCommand();
InlineCommandComment *parseInlineCommand();
@@ -118,4 +121,3 @@ class Parser {
} // end namespace clang
#endif
-
diff --git a/clang/lib/AST/CommentParser.cpp b/clang/lib/AST/CommentParser.cpp
index 8adfd85d0160c3..54760f5ba932eb 100644
--- a/clang/lib/AST/CommentParser.cpp
+++ b/clang/lib/AST/CommentParser.cpp
@@ -149,6 +149,63 @@ class TextTokenRetokenizer {
addToken();
}
+ /// Check if this line starts with @par or \par
+ bool startsWithParCommand() {
+ unsigned Offset = 1;
+
+ /// Skip all whitespace characters at the beginning.
+ /// This needs to backtrack because Pos has already advanced past the
+ /// actual \par or @par command by the time this function is called.
+ while (isWhitespace(*(Pos.BufferPtr - Offset)))
+ Offset++;
+
+ /// Check if next four characters are \par or @par
+ llvm::StringRef LineStart(Pos.BufferPtr - 5, 4);
+ return LineStart.starts_with("\\par") || LineStart.starts_with("@par");
+ }
+
+ /// Extract a par command argument-header.
+ bool lexParHeading(Token &Tok) {
+ if (isEnd())
+ return false;
+
+ Position SavedPos = Pos;
+
+ consumeWhitespace();
+ SmallString<32> WordText;
+ const char *WordBegin = Pos.BufferPtr;
+ SourceLocation Loc = getSourceLocation();
+
+ if (!startsWithParCommand())
+ return false;
+
+ // Read until the end of this token, which is effectively the end of the
+ // line This gets us the content of the par header, if there is one.
+ while (!isEnd()) {
+ WordText.push_back(peek());
+ if (Pos.BufferPtr + 1 == Pos.BufferEnd) {
+ consumeChar();
+ break;
+ } else {
+ consumeChar();
+ }
+ }
+
+ const unsigned Length = WordText.size();
+ if (Length == 0) {
+ Pos = SavedPos;
+ return false;
+ }
+
+ char *TextPtr = Allocator.Allocate<char>(Length + 1);
+
+ memcpy(TextPtr, WordText.c_str(), Length + 1);
+ StringRef Text = StringRef(TextPtr, Length);
+
+ formTokenWithChars(Tok, Loc, WordBegin, Length, Text);
+ return true;
+ }
+
/// Extract a word -- sequence of non-whitespace characters.
bool lexWord(Token &Tok) {
if (isEnd())
@@ -304,6 +361,23 @@ Parser::parseCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs) {
return llvm::ArrayRef(Args, ParsedArgs);
}
+ArrayRef<Comment::Argument>
+Parser::parseParCommandArgs(TextTokenRetokenizer &Retokenizer,
+ unsigned NumArgs) {
+ auto *Args = new (Allocator.Allocate<Comment::Argument>(NumArgs))
+ Comment::Argument[NumArgs];
+ unsigned ParsedArgs = 0;
+ Token Arg;
+
+ while (ParsedArgs < NumArgs && Retokenizer.lexParHeading(Arg)) {
+ Args[ParsedArgs] = Comment::Argument{
+ SourceRange(Arg.getLocation(), Arg.getEndLocation()), Arg.getText()};
+ ParsedArgs++;
+ }
+
+ return llvm::ArrayRef(Args, ParsedArgs);
+}
+
BlockCommandComment *Parser::parseBlockCommand() {
assert(Tok.is(tok::backslash_command) || Tok.is(tok::at_command));
@@ -356,6 +430,9 @@ BlockCommandComment *Parser::parseBlockCommand() {
parseParamCommandArgs(PC, Retokenizer);
else if (TPC)
parseTParamCommandArgs(TPC, Retokenizer);
+ else if (Info->IsParCommand)
+ S.actOnBlockCommandArgs(BC,
+ parseParCommandArgs(Retokenizer, Info->NumArgs));
else
S.actOnBlockCommandArgs(BC, parseCommandArgs(Retokenizer, Info->NumArgs));
diff --git a/clang/test/Index/comment-misc-tags.m b/clang/test/Index/comment-misc-tags.m
index 47ee9d9aa392ab..6d018dbfcf193d 100644
--- a/clang/test/Index/comment-misc-tags.m
+++ b/clang/test/Index/comment-misc-tags.m
@@ -91,18 +91,16 @@ @interface IOCommandGate
struct Test {int filler;};
-// CHECK: (CXComment_BlockCommand CommandName=[par]
+// CHECK: (CXComment_BlockCommand CommandName=[par] Arg[0]=User defined paragraph:
// CHECK-NEXT: (CXComment_Paragraph
-// CHECK-NEXT: (CXComment_Text Text=[ User defined paragraph:] HasTrailingNewline)
// CHECK-NEXT: (CXComment_Text Text=[ Contents of the paragraph.])))
// CHECK: (CXComment_BlockCommand CommandName=[par]
// CHECK-NEXT: (CXComment_Paragraph
-// CHECK-NEXT: (CXComment_Text Text=[ New paragraph under the same heading.])))
+// CHECK-NEXT: (CXComment_Text Text=[New paragraph under the same heading.])))
// CHECK: (CXComment_BlockCommand CommandName=[note]
// CHECK-NEXT: (CXComment_Paragraph
// CHECK-NEXT: (CXComment_Text Text=[ This note consists of two paragraphs.] HasTrailingNewline)
// CHECK-NEXT: (CXComment_Text Text=[ This is the first paragraph.])))
// CHECK: (CXComment_BlockCommand CommandName=[par]
// CHECK-NEXT: (CXComment_Paragraph
-// CHECK-NEXT: (CXComment_Text Text=[ And this is the second paragraph.])))
-
+// CHECK-NEXT: (CXComment_Text Text=[And this is the second paragraph.])))
diff --git a/clang/unittests/AST/CommentParser.cpp b/clang/unittests/AST/CommentParser.cpp
index c3479672ae2a3c..d9150a848fcaf3 100644
--- a/clang/unittests/AST/CommentParser.cpp
+++ b/clang/unittests/AST/CommentParser.cpp
@@ -1427,8 +1427,145 @@ TEST_F(CommentParserTest, Deprecated) {
}
}
+TEST_F(CommentParserTest, ParCommandHasArg1) {
+ const char *Sources[] = {
+ "/// @par Paragraph header:", "/// @par Paragraph header:\n",
+ "/// @par Paragraph header:\r\n", "/// @par Paragraph header:\n\r",
+ "/** @par Paragraph header:*/",
+ };
+
+ for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+ FullComment *FC = parseString(Sources[i]);
+ ASSERT_TRUE(HasChildCount(FC, 2));
+
+ ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+ {
+ BlockCommandComment *BCC;
+ ParagraphComment *PC;
+ ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+ ASSERT_TRUE(HasChildCount(PC, 0));
+ ASSERT_TRUE(BCC->getNumArgs() == 1);
+ ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+ }
+ }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg2) {
+ const char *Sources[] = {
+ "/// @par Paragraph header: ", "/// @par Paragraph header: \n",
+ "/// @par Paragraph header: \r\n", "/// @par Paragraph header: \n\r",
+ "/** @par Paragraph header: */",
+ };
+
+ for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+ FullComment *FC = parseString(Sources[i]);
+ ASSERT_TRUE(HasChildCount(FC, 2));
+
+ ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+ {
+ BlockCommandComment *BCC;
+ ParagraphComment *PC;
+ ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+ ASSERT_TRUE(HasChildCount(PC, 0));
+ ASSERT_TRUE(BCC->getNumArgs() == 1);
+ ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header: ");
+ }
+ }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg3) {
+ const char *Sources[] = {
+ ("/// @par Paragraph header:\n"
+ "/// Paragraph body"),
+ ("/// @par Paragraph header:\r\n"
+ "/// Paragraph body"),
+ ("/// @par Paragraph header:\n\r"
+ "/// Paragraph body"),
+ };
+
+ for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+ FullComment *FC = parseString(Sources[i]);
+ ASSERT_TRUE(HasChildCount(FC, 2));
+
+ ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+ {
+ BlockCommandComment *BCC;
+ ParagraphComment *PC;
+ TextComment *TC;
+ ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+ ASSERT_TRUE(HasChildCount(PC, 1));
+ ASSERT_TRUE(BCC->getNumArgs() == 1);
+ ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+ ASSERT_TRUE(GetChildAt(PC, 0, TC));
+ ASSERT_TRUE(TC->getText() == " Paragraph body");
+ }
+ }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg4) {
+ const char *Sources[] = {
+ ("/// @par Paragraph header:\n"
+ "/// Paragraph body1\n"
+ "/// Paragraph body2"),
+ ("/// @par Paragraph header:\r\n"
+ "/// Paragraph body1\n"
+ "/// Paragraph body2"),
+ ("/// @par Paragraph header:\n\r"
+ "/// Paragraph body1\n"
+ "/// Paragraph body2"),
+ };
+
+ for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+ FullComment *FC = parseString(Sources[i]);
+ ASSERT_TRUE(HasChildCount(FC, 2));
+
+ ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+ {
+ BlockCommandComment *BCC;
+ ParagraphComment *PC;
+ TextComment *TC;
+ ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+ ASSERT_TRUE(HasChildCount(PC, 2));
+ ASSERT_TRUE(BCC->getNumArgs() == 1);
+ ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+ ASSERT_TRUE(GetChildAt(PC, 0, TC));
+ ASSERT_TRUE(TC->getText() == " Paragraph body1");
+ ASSERT_TRUE(GetChildAt(PC, 1, TC));
+ ASSERT_TRUE(TC->getText() == " Paragraph body2");
+ }
+ }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg5) {
+ const char *Sources[] = {
+ ("/// @par \n"
+ "/// Paragraphs with no text before newline have no heading"),
+ ("/// @par \r\n"
+ "/// Paragraphs with no text before newline have no heading"),
+ ("/// @par \n\r"
+ "/// Paragraphs with no text before newline have no heading"),
+ };
+
+ for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+ FullComment *FC = parseString(Sources[i]);
+ ASSERT_TRUE(HasChildCount(FC, 2));
+
+ ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+ {
+ BlockCommandComment *BCC;
+ ParagraphComment *PC;
+ TextComment *TC;
+ ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+ ASSERT_TRUE(HasChildCount(PC, 1));
+ ASSERT_TRUE(BCC->getNumArgs() == 0);
+ ASSERT_TRUE(GetChildAt(PC, 0, TC));
+ ASSERT_TRUE(TC->getText() ==
+ "Paragraphs with no text before newline have no heading");
+ }
+ }
+}
+
} // unnamed namespace
} // end namespace comments
} // end namespace clang
-
diff --git a/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp b/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp
index a113b02e19995d..07b26dc2f6b8be 100644
--- a/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp
+++ b/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp
@@ -44,6 +44,7 @@ void clang::EmitClangCommentCommandInfo(RecordKeeper &Records,
<< Tag.getValueAsBit("IsThrowsCommand") << ", "
<< Tag.getValueAsBit("IsDeprecatedCommand") << ", "
<< Tag.getValueAsBit("IsHeaderfileCommand") << ", "
+ << Tag.getValueAsBit("IsParCommand") << ", "
<< Tag.getValueAsBit("IsEmptyParagraphAllowed") << ", "
<< Tag.getValueAsBit("IsVerbatimBlockCommand") << ", "
<< Tag.getValueAsBit("IsVerbatimBlockEndCommand") << ", "
|
ff4e6f3
to
2b18453
Compare
84dd803
to
362080b
Compare
Ping. This is rebased on latest main (which integrates the other comment parser changes we've made) and should be ready to go. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
c6d4b64
to
ecadf98
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Please be sure to add a release note to clang/docs/ReleaseNotes.rst so users know about the new functionality.
Not sure why a failure is being triggered here as we didn't touch this exact section. Our changes affect code a little further down in the file.
487cb53
to
7db1807
Compare
@AaronBallman thanks for the reminder :) the release notes blurb has been added |
@hdoc Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested Please check whether problems have been caused by your change specifically, as How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
llvm#91100) ### Background Doxygen's `\par` command ([link](https://www.doxygen.nl/manual/commands.html#cmdpar)) has an optional argument, which denotes the header of the paragraph started by a given `\par` command. In short, the paragraph command can be used with a heading, or without one. The code block below shows both forms and how the current version of LLVM/Clang parses this code: ``` $ cat test.cpp /// \par User defined paragraph: /// Contents of the paragraph. /// /// \par /// New paragraph under the same heading. /// /// \par /// A second paragraph. class A {}; $ clang++ -cc1 -ast-dump -fcolor-diagnostics -std=c++20 test.cpp `-CXXRecordDecl 0x1530f3a78 <test.cpp:11:1, col:10> col:7 class A definition |-FullComment 0x1530fea38 <line:2:4, line:9:23> | |-ParagraphComment 0x1530fe7e0 <line:2:4> | | `-TextComment 0x1530fe7b8 <col:4> Text=" " | |-BlockCommandComment 0x1530fe800 <col:5, line:3:30> Name="par" | | `-ParagraphComment 0x1530fe878 <line:2:9, line:3:30> | | |-TextComment 0x1530fe828 <line:2:9, col:32> Text=" User defined paragraph:" | | `-TextComment 0x1530fe848 <line:3:4, col:30> Text=" Contents of the paragraph." | |-ParagraphComment 0x1530fe8c0 <line:5:4> | | `-TextComment 0x1530fe898 <col:4> Text=" " | |-BlockCommandComment 0x1530fe8e0 <col:5, line:6:41> Name="par" | | `-ParagraphComment 0x1530fe930 <col:4, col:41> | | `-TextComment 0x1530fe908 <col:4, col:41> Text=" New paragraph under the same heading." | |-ParagraphComment 0x1530fe978 <line:8:4> | | `-TextComment 0x1530fe950 <col:4> Text=" " | `-BlockCommandComment 0x1530fe998 <col:5, line:9:23> Name="par" | `-ParagraphComment 0x1530fe9e8 <col:4, col:23> | `-TextComment 0x1530fe9c0 <col:4, col:23> Text=" A second paragraph." `-CXXRecordDecl 0x1530f3bb0 <line:11:1, col:7> col:7 implicit class A ``` As we can see above, the optional paragraph heading (`"User defined paragraph"`) is not an argument of the `\par` `BlockCommandComment`, but instead a child `TextComment`. For documentation generators like [hdoc](https://hdoc.io/), it would be ideal if we could parse Doxygen documentation comments with these semantics in mind. Currently that's not possible. ### Change This change parses `\par` command according to how Doxygen parses them, making an optional header available as a an argument if it is present. In addition: - AST unit tests are defined to test this functionality when an argument is present, isn't present, with additional spacing, etc. - TableGen is updated with an `IsParCommand` to support this functionality - `lit` tests are updated where needed
Background
Doxygen's
\par
command (link) has an optional argument, which denotes the header of the paragraph started by a given\par
command.In short, the paragraph command can be used with a heading, or without one. The code block below shows both forms and how the current version of LLVM/Clang parses this code:
As we can see above, the optional paragraph heading (
"User defined paragraph"
) is not an argument of the\par
BlockCommandComment
, but instead a childTextComment
.For documentation generators like hdoc, it would be ideal if we could parse Doxygen documentation comments with these semantics in mind. Currently that's not possible.
Change
This change parses
\par
command according to how Doxygen parses them, making an optional header available as a an argument if it is present. In addition:IsParCommand
to support this functionalitylit
tests are updated where needed