Skip to content

[C++20][Modules] Implement P1857R3 Modules Dependency Discovery #107168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

yronglin
Copy link
Contributor

@yronglin yronglin commented Sep 4, 2024

Supported features

This PR implement the following papers:
P1857R3 Modules Dependency Discovery.
P3034R1 Module Declarations Shouldn’t be Macros.

At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:

  • After skipping horizontal whitespace are
    • at the start of a logical line, or
    • preceded by an export at the start of the logical line.
  • Are followed by an identifier pp token (before macro expansion), or
    • <, ", or : (but not ::) pp tokens for import, or
    • ; for module
      Otherwise the token is treated as an identifier.

Additionally:

  • The entire import or module directive (including the closing ;) must be on a single logical line and for module must not come from an #include.
  • The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion.
  • [TODO] A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
  • Preprocessor conditionals shall not span a module declaration.

[TODO] Need add more test.

Implementation details

  • Introduce an new tok::annot_module_name token kind. It's holds token sequence like: A.B.C. It can be both used to in ObjC @import module names, C++ module names and clang module names.
  • Since the wording updates in P1857R0 of the C++ import/module directive, in preprocessing phase, we convert the import/module contextual keyword into a keyword when appropriate and treat it as a processing directives.

Copy link

github-actions bot commented Sep 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

yronglin added a commit that referenced this pull request Apr 16, 2025
…tructures to `IdentifierLoc` (#135808)

I found this issue when I working on
#107168.

Currently we have many similiar data structures like:
 - `std::pair<IdentifierInfo *, SourceLocation>`.
 - Element type of `ModuleIdPath`.
 - `IdentifierLocPair`.
 - `IdentifierLoc`.
 
This PR unify these data structures to `IdentifierLoc`, moved
`IdentifierLoc` definition to SourceLocation.h, and deleted other
similer data structures.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 16, 2025
…like data structures to `IdentifierLoc` (#135808)

I found this issue when I working on
llvm/llvm-project#107168.

Currently we have many similiar data structures like:
 - `std::pair<IdentifierInfo *, SourceLocation>`.
 - Element type of `ModuleIdPath`.
 - `IdentifierLocPair`.
 - `IdentifierLoc`.

This PR unify these data structures to `IdentifierLoc`, moved
`IdentifierLoc` definition to SourceLocation.h, and deleted other
similer data structures.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…tructures to `IdentifierLoc` (llvm#135808)

I found this issue when I working on
llvm#107168.

Currently we have many similiar data structures like:
 - `std::pair<IdentifierInfo *, SourceLocation>`.
 - Element type of `ModuleIdPath`.
 - `IdentifierLocPair`.
 - `IdentifierLoc`.
 
This PR unify these data structures to `IdentifierLoc`, moved
`IdentifierLoc` definition to SourceLocation.h, and deleted other
similer data structures.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
yronglin added a commit that referenced this pull request Apr 17, 2025
… data structures to `IdentifierLoc` (#136077)

This PR reland #135808, fixed
some missed changes in LLDB.
I found this issue when I working on
#107168.

Currently we have many similiar data structures like:
- std::pair<IdentifierInfo *, SourceLocation>.
- Element type of ModuleIdPath.
- IdentifierLocPair.
- IdentifierLoc.

This PR unify these data structures to IdentifierLoc, moved
IdentifierLoc definition to SourceLocation.h, and deleted other similer
data structures.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 17, 2025
…` pair-like data structures to `IdentifierLoc` (#136077)

This PR reland llvm/llvm-project#135808, fixed
some missed changes in LLDB.
I found this issue when I working on
llvm/llvm-project#107168.

Currently we have many similiar data structures like:
- std::pair<IdentifierInfo *, SourceLocation>.
- Element type of ModuleIdPath.
- IdentifierLocPair.
- IdentifierLoc.

This PR unify these data structures to IdentifierLoc, moved
IdentifierLoc definition to SourceLocation.h, and deleted other similer
data structures.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
… data structures to `IdentifierLoc` (llvm#136077)

This PR reland llvm#135808, fixed
some missed changes in LLDB.
I found this issue when I working on
llvm#107168.

Currently we have many similiar data structures like:
- std::pair<IdentifierInfo *, SourceLocation>.
- Element type of ModuleIdPath.
- IdentifierLocPair.
- IdentifierLoc.

This PR unify these data structures to IdentifierLoc, moved
IdentifierLoc definition to SourceLocation.h, and deleted other similer
data structures.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
… data structures to `IdentifierLoc` (llvm#136077)

This PR reland llvm#135808, fixed
some missed changes in LLDB.
I found this issue when I working on
llvm#107168.

Currently we have many similiar data structures like:
- std::pair<IdentifierInfo *, SourceLocation>.
- Element type of ModuleIdPath.
- IdentifierLocPair.
- IdentifierLoc.

This PR unify these data structures to IdentifierLoc, moved
IdentifierLoc definition to SourceLocation.h, and deleted other similer
data structures.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
@yronglin yronglin force-pushed the modules_dependency_discovery branch from dbc377e to d300a2b Compare May 31, 2025 07:36
@yronglin yronglin marked this pull request as ready for review May 31, 2025 07:43
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels May 31, 2025
@llvmbot
Copy link
Member

llvmbot commented May 31, 2025

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang-modules

Author: None (yronglin)

Changes

Implement P1857R3 Modules Dependency Discovery.

  • Handle C++ module and import directive like other preprocessor directive.

At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:

  • After skipping horizontal whitespace are

    • ✅at the start of a logical line, or

    • ✅preceded by an export at the start of the logical line.

  • Are followed by an identifier pp token (before macro expansion), or

    • ✅<, ", or : (but not ::) pp tokens for import, or

    • ✅; for module

Otherwise the token is treated as an identifier.

Additionally:

  • ✅The entire import or module directive (including the closing ;) must be on a single logical line and for module must not come from an #include.

  • ✅The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion.

  • ❌**[TODO]** A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)

  • ✅Preprocessor conditionals shall not span a module declaration.

Need add more test


Patch is 134.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107168.diff

43 Files Affected:

  • (modified) clang/examples/AnnotateFunctions/AnnotateFunctions.cpp (+1-1)
  • (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+15-1)
  • (modified) clang/include/clang/Basic/DiagnosticParseKinds.td (+2-4)
  • (modified) clang/include/clang/Basic/IdentifierTable.h (+22-4)
  • (modified) clang/include/clang/Basic/TokenKinds.def (+6)
  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+1-1)
  • (modified) clang/include/clang/Lex/CodeCompletionHandler.h (+8)
  • (modified) clang/include/clang/Lex/Lexer.h (+5-5)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+95-26)
  • (modified) clang/include/clang/Lex/Token.h (+7)
  • (modified) clang/include/clang/Lex/TokenLexer.h (+3-4)
  • (modified) clang/include/clang/Parse/Parser.h (+2)
  • (modified) clang/include/clang/Sema/Sema.h (+4-2)
  • (modified) clang/lib/Basic/IdentifierTable.cpp (+3-1)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+7-3)
  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+8-1)
  • (modified) clang/lib/Lex/DependencyDirectivesScanner.cpp (+20-8)
  • (modified) clang/lib/Lex/Lexer.cpp (+46-18)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+264-5)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+15-17)
  • (modified) clang/lib/Lex/Preprocessor.cpp (+171-188)
  • (modified) clang/lib/Lex/TokenConcatenation.cpp (+5-3)
  • (modified) clang/lib/Lex/TokenLexer.cpp (+7-6)
  • (modified) clang/lib/Parse/Parser.cpp (+30-63)
  • (modified) clang/lib/Sema/SemaModule.cpp (+30-45)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+1-1)
  • (modified) clang/test/CXX/basic/basic.link/p1.cpp (+109-36)
  • (modified) clang/test/CXX/basic/basic.link/p3.cpp (+40-27)
  • (modified) clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp (+56-26)
  • (modified) clang/test/CXX/lex/lex.pptoken/p3-2a.cpp (+10-5)
  • (modified) clang/test/CXX/module/basic/basic.def.odr/p6.cppm (+134-40)
  • (modified) clang/test/CXX/module/basic/basic.link/module-declaration.cpp (+35-29)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm (+27-11)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.interface/p1.cppm (+18-21)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p1.cpp (+30-14)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p5.cpp (+48-17)
  • (modified) clang/test/CXX/module/module.interface/p1.cpp (+24-18)
  • (modified) clang/test/CXX/module/module.interface/p2.cpp (+12-14)
  • (modified) clang/test/CXX/module/module.unit/p8.cpp (+28-20)
  • (modified) clang/test/Modules/pr121066.cpp (+1-2)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp (+1-1)
  • (modified) clang/unittests/Lex/DependencyDirectivesScannerTest.cpp (+5-5)
  • (modified) clang/unittests/Lex/ModuleDeclStateTest.cpp (+1-1)
diff --git a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
index d872020c2d8a3..22a3eb97f938b 100644
--- a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
+++ b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
@@ -65,7 +65,7 @@ class PragmaAnnotateHandler : public PragmaHandler {
     Token Tok;
     PP.LexUnexpandedToken(Tok);
     if (Tok.isNot(tok::eod))
-      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "pragma";
+      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "#pragma";
 
     if (HandledDecl) {
       DiagnosticsEngine &D = PP.getDiagnostics();
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 723f5d48b4f5f..f975a63b369b5 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -466,6 +466,8 @@ def err_pp_embed_device_file : Error<
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
+def ext_pp_extra_tokens_at_module_directive_eol : ExtWarn<
+  "extra tokens at end of '%0' directive">, InGroup<ExtraTokens>;
 
 def ext_pp_comma_expr : Extension<"comma operator in operand of #if">;
 def ext_pp_bad_vaargs_use : Extension<
@@ -496,7 +498,7 @@ def warn_cxx98_compat_variadic_macro : Warning<
 def ext_named_variadic_macro : Extension<
   "named variadic macros are a GNU extension">, InGroup<VariadicMacros>;
 def err_embedded_directive : Error<
-  "embedding a #%0 directive within macro arguments is not supported">;
+  "embedding a %select{#|C++ }0%1 directive within macro arguments is not supported">;
 def ext_embedded_directive : Extension<
   "embedding a directive within macro arguments has undefined behavior">,
   InGroup<DiagGroup<"embedded-directive">>;
@@ -983,6 +985,18 @@ def warn_module_conflict : Warning<
   InGroup<ModuleConflict>;
 
 // C++20 modules
+def err_pp_expected_module_name_or_header_name : Error<
+  "expected module name or header name">;
+def err_pp_expected_semi_after_module_or_import : Error<
+  "'%select{module|import}0' directive must end with a ';' on the same line">;
+def err_module_decl_in_header : Error<
+  "module declaration must not come from an #include directive">;
+def err_pp_cond_span_module_decl : Error<
+  "preprocessor conditionals shall not span a module declaration">;
+def err_pp_module_expected_ident : Error<
+  "expected a module name after '%select{module|import}0'">;
+def err_pp_unsupported_module_partition : Error<
+  "module partitions are only supported for C++20 onwards">;
 def err_header_import_semi_in_macro : Error<
   "semicolon terminating header import declaration cannot be produced "
   "by a macro">;
diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 3aa36ad59d0b9..c06e2f090b429 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1760,8 +1760,8 @@ def ext_bit_int : Extension<
 } // end of Parse Issue category.
 
 let CategoryName = "Modules Issue" in {
-def err_unexpected_module_decl : Error<
-  "module declaration can only appear at the top level">;
+def err_unexpected_module_import_decl : Error<
+  "%select{module|import}0 declaration can only appear at the top level">;
 def err_module_expected_ident : Error<
   "expected a module name after '%select{module|import}0'">;
 def err_attribute_not_module_attr : Error<
@@ -1782,8 +1782,6 @@ def err_module_fragment_exported : Error<
 def err_private_module_fragment_expected_semi : Error<
   "expected ';' after private module fragment declaration">;
 def err_missing_before_module_end : Error<"expected %0 at end of module">;
-def err_unsupported_module_partition : Error<
-  "module partitions are only supported for C++20 onwards">;
 def err_import_not_allowed_here : Error<
   "imports must immediately follow the module declaration">;
 def err_partition_import_outside_module : Error<
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 54540193cfcc0..add6c6ac629a1 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -179,6 +179,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsModulesImport : 1;
 
+  // True if this is the 'module' contextual keyword.
+  LLVM_PREFERRED_TYPE(bool)
+  unsigned IsModulesDecl : 1;
+
   // True if this is a mangled OpenMP variant name.
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsMangledOpenMPVariantName : 1;
@@ -215,8 +219,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
         IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false),
         IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false),
         RevertedTokenID(false), OutOfDate(false), IsModulesImport(false),
-        IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false),
-        IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {}
+        IsModulesDecl(false), IsMangledOpenMPVariantName(false),
+        IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false),
+        IsKeywordInCpp(false) {}
 
 public:
   IdentifierInfo(const IdentifierInfo &) = delete;
@@ -528,6 +533,18 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
       RecomputeNeedsHandleIdentifier();
   }
 
+  /// Determine whether this is the contextual keyword \c module.
+  bool isModulesDeclaration() const { return IsModulesDecl; }
+
+  /// Set whether this identifier is the contextual keyword \c module.
+  void setModulesDeclaration(bool I) {
+    IsModulesDecl = I;
+    if (I)
+      NeedsHandleIdentifier = true;
+    else
+      RecomputeNeedsHandleIdentifier();
+  }
+
   /// Determine whether this is the mangled name of an OpenMP variant.
   bool isMangledOpenMPVariantName() const { return IsMangledOpenMPVariantName; }
 
@@ -745,10 +762,11 @@ class IdentifierTable {
     // contents.
     II->Entry = &Entry;
 
-    // If this is the 'import' contextual keyword, mark it as such.
+    // If this is the 'import' or 'module' contextual keyword, mark it as such.
     if (Name == "import")
       II->setModulesImport(true);
-
+    else if (Name == "module")
+      II->setModulesDeclaration(true);
     return *II;
   }
 
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 94e72fea56a68..7750c84dbef78 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -133,6 +133,9 @@ PPKEYWORD(pragma)
 // C23 & C++26 #embed
 PPKEYWORD(embed)
 
+// C++20 Module Directive
+PPKEYWORD(module)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -1023,6 +1026,9 @@ ANNOTATION(module_include)
 ANNOTATION(module_begin)
 ANNOTATION(module_end)
 
+// Annotations for C++, Clang and Objective-C named modules.
+ANNOTATION(module_name)
+
 // Annotation for a header_name token that has been looked up and transformed
 // into the name of a header unit.
 ANNOTATION(header_unit)
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index 0ae490f0e8073..112d3b00160fd 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -863,7 +863,7 @@ class CompilerInstance : public ModuleLoader {
   /// load it.
   ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName,
                                                  SourceLocation ImportLoc,
-                                                 SourceLocation ModuleNameLoc,
+                                                 SourceRange ModuleNameRange,
                                                  bool IsInclusionDirective);
 
   /// Creates a \c CompilerInstance for compiling a module.
diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h
index bd3e05a36bb33..2ef29743415ae 100644
--- a/clang/include/clang/Lex/CodeCompletionHandler.h
+++ b/clang/include/clang/Lex/CodeCompletionHandler.h
@@ -13,12 +13,15 @@
 #ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 #define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 
+#include "clang/Basic/IdentifierTable.h"
+#include "clang/Basic/SourceLocation.h"
 #include "llvm/ADT/StringRef.h"
 
 namespace clang {
 
 class IdentifierInfo;
 class MacroInfo;
+using ModuleIdPath = ArrayRef<IdentifierLoc>;
 
 /// Callback handler that receives notifications when performing code
 /// completion within the preprocessor.
@@ -70,6 +73,11 @@ class CodeCompletionHandler {
   /// file where we expect natural language, e.g., a comment, string, or
   /// \#error directive.
   virtual void CodeCompleteNaturalLanguage() { }
+
+  /// Callback invoked when performing code completion inside the module name
+  /// part of an import directive.
+  virtual void CodeCompleteModuleImport(SourceLocation ImportLoc,
+                                        ModuleIdPath Path) {}
 };
 
 }
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index bb65ae010cffa..a595cda1eaa77 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer {
   //===--------------------------------------------------------------------===//
   // Context that changes as the file is lexed.
   // NOTE: any state that mutates when in raw mode must have save/restore code
-  // in Lexer::isNextPPTokenLParen.
+  // in Lexer::peekNextPPToken.
 
   // BufferPtr - Current pointer into the buffer.  This is the next character
   // to be lexed.
@@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer {
     BufferPtr = TokEnd;
   }
 
-  /// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a
-  /// tok::l_paren token, 0 if it is something else and 2 if there are no more
-  /// tokens in the buffer controlled by this lexer.
-  unsigned isNextPPTokenLParen();
+  /// peekNextPPToken - Return std::nullopt if there are no more tokens in the
+  /// buffer controlled by this lexer, otherwise return the next unexpanded
+  /// token.
+  std::optional<Token> peekNextPPToken();
 
   //===--------------------------------------------------------------------===//
   // Lexer character reading interfaces.
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index f2dfd3a349b8b..79a75a116c418 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -48,6 +48,7 @@
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Registry.h"
+#include "llvm/Support/TrailingObjects.h"
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
@@ -82,6 +83,7 @@ class PreprocessorLexer;
 class PreprocessorOptions;
 class ScratchBuffer;
 class TargetInfo;
+class ModuleNameLoc;
 
 namespace Builtin {
 class Context;
@@ -332,8 +334,9 @@ class Preprocessor {
   /// lexed, if any.
   SourceLocation ModuleImportLoc;
 
-  /// The import path for named module that we're currently processing.
-  SmallVector<IdentifierLoc, 2> NamedModuleImportPath;
+  /// The source location of the \c module contextual keyword we just
+  /// lexed, if any.
+  SourceLocation ModuleDeclLoc;
 
   llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints;
   unsigned CheckPointCounter = 0;
@@ -344,6 +347,21 @@ class Preprocessor {
   /// Whether the last token we lexed was an '@'.
   bool LastTokenWasAt = false;
 
+  /// Whether we're importing a standard C++20 named Modules.
+  bool ImportingCXXNamedModules = false;
+
+  /// Whether we're declaring a standard C++20 named Modules.
+  bool DeclaringCXXNamedModules = false;
+
+  struct ExportContextualKeywordInfo {
+    Token ExportTok;
+    bool TokAtPhysicalStartOfLine;
+  };
+
+  /// Whether the last token we lexed was an 'export' keyword.
+  std::optional<ExportContextualKeywordInfo> LastTokenWasExportKeyword =
+      std::nullopt;
+
   /// A position within a C++20 import-seq.
   class StdCXXImportSeq {
   public:
@@ -547,12 +565,7 @@ class Preprocessor {
         reset();
     }
 
-    void handleIdentifier(IdentifierInfo *Identifier) {
-      if (isModuleCandidate() && Identifier)
-        Name += Identifier->getName().str();
-      else if (!isNamedModule())
-        reset();
-    }
+    void handleModuleName(ModuleNameLoc *Path);
 
     void handleColon() {
       if (isModuleCandidate())
@@ -561,13 +574,6 @@ class Preprocessor {
         reset();
     }
 
-    void handlePeriod() {
-      if (isModuleCandidate())
-        Name += ".";
-      else if (!isNamedModule())
-        reset();
-    }
-
     void handleSemi() {
       if (!Name.empty() && isModuleCandidate()) {
         if (State == InterfaceCandidate)
@@ -622,10 +628,6 @@ class Preprocessor {
 
   ModuleDeclSeq ModuleDeclState;
 
-  /// Whether the module import expects an identifier next. Otherwise,
-  /// it expects a '.' or ';'.
-  bool ModuleImportExpectsIdentifier = false;
-
   /// The identifier and source location of the currently-active
   /// \#pragma clang arc_cf_code_audited begin.
   IdentifierLoc PragmaARCCFCodeAuditedInfo;
@@ -1759,6 +1761,19 @@ class Preprocessor {
   /// Lex the parameters for an #embed directive, returns nullopt on error.
   std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current,
                                                              bool ForHasEmbed);
+  bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc,
+                             SmallVectorImpl<IdentifierLoc> &Path,
+                             bool AllowMacroExpansion = true);
+  void HandleCXXImportDirective(Token Import);
+  void HandleCXXModuleDirective(Token Module);
+  
+  /// Callback invoked when the lexer sees one of export, import or module token
+  /// at the start of a line.
+  ///
+  /// This consumes the import, module directive, modifies the
+  /// lexer/preprocessor state, and advances the lexer(s) so that the next token
+  /// read is the correct one.
+  bool HandleModuleContextualKeyword(Token &Result, bool TokAtPhysicalStartOfLine);
 
   bool LexAfterModuleImport(Token &Result);
   void CollectPpImportSuffix(SmallVectorImpl<Token> &Toks);
@@ -2282,7 +2297,9 @@ class Preprocessor {
   /// Determine whether the next preprocessor token to be
   /// lexed is a '('.  If so, consume the token and return true, if not, this
   /// method should have no observable side-effect on the lexed tokens.
-  bool isNextPPTokenLParen();
+  bool isNextPPTokenLParen() {
+    return peekNextPPToken().value_or(Token{}).is(tok::l_paren);
+  }
 
 private:
   /// Identifiers used for SEH handling in Borland. These are only
@@ -2342,7 +2359,7 @@ class Preprocessor {
   ///
   /// \return The location of the end of the directive (the terminating
   /// newline).
-  SourceLocation CheckEndOfDirective(const char *DirType,
+  SourceLocation CheckEndOfDirective(StringRef DirType,
                                      bool EnableMacros = false);
 
   /// Read and discard all tokens remaining on the current line until
@@ -2424,11 +2441,12 @@ class Preprocessor {
   }
 
   /// If we're importing a standard C++20 Named Modules.
-  bool isInImportingCXXNamedModules() const {
-    // NamedModuleImportPath will be non-empty only if we're importing
-    // Standard C++ named modules.
-    return !NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules &&
-           !IsAtImport;
+  bool isImportingCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && ImportingCXXNamedModules;
+  }
+
+  bool isDeclaringCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && DeclaringCXXNamedModules;
   }
 
   /// Allocate a new MacroInfo object with the provided SourceLocation.
@@ -2661,6 +2679,10 @@ class Preprocessor {
 
   void removeCachedMacroExpandedTokensOfLastLexer();
 
+  /// Peek the next token. If so, return the token, if not, this
+  /// method should have no observable side-effect on the lexed tokens.
+  std::optional<Token> peekNextPPToken();
+
   /// After reading "MACRO(", this method is invoked to read all of the formal
   /// arguments specified for the macro invocation.  Returns null on error.
   MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI,
@@ -3078,6 +3100,53 @@ struct EmbedAnnotationData {
   StringRef FileName;
 };
 
+/// Represents module name annotation data.
+///
+///     module-name:
+///           module-name-qualifier[opt] identifier
+///
+///     partition-name: [C++20]
+///           : module-name-qualifier[opt] identifier
+///
+///     module-name-qualifier
+///           module-name-qualifier[opt] identifier .
+class ModuleNameLoc final
+    : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> {
+  friend TrailingObjects;
+  unsigned NumIdentifierLocs;
+
+  unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const {
+    return getNumIdentifierLocs();
+  }
+
+  ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) {
+    (void)llvm::copy(Path, getTrailingObjects<IdentifierLoc>());
+  }
+
+public:
+  static std::string stringFromModuleIdPath(ModuleIdPath Path);
+  static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path);
+  static Token CreateAnnotToken(Preprocessor &PP, ModuleIdPath Path);
+  unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; }
+  ModuleIdPath getModuleIdPath() const {
+    return {getTrailingObjects<IdentifierLoc>(), getNumIdentifierLocs()};
+  }
+
+  SourceLocation getBeginLoc() const {
+    return getModuleIdPath().front().getLoc();
+  }
+  SourceLocation getEndLoc() const {
+    auto &Last = getModuleIdPath().back();
+    return Last.getLoc().getLocWithOffset(
+        Last.getIdentifierInfo()->getLength());
+  }
+  SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; }
+
+  std::string str() const;
+  void print(llvm::raw_ostream &OS) const;
+  void dump() const { print(llvm::errs()); }
+};
+
 /// Registry of pragma handlers added by plugins
 using PragmaHandlerRegistry = llvm::Registry<PragmaHandler>;
 
diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h
index 4f29fb7d11415..8e81207ddf8d7 100644
--- a/clang/include/clang/Lex/Token.h
+++ b/clang/include/clang/Lex/Token.h
@@ -231,6 +231,9 @@ class Token {
     PtrData = const_cast<char*>(Ptr);
   }
 
+  template <class T> T getAnnotationValueAs() const {
+    return static_cast<T>(getAnnotationValue());
+  }
   void *getAnnotationValue() const {
     assert(isAnnotation() && "Used AnnotVal on non-annotation token");
     return PtrData;
@@ -289,6 +292,10 @@ class Token {
   /// Return the ObjC keyword kind.
   tok::ObjCKeywordKind getObjCKeywordID() const;
 
+  /// Return true if we have an C++20 Modules contextual keyword(export, import
+  /// or module).
+  bool isModuleContextualKeyword(bool AllowExport = true) const;
+
   bool isSimpleTypeSpecifier(const LangOptions &LangOpts) const;
 
   /// Return true if this token has trigraphs or escaped newlines in it.
diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h
index 4d229ae610674..777b4e6266c71 100644
--- a/clang/include/clang/Lex/TokenLexer.h
+++ b/clang/include/clang/Lex/TokenLexer.h
@@ -139,10 +139,9 @@ class TokenLexer {
   void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion,
             bool OwnsTokens, bool IsReinject);
 
-  /// If the next token lexed will pop this macro off the
-  /// expansion stack, return 2.  If the next unexpanded token is a '(', return
-  /// 1, otherwise return 0.
-  unsigned isNextTokenLParen() const;
+  /// If the next token lexed will pop this macro off the expansion stack,
+  /// return std::nullopt, otherwise return the next unexpanded token.
+  std::optional<Token> peekNextPPToken() const;
 
   /// Lex and return a token from this macro stream.
   bool Lex(Token &Tok);
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index c4bef4729fd36..a59a99bbac7c6 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -1079,6 +1079,8 @@ class Parser : public CodeCompletionHandler {
                                  unsigned ArgumentIndex) override;
   ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented May 31, 2025

@llvm/pr-subscribers-clang

Author: None (yronglin)

Changes

Implement P1857R3 Modules Dependency Discovery.

  • Handle C++ module and import directive like other preprocessor directive.

At the start of phase 4 an import or module token is treated as starting a directive and are converted to their respective keywords iff:

  • After skipping horizontal whitespace are

    • ✅at the start of a logical line, or

    • ✅preceded by an export at the start of the logical line.

  • Are followed by an identifier pp token (before macro expansion), or

    • ✅<, ", or : (but not ::) pp tokens for import, or

    • ✅; for module

Otherwise the token is treated as an identifier.

Additionally:

  • ✅The entire import or module directive (including the closing ;) must be on a single logical line and for module must not come from an #include.

  • ✅The expansion of macros must not result in an import or module directive introducer that was not there prior to macro expansion.

  • ❌**[TODO]** A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)

  • ✅Preprocessor conditionals shall not span a module declaration.

Need add more test


Patch is 134.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107168.diff

43 Files Affected:

  • (modified) clang/examples/AnnotateFunctions/AnnotateFunctions.cpp (+1-1)
  • (modified) clang/include/clang/Basic/DiagnosticLexKinds.td (+15-1)
  • (modified) clang/include/clang/Basic/DiagnosticParseKinds.td (+2-4)
  • (modified) clang/include/clang/Basic/IdentifierTable.h (+22-4)
  • (modified) clang/include/clang/Basic/TokenKinds.def (+6)
  • (modified) clang/include/clang/Frontend/CompilerInstance.h (+1-1)
  • (modified) clang/include/clang/Lex/CodeCompletionHandler.h (+8)
  • (modified) clang/include/clang/Lex/Lexer.h (+5-5)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+95-26)
  • (modified) clang/include/clang/Lex/Token.h (+7)
  • (modified) clang/include/clang/Lex/TokenLexer.h (+3-4)
  • (modified) clang/include/clang/Parse/Parser.h (+2)
  • (modified) clang/include/clang/Sema/Sema.h (+4-2)
  • (modified) clang/lib/Basic/IdentifierTable.cpp (+3-1)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+7-3)
  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+8-1)
  • (modified) clang/lib/Lex/DependencyDirectivesScanner.cpp (+20-8)
  • (modified) clang/lib/Lex/Lexer.cpp (+46-18)
  • (modified) clang/lib/Lex/PPDirectives.cpp (+264-5)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+15-17)
  • (modified) clang/lib/Lex/Preprocessor.cpp (+171-188)
  • (modified) clang/lib/Lex/TokenConcatenation.cpp (+5-3)
  • (modified) clang/lib/Lex/TokenLexer.cpp (+7-6)
  • (modified) clang/lib/Parse/Parser.cpp (+30-63)
  • (modified) clang/lib/Sema/SemaModule.cpp (+30-45)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+1-1)
  • (modified) clang/test/CXX/basic/basic.link/p1.cpp (+109-36)
  • (modified) clang/test/CXX/basic/basic.link/p3.cpp (+40-27)
  • (modified) clang/test/CXX/basic/basic.scope/basic.scope.namespace/p2.cpp (+56-26)
  • (modified) clang/test/CXX/lex/lex.pptoken/p3-2a.cpp (+10-5)
  • (modified) clang/test/CXX/module/basic/basic.def.odr/p6.cppm (+134-40)
  • (modified) clang/test/CXX/module/basic/basic.link/module-declaration.cpp (+35-29)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm (+27-11)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/dcl.module.interface/p1.cppm (+18-21)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p1.cpp (+30-14)
  • (modified) clang/test/CXX/module/dcl.dcl/dcl.module/p5.cpp (+48-17)
  • (modified) clang/test/CXX/module/module.interface/p1.cpp (+24-18)
  • (modified) clang/test/CXX/module/module.interface/p2.cpp (+12-14)
  • (modified) clang/test/CXX/module/module.unit/p8.cpp (+28-20)
  • (modified) clang/test/Modules/pr121066.cpp (+1-2)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp (+1-1)
  • (modified) clang/unittests/Lex/DependencyDirectivesScannerTest.cpp (+5-5)
  • (modified) clang/unittests/Lex/ModuleDeclStateTest.cpp (+1-1)
diff --git a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
index d872020c2d8a3..22a3eb97f938b 100644
--- a/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
+++ b/clang/examples/AnnotateFunctions/AnnotateFunctions.cpp
@@ -65,7 +65,7 @@ class PragmaAnnotateHandler : public PragmaHandler {
     Token Tok;
     PP.LexUnexpandedToken(Tok);
     if (Tok.isNot(tok::eod))
-      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "pragma";
+      PP.Diag(Tok, diag::ext_pp_extra_tokens_at_eol) << "#pragma";
 
     if (HandledDecl) {
       DiagnosticsEngine &D = PP.getDiagnostics();
diff --git a/clang/include/clang/Basic/DiagnosticLexKinds.td b/clang/include/clang/Basic/DiagnosticLexKinds.td
index 723f5d48b4f5f..f975a63b369b5 100644
--- a/clang/include/clang/Basic/DiagnosticLexKinds.td
+++ b/clang/include/clang/Basic/DiagnosticLexKinds.td
@@ -466,6 +466,8 @@ def err_pp_embed_device_file : Error<
 
 def ext_pp_extra_tokens_at_eol : ExtWarn<
   "extra tokens at end of #%0 directive">, InGroup<ExtraTokens>;
+def ext_pp_extra_tokens_at_module_directive_eol : ExtWarn<
+  "extra tokens at end of '%0' directive">, InGroup<ExtraTokens>;
 
 def ext_pp_comma_expr : Extension<"comma operator in operand of #if">;
 def ext_pp_bad_vaargs_use : Extension<
@@ -496,7 +498,7 @@ def warn_cxx98_compat_variadic_macro : Warning<
 def ext_named_variadic_macro : Extension<
   "named variadic macros are a GNU extension">, InGroup<VariadicMacros>;
 def err_embedded_directive : Error<
-  "embedding a #%0 directive within macro arguments is not supported">;
+  "embedding a %select{#|C++ }0%1 directive within macro arguments is not supported">;
 def ext_embedded_directive : Extension<
   "embedding a directive within macro arguments has undefined behavior">,
   InGroup<DiagGroup<"embedded-directive">>;
@@ -983,6 +985,18 @@ def warn_module_conflict : Warning<
   InGroup<ModuleConflict>;
 
 // C++20 modules
+def err_pp_expected_module_name_or_header_name : Error<
+  "expected module name or header name">;
+def err_pp_expected_semi_after_module_or_import : Error<
+  "'%select{module|import}0' directive must end with a ';' on the same line">;
+def err_module_decl_in_header : Error<
+  "module declaration must not come from an #include directive">;
+def err_pp_cond_span_module_decl : Error<
+  "preprocessor conditionals shall not span a module declaration">;
+def err_pp_module_expected_ident : Error<
+  "expected a module name after '%select{module|import}0'">;
+def err_pp_unsupported_module_partition : Error<
+  "module partitions are only supported for C++20 onwards">;
 def err_header_import_semi_in_macro : Error<
   "semicolon terminating header import declaration cannot be produced "
   "by a macro">;
diff --git a/clang/include/clang/Basic/DiagnosticParseKinds.td b/clang/include/clang/Basic/DiagnosticParseKinds.td
index 3aa36ad59d0b9..c06e2f090b429 100644
--- a/clang/include/clang/Basic/DiagnosticParseKinds.td
+++ b/clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1760,8 +1760,8 @@ def ext_bit_int : Extension<
 } // end of Parse Issue category.
 
 let CategoryName = "Modules Issue" in {
-def err_unexpected_module_decl : Error<
-  "module declaration can only appear at the top level">;
+def err_unexpected_module_import_decl : Error<
+  "%select{module|import}0 declaration can only appear at the top level">;
 def err_module_expected_ident : Error<
   "expected a module name after '%select{module|import}0'">;
 def err_attribute_not_module_attr : Error<
@@ -1782,8 +1782,6 @@ def err_module_fragment_exported : Error<
 def err_private_module_fragment_expected_semi : Error<
   "expected ';' after private module fragment declaration">;
 def err_missing_before_module_end : Error<"expected %0 at end of module">;
-def err_unsupported_module_partition : Error<
-  "module partitions are only supported for C++20 onwards">;
 def err_import_not_allowed_here : Error<
   "imports must immediately follow the module declaration">;
 def err_partition_import_outside_module : Error<
diff --git a/clang/include/clang/Basic/IdentifierTable.h b/clang/include/clang/Basic/IdentifierTable.h
index 54540193cfcc0..add6c6ac629a1 100644
--- a/clang/include/clang/Basic/IdentifierTable.h
+++ b/clang/include/clang/Basic/IdentifierTable.h
@@ -179,6 +179,10 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsModulesImport : 1;
 
+  // True if this is the 'module' contextual keyword.
+  LLVM_PREFERRED_TYPE(bool)
+  unsigned IsModulesDecl : 1;
+
   // True if this is a mangled OpenMP variant name.
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsMangledOpenMPVariantName : 1;
@@ -215,8 +219,9 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
         IsCPPOperatorKeyword(false), NeedsHandleIdentifier(false),
         IsFromAST(false), ChangedAfterLoad(false), FEChangedAfterLoad(false),
         RevertedTokenID(false), OutOfDate(false), IsModulesImport(false),
-        IsMangledOpenMPVariantName(false), IsDeprecatedMacro(false),
-        IsRestrictExpansion(false), IsFinal(false), IsKeywordInCpp(false) {}
+        IsModulesDecl(false), IsMangledOpenMPVariantName(false),
+        IsDeprecatedMacro(false), IsRestrictExpansion(false), IsFinal(false),
+        IsKeywordInCpp(false) {}
 
 public:
   IdentifierInfo(const IdentifierInfo &) = delete;
@@ -528,6 +533,18 @@ class alignas(IdentifierInfoAlignment) IdentifierInfo {
       RecomputeNeedsHandleIdentifier();
   }
 
+  /// Determine whether this is the contextual keyword \c module.
+  bool isModulesDeclaration() const { return IsModulesDecl; }
+
+  /// Set whether this identifier is the contextual keyword \c module.
+  void setModulesDeclaration(bool I) {
+    IsModulesDecl = I;
+    if (I)
+      NeedsHandleIdentifier = true;
+    else
+      RecomputeNeedsHandleIdentifier();
+  }
+
   /// Determine whether this is the mangled name of an OpenMP variant.
   bool isMangledOpenMPVariantName() const { return IsMangledOpenMPVariantName; }
 
@@ -745,10 +762,11 @@ class IdentifierTable {
     // contents.
     II->Entry = &Entry;
 
-    // If this is the 'import' contextual keyword, mark it as such.
+    // If this is the 'import' or 'module' contextual keyword, mark it as such.
     if (Name == "import")
       II->setModulesImport(true);
-
+    else if (Name == "module")
+      II->setModulesDeclaration(true);
     return *II;
   }
 
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 94e72fea56a68..7750c84dbef78 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -133,6 +133,9 @@ PPKEYWORD(pragma)
 // C23 & C++26 #embed
 PPKEYWORD(embed)
 
+// C++20 Module Directive
+PPKEYWORD(module)
+
 // GNU Extensions.
 PPKEYWORD(import)
 PPKEYWORD(include_next)
@@ -1023,6 +1026,9 @@ ANNOTATION(module_include)
 ANNOTATION(module_begin)
 ANNOTATION(module_end)
 
+// Annotations for C++, Clang and Objective-C named modules.
+ANNOTATION(module_name)
+
 // Annotation for a header_name token that has been looked up and transformed
 // into the name of a header unit.
 ANNOTATION(header_unit)
diff --git a/clang/include/clang/Frontend/CompilerInstance.h b/clang/include/clang/Frontend/CompilerInstance.h
index 0ae490f0e8073..112d3b00160fd 100644
--- a/clang/include/clang/Frontend/CompilerInstance.h
+++ b/clang/include/clang/Frontend/CompilerInstance.h
@@ -863,7 +863,7 @@ class CompilerInstance : public ModuleLoader {
   /// load it.
   ModuleLoadResult findOrCompileModuleAndReadAST(StringRef ModuleName,
                                                  SourceLocation ImportLoc,
-                                                 SourceLocation ModuleNameLoc,
+                                                 SourceRange ModuleNameRange,
                                                  bool IsInclusionDirective);
 
   /// Creates a \c CompilerInstance for compiling a module.
diff --git a/clang/include/clang/Lex/CodeCompletionHandler.h b/clang/include/clang/Lex/CodeCompletionHandler.h
index bd3e05a36bb33..2ef29743415ae 100644
--- a/clang/include/clang/Lex/CodeCompletionHandler.h
+++ b/clang/include/clang/Lex/CodeCompletionHandler.h
@@ -13,12 +13,15 @@
 #ifndef LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 #define LLVM_CLANG_LEX_CODECOMPLETIONHANDLER_H
 
+#include "clang/Basic/IdentifierTable.h"
+#include "clang/Basic/SourceLocation.h"
 #include "llvm/ADT/StringRef.h"
 
 namespace clang {
 
 class IdentifierInfo;
 class MacroInfo;
+using ModuleIdPath = ArrayRef<IdentifierLoc>;
 
 /// Callback handler that receives notifications when performing code
 /// completion within the preprocessor.
@@ -70,6 +73,11 @@ class CodeCompletionHandler {
   /// file where we expect natural language, e.g., a comment, string, or
   /// \#error directive.
   virtual void CodeCompleteNaturalLanguage() { }
+
+  /// Callback invoked when performing code completion inside the module name
+  /// part of an import directive.
+  virtual void CodeCompleteModuleImport(SourceLocation ImportLoc,
+                                        ModuleIdPath Path) {}
 };
 
 }
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h
index bb65ae010cffa..a595cda1eaa77 100644
--- a/clang/include/clang/Lex/Lexer.h
+++ b/clang/include/clang/Lex/Lexer.h
@@ -124,7 +124,7 @@ class Lexer : public PreprocessorLexer {
   //===--------------------------------------------------------------------===//
   // Context that changes as the file is lexed.
   // NOTE: any state that mutates when in raw mode must have save/restore code
-  // in Lexer::isNextPPTokenLParen.
+  // in Lexer::peekNextPPToken.
 
   // BufferPtr - Current pointer into the buffer.  This is the next character
   // to be lexed.
@@ -642,10 +642,10 @@ class Lexer : public PreprocessorLexer {
     BufferPtr = TokEnd;
   }
 
-  /// isNextPPTokenLParen - Return 1 if the next unexpanded token will return a
-  /// tok::l_paren token, 0 if it is something else and 2 if there are no more
-  /// tokens in the buffer controlled by this lexer.
-  unsigned isNextPPTokenLParen();
+  /// peekNextPPToken - Return std::nullopt if there are no more tokens in the
+  /// buffer controlled by this lexer, otherwise return the next unexpanded
+  /// token.
+  std::optional<Token> peekNextPPToken();
 
   //===--------------------------------------------------------------------===//
   // Lexer character reading interfaces.
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index f2dfd3a349b8b..79a75a116c418 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -48,6 +48,7 @@
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Registry.h"
+#include "llvm/Support/TrailingObjects.h"
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
@@ -82,6 +83,7 @@ class PreprocessorLexer;
 class PreprocessorOptions;
 class ScratchBuffer;
 class TargetInfo;
+class ModuleNameLoc;
 
 namespace Builtin {
 class Context;
@@ -332,8 +334,9 @@ class Preprocessor {
   /// lexed, if any.
   SourceLocation ModuleImportLoc;
 
-  /// The import path for named module that we're currently processing.
-  SmallVector<IdentifierLoc, 2> NamedModuleImportPath;
+  /// The source location of the \c module contextual keyword we just
+  /// lexed, if any.
+  SourceLocation ModuleDeclLoc;
 
   llvm::DenseMap<FileID, SmallVector<const char *>> CheckPoints;
   unsigned CheckPointCounter = 0;
@@ -344,6 +347,21 @@ class Preprocessor {
   /// Whether the last token we lexed was an '@'.
   bool LastTokenWasAt = false;
 
+  /// Whether we're importing a standard C++20 named Modules.
+  bool ImportingCXXNamedModules = false;
+
+  /// Whether we're declaring a standard C++20 named Modules.
+  bool DeclaringCXXNamedModules = false;
+
+  struct ExportContextualKeywordInfo {
+    Token ExportTok;
+    bool TokAtPhysicalStartOfLine;
+  };
+
+  /// Whether the last token we lexed was an 'export' keyword.
+  std::optional<ExportContextualKeywordInfo> LastTokenWasExportKeyword =
+      std::nullopt;
+
   /// A position within a C++20 import-seq.
   class StdCXXImportSeq {
   public:
@@ -547,12 +565,7 @@ class Preprocessor {
         reset();
     }
 
-    void handleIdentifier(IdentifierInfo *Identifier) {
-      if (isModuleCandidate() && Identifier)
-        Name += Identifier->getName().str();
-      else if (!isNamedModule())
-        reset();
-    }
+    void handleModuleName(ModuleNameLoc *Path);
 
     void handleColon() {
       if (isModuleCandidate())
@@ -561,13 +574,6 @@ class Preprocessor {
         reset();
     }
 
-    void handlePeriod() {
-      if (isModuleCandidate())
-        Name += ".";
-      else if (!isNamedModule())
-        reset();
-    }
-
     void handleSemi() {
       if (!Name.empty() && isModuleCandidate()) {
         if (State == InterfaceCandidate)
@@ -622,10 +628,6 @@ class Preprocessor {
 
   ModuleDeclSeq ModuleDeclState;
 
-  /// Whether the module import expects an identifier next. Otherwise,
-  /// it expects a '.' or ';'.
-  bool ModuleImportExpectsIdentifier = false;
-
   /// The identifier and source location of the currently-active
   /// \#pragma clang arc_cf_code_audited begin.
   IdentifierLoc PragmaARCCFCodeAuditedInfo;
@@ -1759,6 +1761,19 @@ class Preprocessor {
   /// Lex the parameters for an #embed directive, returns nullopt on error.
   std::optional<LexEmbedParametersResult> LexEmbedParameters(Token &Current,
                                                              bool ForHasEmbed);
+  bool LexModuleNameContinue(Token &Tok, SourceLocation UseLoc,
+                             SmallVectorImpl<IdentifierLoc> &Path,
+                             bool AllowMacroExpansion = true);
+  void HandleCXXImportDirective(Token Import);
+  void HandleCXXModuleDirective(Token Module);
+  
+  /// Callback invoked when the lexer sees one of export, import or module token
+  /// at the start of a line.
+  ///
+  /// This consumes the import, module directive, modifies the
+  /// lexer/preprocessor state, and advances the lexer(s) so that the next token
+  /// read is the correct one.
+  bool HandleModuleContextualKeyword(Token &Result, bool TokAtPhysicalStartOfLine);
 
   bool LexAfterModuleImport(Token &Result);
   void CollectPpImportSuffix(SmallVectorImpl<Token> &Toks);
@@ -2282,7 +2297,9 @@ class Preprocessor {
   /// Determine whether the next preprocessor token to be
   /// lexed is a '('.  If so, consume the token and return true, if not, this
   /// method should have no observable side-effect on the lexed tokens.
-  bool isNextPPTokenLParen();
+  bool isNextPPTokenLParen() {
+    return peekNextPPToken().value_or(Token{}).is(tok::l_paren);
+  }
 
 private:
   /// Identifiers used for SEH handling in Borland. These are only
@@ -2342,7 +2359,7 @@ class Preprocessor {
   ///
   /// \return The location of the end of the directive (the terminating
   /// newline).
-  SourceLocation CheckEndOfDirective(const char *DirType,
+  SourceLocation CheckEndOfDirective(StringRef DirType,
                                      bool EnableMacros = false);
 
   /// Read and discard all tokens remaining on the current line until
@@ -2424,11 +2441,12 @@ class Preprocessor {
   }
 
   /// If we're importing a standard C++20 Named Modules.
-  bool isInImportingCXXNamedModules() const {
-    // NamedModuleImportPath will be non-empty only if we're importing
-    // Standard C++ named modules.
-    return !NamedModuleImportPath.empty() && getLangOpts().CPlusPlusModules &&
-           !IsAtImport;
+  bool isImportingCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && ImportingCXXNamedModules;
+  }
+
+  bool isDeclaringCXXNamedModules() const {
+    return getLangOpts().CPlusPlusModules && DeclaringCXXNamedModules;
   }
 
   /// Allocate a new MacroInfo object with the provided SourceLocation.
@@ -2661,6 +2679,10 @@ class Preprocessor {
 
   void removeCachedMacroExpandedTokensOfLastLexer();
 
+  /// Peek the next token. If so, return the token, if not, this
+  /// method should have no observable side-effect on the lexed tokens.
+  std::optional<Token> peekNextPPToken();
+
   /// After reading "MACRO(", this method is invoked to read all of the formal
   /// arguments specified for the macro invocation.  Returns null on error.
   MacroArgs *ReadMacroCallArgumentList(Token &MacroName, MacroInfo *MI,
@@ -3078,6 +3100,53 @@ struct EmbedAnnotationData {
   StringRef FileName;
 };
 
+/// Represents module name annotation data.
+///
+///     module-name:
+///           module-name-qualifier[opt] identifier
+///
+///     partition-name: [C++20]
+///           : module-name-qualifier[opt] identifier
+///
+///     module-name-qualifier
+///           module-name-qualifier[opt] identifier .
+class ModuleNameLoc final
+    : llvm::TrailingObjects<ModuleNameLoc, IdentifierLoc> {
+  friend TrailingObjects;
+  unsigned NumIdentifierLocs;
+
+  unsigned numTrailingObjects(OverloadToken<IdentifierLoc>) const {
+    return getNumIdentifierLocs();
+  }
+
+  ModuleNameLoc(ModuleIdPath Path) : NumIdentifierLocs(Path.size()) {
+    (void)llvm::copy(Path, getTrailingObjects<IdentifierLoc>());
+  }
+
+public:
+  static std::string stringFromModuleIdPath(ModuleIdPath Path);
+  static ModuleNameLoc *Create(Preprocessor &PP, ModuleIdPath Path);
+  static Token CreateAnnotToken(Preprocessor &PP, ModuleIdPath Path);
+  unsigned getNumIdentifierLocs() const { return NumIdentifierLocs; }
+  ModuleIdPath getModuleIdPath() const {
+    return {getTrailingObjects<IdentifierLoc>(), getNumIdentifierLocs()};
+  }
+
+  SourceLocation getBeginLoc() const {
+    return getModuleIdPath().front().getLoc();
+  }
+  SourceLocation getEndLoc() const {
+    auto &Last = getModuleIdPath().back();
+    return Last.getLoc().getLocWithOffset(
+        Last.getIdentifierInfo()->getLength());
+  }
+  SourceRange getRange() const { return {getBeginLoc(), getEndLoc()}; }
+
+  std::string str() const;
+  void print(llvm::raw_ostream &OS) const;
+  void dump() const { print(llvm::errs()); }
+};
+
 /// Registry of pragma handlers added by plugins
 using PragmaHandlerRegistry = llvm::Registry<PragmaHandler>;
 
diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h
index 4f29fb7d11415..8e81207ddf8d7 100644
--- a/clang/include/clang/Lex/Token.h
+++ b/clang/include/clang/Lex/Token.h
@@ -231,6 +231,9 @@ class Token {
     PtrData = const_cast<char*>(Ptr);
   }
 
+  template <class T> T getAnnotationValueAs() const {
+    return static_cast<T>(getAnnotationValue());
+  }
   void *getAnnotationValue() const {
     assert(isAnnotation() && "Used AnnotVal on non-annotation token");
     return PtrData;
@@ -289,6 +292,10 @@ class Token {
   /// Return the ObjC keyword kind.
   tok::ObjCKeywordKind getObjCKeywordID() const;
 
+  /// Return true if we have an C++20 Modules contextual keyword(export, import
+  /// or module).
+  bool isModuleContextualKeyword(bool AllowExport = true) const;
+
   bool isSimpleTypeSpecifier(const LangOptions &LangOpts) const;
 
   /// Return true if this token has trigraphs or escaped newlines in it.
diff --git a/clang/include/clang/Lex/TokenLexer.h b/clang/include/clang/Lex/TokenLexer.h
index 4d229ae610674..777b4e6266c71 100644
--- a/clang/include/clang/Lex/TokenLexer.h
+++ b/clang/include/clang/Lex/TokenLexer.h
@@ -139,10 +139,9 @@ class TokenLexer {
   void Init(const Token *TokArray, unsigned NumToks, bool DisableMacroExpansion,
             bool OwnsTokens, bool IsReinject);
 
-  /// If the next token lexed will pop this macro off the
-  /// expansion stack, return 2.  If the next unexpanded token is a '(', return
-  /// 1, otherwise return 0.
-  unsigned isNextTokenLParen() const;
+  /// If the next token lexed will pop this macro off the expansion stack,
+  /// return std::nullopt, otherwise return the next unexpanded token.
+  std::optional<Token> peekNextPPToken() const;
 
   /// Lex and return a token from this macro stream.
   bool Lex(Token &Tok);
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index c4bef4729fd36..a59a99bbac7c6 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -1079,6 +1079,8 @@ class Parser : public CodeCompletionHandler {
                                  unsigned ArgumentIndex) override;
   ...
[truncated]

@yronglin yronglin marked this pull request as draft May 31, 2025 08:02
Signed-off-by: yronglin <yronglin777@gmail.com>
@yronglin yronglin force-pushed the modules_dependency_discovery branch from d300a2b to 04ddbf6 Compare June 2, 2025 12:33
@yronglin yronglin marked this pull request as ready for review June 2, 2025 12:33
@llvmbot llvmbot added the clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' label Jun 2, 2025
Signed-off-by: yronglin <yronglin777@gmail.com>
@yronglin
Copy link
Contributor Author

yronglin commented Jun 2, 2025

@cor3ntin @ChuanqiXu9 Do you have any comments on the current implementation approach? Feel free to leave comments. Since this PR is too large, I'd like to split it into several smaller PRs if the implementation approach is correct. WDYT?

@ChuanqiXu9
Copy link
Member

Generally it should be good to split large patches into smaller ones. For the topic if the approach is correct, I can't talk too much since @cor3ntin spent most times.

@cor3ntin
Copy link
Contributor

cor3ntin commented Jun 3, 2025

@Bigcheese can you do a pass on that? Thanks a lot

@yronglin
Copy link
Contributor Author

yronglin commented Jun 3, 2025

@Bigcheese I have a question about A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.). IIUC, as you said in #90574 (comment), this rule intended to prohibit the following code?

// error: User need to have a `module;` decl before any preprocessor directives.
#include "foo.h"
export module M;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants