[CGData] llvm-cgdata #89884
cl::values(clEnumValN(CD_Text, "text", "Text encoding"),
clEnumValN(CD_Binary, "binary", "Binary encoding")));

cl::opt<bool> ShowCGDataVersion("cgdata-version", cl::init(false),
Anything wrong with simply using llvm-cgdata show --version? Also, false should already be the default.
Suggested change:
- cl::opt<bool> ShowCGDataVersion("cgdata-version", cl::init(false),
+ cl::opt<bool> ShowCGDataVersion("version",
> Anything wrong with simply using llvm-cgdata show --version? Also, false should already be the default.
I can't drop the prefix as it conflicts with the existing version flag to LLVM.
In fact, this code is similar to -profile-version used for llvm-profdata. So I left the code as is, except for removing the initial value (false).
I'm surprised there is a flag conflict if you are using a subcommand. But it makes sense to keep this consistent with -profile-version.
Version1 = 1,
CurrentVersion = CG_DATA_INDEX_VERSION
Does it make sense to also have a SubVersion, which is compatible within the same Version but whose data might be logically different (e.g. minor fixes)?
I don't anticipate many changes of this format in the future, and aim for simplicity. The code is largely modeled after the IRPGO profile -- https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L1071
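To make the single-number scheme concrete, here is a minimal sketch of a reader-side version check. The enum values mirror Version1 and CurrentVersion from the patch; the function name, the result enum, and the accepted range are assumptions for illustration, not the reader's actual logic.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative result type; the patch reports cgdata_error::unsupported_version.
enum class VersionCheck { Ok, Unsupported };

// Mirrors the patch: Version1 = 1 and CurrentVersion = CG_DATA_INDEX_VERSION,
// which is 1 at the time of this review.
constexpr uint32_t FirstVersion = 1;
constexpr uint32_t CurrentVersion = 1;

// Hypothetical check: accept versions in [FirstVersion, CurrentVersion] and
// reject everything else, including files written by a newer tool.
VersionCheck checkVersion(uint32_t FileVersion) {
  if (FileVersion < FirstVersion || FileVersion > CurrentVersion)
    return VersionCheck::Unsupported;
  return VersionCheck::Ok;
}
```

With one version number, any format change bumps CurrentVersion, and older readers reject the file outright instead of guessing at compatibility.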
return Magic == IndexedCGData::Magic;
}

bool TextCodeGenDataReader::hasFormat(const MemoryBuffer &Buffer) {
The name of this function is somewhat confusing, or at odds with its description. Should it be something like hasAsciiFormat? Maybe I'm misinterpreting the usage.
The description in the header is quoted at the end of this comment; it is also modeled after the IRPGO case -- https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProfReader.h#L252-L253
I'd like to keep them synced, and I also think it's clearer without the extra qualifier, since this is a static function and the class name TextCodeGenDataReader already implies that we're interested in a text format here.
/// Return true if the given buffer is in text codegen data format.
static bool hasFormat(const MemoryBuffer &Buffer);
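For readers unfamiliar with this kind of probe, the following is a rough sketch of what a text-format check could look like. It is not the code from the patch: the function name, the probe length, and the use of std::string in place of MemoryBuffer are all assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a text-format probe: accept the buffer when a
// short prefix contains only printable characters or whitespace.
// Note that an empty buffer is accepted by this sketch.
bool looksLikeText(const std::string &Buffer, size_t MaxProbe = 8) {
  assert(MaxProbe > 0 && "probe at least one byte");
  size_t N = Buffer.size() < MaxProbe ? Buffer.size() : MaxProbe;
  for (size_t I = 0; I < N; ++I) {
    unsigned char C = static_cast<unsigned char>(Buffer[I]);
    if (!std::isprint(C) && !std::isspace(C))
      return false;
  }
  return true;
}
```

A binary file that begins with the 0xff magic byte fails this probe on its first character, which is why the text and binary readers can each expose a plain hasFormat without ambiguity.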
// remaining fields to allow back-patching later.
COS.write(Header.Magic);
COS.write32(Header.Version);
COS.write32(Header.DataKind);
Does it make sense to add the ability to write some client-defined string metadata for the file? For example: module name, source hash, build system info, etc.
Like the IRPGO case -- https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ProfileData/InstrProf.h#L1117-L1124, I aim to maintain a simplified header while still capturing any discrepancies. Details such as the module name may change based on client usage and can be distinguished by using a different DataKind. I expect that these details can be encoded within their respective data blobs as needed.
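The simplified header and the back-patching mentioned in the quoted code can be sketched as follows. The field order and sizes follow IndexedCGData::Header from the patch (8 + 4 + 4 + 8 = 24 bytes); the little-endian encoding, the function names, and the use of std::string as an output buffer are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Field layout mirrors IndexedCGData::Header in the patch.
struct Header {
  uint64_t Magic;
  uint32_t Version;
  uint32_t DataKind;
  uint64_t OutlinedHashTreeOffset;
};

constexpr size_t HeaderSize = 24;     // 8 + 4 + 4 + 8 bytes
constexpr size_t OffsetFieldPos = 16; // position of OutlinedHashTreeOffset

// Append the low NumBytes bytes of V in little-endian order.
void writeLE(std::string &Out, uint64_t V, unsigned NumBytes) {
  for (unsigned I = 0; I < NumBytes; ++I)
    Out.push_back(static_cast<char>((V >> (8 * I)) & 0xff));
}

// Read NumBytes little-endian bytes starting at Pos.
uint64_t readLE(const std::string &In, size_t Pos, unsigned NumBytes) {
  assert(Pos + NumBytes <= In.size() && "read past end of buffer");
  uint64_t V = 0;
  for (unsigned I = 0; I < NumBytes; ++I)
    V |= static_cast<uint64_t>(static_cast<unsigned char>(In[Pos + I]))
         << (8 * I);
  return V;
}

// Write the header with a placeholder offset, append the payload, then
// back-patch the offset field once the payload position is known.
std::string writeIndexed(uint32_t Version, uint32_t DataKind,
                         const std::string &Payload) {
  std::string Out;
  writeLE(Out, 0x81617461646763ffULL, 8); // Magic
  writeLE(Out, Version, 4);
  writeLE(Out, DataKind, 4);
  writeLE(Out, 0, 8); // placeholder for OutlinedHashTreeOffset
  assert(Out.size() == HeaderSize);
  uint64_t PayloadOffset = Out.size();
  Out += Payload;
  for (unsigned I = 0; I < 8; ++I)
    Out[OffsetFieldPos + I] =
        static_cast<char>((PayloadOffset >> (8 * I)) & 0xff);
  return Out;
}

// Parse the fixed-size header back out of the buffer.
Header readHeader(const std::string &In) {
  return {readLE(In, 0, 8), static_cast<uint32_t>(readLE(In, 8, 4)),
          static_cast<uint32_t>(readLE(In, 12, 4)), readLE(In, 16, 8)};
}
```

Any client-specific detail would live inside the payload that the offset points to, so the header itself stays fixed at 24 bytes.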
@llvm/pr-subscribers-pgo
Author: Kyungwoo Lee (kyulee-com)
Changes
The llvm-cgdata tool has been introduced to handle reading and writing of codegen data. This data includes an optimistic codegen summary that can be utilized to enhance subsequent codegen. Currently, the tool supports saving and restoring the outlined hash tree, facilitating machine function outlining across modules. Additional codegen summaries can be incorporated into separate sections as required. This patch primarily establishes basic support for the reader and writer, similar to llvm-profdata.
The high-level operations of llvm-cgdata are as follows:
1. It reads local raw codegen data from a custom section (for example, __llvm_outline) embedded in native binary files.
2. It merges local raw codegen data into an indexed codegen data, complete with a suitable header.
3. It handles reading and writing of the indexed codegen data into a standalone file.
This depends on #89792.
Patch is 63.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89884.diff
21 Files Affected:
diff --git a/llvm/include/llvm/CodeGenData/CodeGenData.h b/llvm/include/llvm/CodeGenData/CodeGenData.h
new file mode 100644
index 0000000000000..f46dc0c28cbc7
--- /dev/null
+++ b/llvm/include/llvm/CodeGenData/CodeGenData.h
@@ -0,0 +1,202 @@
+//===- CodeGenData.h --------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains support for codegen data that has stable summary which
+// can be used to optimize the code in the subsequent codegen.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CODEGENDATA_CODEGENDATA_H
+#define LLVM_CODEGENDATA_CODEGENDATA_H
+
+#include "llvm/ADT/BitmaskEnum.h"
+#include "llvm/Bitcode/BitcodeReader.h"
+#include "llvm/CodeGenData/OutlinedHashTree.h"
+#include "llvm/CodeGenData/OutlinedHashTreeRecord.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Object/ObjectFile.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/TargetParser/Triple.h"
+#include <mutex>
+
+namespace llvm {
+
+enum CGDataSectKind {
+#define CG_DATA_SECT_ENTRY(Kind, SectNameCommon, SectNameCoff, Prefix) Kind,
+#include "llvm/CodeGenData/CodeGenData.inc"
+};
+
+std::string getCodeGenDataSectionName(CGDataSectKind CGSK,
+ Triple::ObjectFormatType OF,
+ bool AddSegmentInfo = true);
+
+enum class CGDataKind {
+ Unknown = 0x0,
+ // A function outlining info.
+ FunctionOutlinedHashTree = 0x1,
+ LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/FunctionOutlinedHashTree)
+};
+
+const std::error_category &cgdata_category();
+
+enum class cgdata_error {
+ success = 0,
+ eof,
+ bad_magic,
+ bad_header,
+ empty_cgdata,
+ malformed,
+ unsupported_version,
+};
+
+inline std::error_code make_error_code(cgdata_error E) {
+ return std::error_code(static_cast<int>(E), cgdata_category());
+}
+
+class CGDataError : public ErrorInfo<CGDataError> {
+public:
+ CGDataError(cgdata_error Err, const Twine &ErrStr = Twine())
+ : Err(Err), Msg(ErrStr.str()) {
+ assert(Err != cgdata_error::success && "Not an error");
+ }
+
+ std::string message() const override;
+
+ void log(raw_ostream &OS) const override { OS << message(); }
+
+ std::error_code convertToErrorCode() const override {
+ return make_error_code(Err);
+ }
+
+ cgdata_error get() const { return Err; }
+ const std::string &getMessage() const { return Msg; }
+
+ /// Consume an Error and return the raw enum value contained within it, and
+ /// the optional error message. The Error must either be a success value, or
+ /// contain a single CGDataError.
+ static std::pair<cgdata_error, std::string> take(Error E) {
+ auto Err = cgdata_error::success;
+ std::string Msg;
+ handleAllErrors(std::move(E), [&Err, &Msg](const CGDataError &IPE) {
+ assert(Err == cgdata_error::success && "Multiple errors encountered");
+ Err = IPE.get();
+ Msg = IPE.getMessage();
+ });
+ return {Err, Msg};
+ }
+
+ static char ID;
+
+private:
+ cgdata_error Err;
+ std::string Msg;
+};
+
+enum CGDataMode {
+ None,
+ Read,
+ Write,
+};
+
+class CodeGenData {
+ /// Global outlined hash tree that has outlined hash sequences across modules.
+ std::unique_ptr<OutlinedHashTree> PublishedHashTree;
+
+ /// This flag is set when -fcodegen-data-generate is passed.
+ /// Or, it can be mutated with -fcodegen-data-thinlto-two-rounds.
+ bool EmitCGData;
+
+ /// This is a singleton instance which is thread-safe. Unlike profile data
+ /// which is largely function-based, codegen data describes the whole module.
+ /// Therefore, this can be initialized once, and can be used across modules
+ /// instead of constructing the same one for each codegen backend.
+ static std::unique_ptr<CodeGenData> Instance;
+ static std::once_flag OnceFlag;
+
+ CodeGenData() = default;
+
+public:
+ ~CodeGenData() = default;
+
+ static CodeGenData &getInstance();
+
+ /// Returns true if we have a valid outlined hash tree.
+ bool hasOutlinedHashTree() {
+ return PublishedHashTree && !PublishedHashTree->empty();
+ }
+
+ /// Returns the outlined hash tree. This can be globally used in a read-only
+ /// manner.
+ const OutlinedHashTree *getOutlinedHashTree() {
+ return PublishedHashTree.get();
+ }
+
+ /// Returns true if we should write codegen data.
+ bool emitCGData() { return EmitCGData; }
+
+ /// Publish the (globally) merged or read outlined hash tree.
+ void publishOutlinedHashTree(std::unique_ptr<OutlinedHashTree> HashTree) {
+ PublishedHashTree = std::move(HashTree);
+ // Ensure we disable emitCGData as we do not want to read and write both.
+ EmitCGData = false;
+ }
+};
+
+namespace cgdata {
+
+inline bool hasOutlinedHashTree() {
+ return CodeGenData::getInstance().hasOutlinedHashTree();
+}
+
+inline const OutlinedHashTree *getOutlinedHashTree() {
+ return CodeGenData::getInstance().getOutlinedHashTree();
+}
+
+inline bool emitCGData() { return CodeGenData::getInstance().emitCGData(); }
+
+inline void
+publishOutlinedHashTree(std::unique_ptr<OutlinedHashTree> HashTree) {
+ CodeGenData::getInstance().publishOutlinedHashTree(std::move(HashTree));
+}
+
+void warn(Error E, StringRef Whence = "");
+void warn(Twine Message, std::string Whence = "", std::string Hint = "");
+
+} // end namespace cgdata
+
+namespace IndexedCGData {
+
+const uint64_t Magic = 0x81617461646763ff; // "\xffcgdata\x81"
+
+enum CGDataVersion {
+ // Version 1 is the first version. This version supports the outlined
+ // hash tree.
+ Version1 = 1,
+ CurrentVersion = CG_DATA_INDEX_VERSION
+};
+const uint64_t Version = CGDataVersion::CurrentVersion;
+
+struct Header {
+ uint64_t Magic;
+ uint32_t Version;
+ uint32_t DataKind;
+ uint64_t OutlinedHashTreeOffset;
+
+ // New fields should only be added at the end to ensure that the size
+ // computation is correct. The methods below need to be updated to ensure that
+ // the new field is read correctly.
+
+ // Reads a header struct from the buffer.
+ static Expected<Header> readFromBuffer(const unsigned char *Curr);
+};
+
+} // end namespace IndexedCGData
+
+} // end namespace llvm
+
+#endif // LLVM_CODEGENDATA_CODEGENDATA_H
diff --git a/llvm/include/llvm/CodeGenData/CodeGenData.inc b/llvm/include/llvm/CodeGenData/CodeGenData.inc
new file mode 100644
index 0000000000000..5f6df5c0bf106
--- /dev/null
+++ b/llvm/include/llvm/CodeGenData/CodeGenData.inc
@@ -0,0 +1,46 @@
+/*===-- CodeGenData.inc ----------------------------------------*- C++ -*-=== *\
+|*
+|* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+|* See https://llvm.org/LICENSE.txt for license information.
+|* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+|*
+\*===----------------------------------------------------------------------===*/
+/*
+ * This is the main file that defines all the data structure, signature,
+ * constant literals that are shared across compiler, host tools (reader/writer)
+ * to support codegen data.
+ *
+\*===----------------------------------------------------------------------===*/
+
+#ifdef CG_DATA_SECT_ENTRY
+#define CG_DATA_DEFINED
+CG_DATA_SECT_ENTRY(CG_outline, CG_DATA_QUOTE(CG_DATA_OUTLINE_COMMON),
+ CG_DATA_OUTLINE_COFF, "__DATA,")
+
+#undef CG_DATA_SECT_ENTRY
+#endif
+
+/* section name strings common to all targets other
+ than WIN32 */
+#define CG_DATA_OUTLINE_COMMON __llvm_outline
+/* Since cg data sections are not allocated, we don't need to
+ * access them at runtime.
+ */
+#define CG_DATA_OUTLINE_COFF ".loutline"
+
+#ifdef _WIN32
+/* Runtime section names and name strings. */
+#define CG_DATA_SECT_NAME CG_DATA_OUTLINE_COFF
+
+#else
+/* Runtime section names and name strings. */
+#define CG_DATA_SECT_NAME CG_DATA_QUOTE(CG_DATA_OUTLINE_COMMON)
+
+#endif
+
+/* Indexed codegen data format version (start from 1). */
+#define CG_DATA_INDEX_VERSION 1
+
+/* Helper macros. */
+#define CG_DATA_SIMPLE_QUOTE(x) #x
+#define CG_DATA_QUOTE(x) CG_DATA_SIMPLE_QUOTE(x)
diff --git a/llvm/include/llvm/CodeGenData/CodeGenDataReader.h b/llvm/include/llvm/CodeGenData/CodeGenDataReader.h
new file mode 100644
index 0000000000000..df4ae3ed24e79
--- /dev/null
+++ b/llvm/include/llvm/CodeGenData/CodeGenDataReader.h
@@ -0,0 +1,154 @@
+//===- CodeGenDataReader.h --------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains support for reading codegen data.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CODEGENDATA_CODEGENDATAREADER_H
+#define LLVM_CODEGENDATA_CODEGENDATAREADER_H
+
+#include "llvm/CodeGenData/CodeGenData.h"
+#include "llvm/CodeGenData/OutlinedHashTreeRecord.h"
+#include "llvm/Support/LineIterator.h"
+#include "llvm/Support/VirtualFileSystem.h"
+
+namespace llvm {
+
+class CodeGenDataReader {
+ cgdata_error LastError = cgdata_error::success;
+ std::string LastErrorMsg;
+
+public:
+ CodeGenDataReader() = default;
+ virtual ~CodeGenDataReader() = default;
+
+ /// Read the header. Required before reading first record.
+ virtual Error read() = 0;
+ /// Return the codegen data version.
+ virtual uint32_t getVersion() const = 0;
+ /// Return the codegen data kind.
+ virtual CGDataKind getDataKind() const = 0;
+ /// Return true if the data has an outlined hash tree.
+ virtual bool hasOutlinedHashTree() const = 0;
+ /// Return the outlined hash tree that is released from the reader.
+ std::unique_ptr<OutlinedHashTree> releaseOutlinedHashTree() {
+ return std::move(HashTreeRecord.HashTree);
+ }
+
+ /// Factory method to create an appropriately typed reader for the given
+ /// codegen data file path and file system.
+ static Expected<std::unique_ptr<CodeGenDataReader>>
+ create(const Twine &Path, vfs::FileSystem &FS);
+
+ /// Factory method to create an appropriately typed reader for the given
+ /// memory buffer.
+ static Expected<std::unique_ptr<CodeGenDataReader>>
+ create(std::unique_ptr<MemoryBuffer> Buffer);
+
+ /// Extract the cgdata embedded in sections from the given object file and
+ /// merge them into the GlobalOutlineRecord. This is a static helper that
+ /// is used by `llvm-cgdata merge` or ThinLTO's two-codegen rounds.
+ static Error mergeFromObjectFile(const object::ObjectFile *Obj,
+ OutlinedHashTreeRecord &GlobalOutlineRecord);
+
+protected:
+ /// The outlined hash tree that has been read. When it's released by
+ /// releaseOutlinedHashTree(), it's no longer valid.
+ OutlinedHashTreeRecord HashTreeRecord;
+
+ /// Set the current error and return same.
+ Error error(cgdata_error Err, const std::string &ErrMsg = "") {
+ LastError = Err;
+ LastErrorMsg = ErrMsg;
+ if (Err == cgdata_error::success)
+ return Error::success();
+ return make_error<CGDataError>(Err, ErrMsg);
+ }
+
+ Error error(Error &&E) {
+ handleAllErrors(std::move(E), [&](const CGDataError &IPE) {
+ LastError = IPE.get();
+ LastErrorMsg = IPE.getMessage();
+ });
+ return make_error<CGDataError>(LastError, LastErrorMsg);
+ }
+
+ /// Clear the current error and return a successful one.
+ Error success() { return error(cgdata_error::success); }
+};
+
+class IndexedCodeGenDataReader : public CodeGenDataReader {
+ /// The codegen data file contents.
+ std::unique_ptr<MemoryBuffer> DataBuffer;
+ /// The header
+ IndexedCGData::Header Header;
+
+public:
+ IndexedCodeGenDataReader(std::unique_ptr<MemoryBuffer> DataBuffer)
+ : DataBuffer(std::move(DataBuffer)) {}
+ IndexedCodeGenDataReader(const IndexedCodeGenDataReader &) = delete;
+ IndexedCodeGenDataReader &
+ operator=(const IndexedCodeGenDataReader &) = delete;
+
+ /// Return true if the given buffer is in binary codegen data format.
+ static bool hasFormat(const MemoryBuffer &Buffer);
+ /// Read the contents including the header.
+ Error read() override;
+ /// Return the codegen data version.
+ uint32_t getVersion() const override { return Header.Version; }
+ /// Return the codegen data kind.
+ CGDataKind getDataKind() const override {
+ return static_cast<CGDataKind>(Header.DataKind);
+ }
+ /// Return true if the header indicates the data has an outlined hash tree.
+ /// This does not mean that the data is still available.
+ bool hasOutlinedHashTree() const override {
+ return Header.DataKind &
+ static_cast<uint32_t>(CGDataKind::FunctionOutlinedHashTree);
+ }
+};
+
+/// This format is a simple text format that's suitable for test data.
+/// The header is a custom format starting with `:` per line to indicate which
+/// codegen data is recorded. `#` is used to indicate a comment.
+/// The subsequent data is a YAML format per each codegen data in order.
+/// Currently, it only has a function outlined hash tree.
+class TextCodeGenDataReader : public CodeGenDataReader {
+ /// The codegen data file contents.
+ std::unique_ptr<MemoryBuffer> DataBuffer;
+ /// Iterator over the lines of the codegen data.
+ line_iterator Line;
+ /// Describe the kind of the codegen data.
+ CGDataKind DataKind = CGDataKind::Unknown;
+
+public:
+ TextCodeGenDataReader(std::unique_ptr<MemoryBuffer> DataBuffer_)
+ : DataBuffer(std::move(DataBuffer_)), Line(*DataBuffer, true, '#') {}
+ TextCodeGenDataReader(const TextCodeGenDataReader &) = delete;
+ TextCodeGenDataReader &operator=(const TextCodeGenDataReader &) = delete;
+
+ /// Return true if the given buffer is in text codegen data format.
+ static bool hasFormat(const MemoryBuffer &Buffer);
+ /// Read the contents including the header.
+ Error read() override;
+ /// Text format does not have version, so return 0.
+ uint32_t getVersion() const override { return 0; }
+ /// Return the codegen data kind.
+ CGDataKind getDataKind() const override { return DataKind; }
+ /// Return true if the header indicates the data has an outlined hash tree.
+ /// This does not mean that the data is still available.
+ bool hasOutlinedHashTree() const override {
+ return static_cast<uint32_t>(DataKind) &
+ static_cast<uint32_t>(CGDataKind::FunctionOutlinedHashTree);
+ }
+};
+
+} // end namespace llvm
+
+#endif // LLVM_CODEGENDATA_CODEGENDATAREADER_H
diff --git a/llvm/include/llvm/CodeGenData/CodeGenDataWriter.h b/llvm/include/llvm/CodeGenData/CodeGenDataWriter.h
new file mode 100644
index 0000000000000..e17ffc3482ec9
--- /dev/null
+++ b/llvm/include/llvm/CodeGenData/CodeGenDataWriter.h
@@ -0,0 +1,68 @@
+//===- CodeGenDataWriter.h --------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains support for writing codegen data.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CODEGENDATA_CODEGENDATAWRITER_H
+#define LLVM_CODEGENDATA_CODEGENDATAWRITER_H
+
+#include "llvm/CodeGenData/CodeGenData.h"
+#include "llvm/CodeGenData/OutlinedHashTreeRecord.h"
+#include "llvm/Support/Error.h"
+
+namespace llvm {
+
+class CGDataOStream;
+
+class CodeGenDataWriter {
+ /// The outlined hash tree to be written.
+ OutlinedHashTreeRecord HashTreeRecord;
+
+ /// A bit mask describing the kind of the codegen data.
+ CGDataKind DataKind = CGDataKind::Unknown;
+
+public:
+ CodeGenDataWriter() = default;
+ ~CodeGenDataWriter() = default;
+
+ /// Add the outlined hash tree record. The input Record is released.
+ void addRecord(OutlinedHashTreeRecord &Record);
+
+ /// Write the codegen data to \c OS
+ Error write(raw_fd_ostream &OS);
+
+ /// Write the codegen data in text format to \c OS
+ Error writeText(raw_fd_ostream &OS);
+
+ /// Return the attributes of the current CGData.
+ CGDataKind getCGDataKind() const { return DataKind; }
+
+ /// Return true if the header indicates the data has an outlined hash tree.
+ bool hasOutlinedHashTree() const {
+ return static_cast<uint32_t>(DataKind) &
+ static_cast<uint32_t>(CGDataKind::FunctionOutlinedHashTree);
+ }
+
+private:
+ /// The offset of the outlined hash tree in the file.
+ uint64_t OutlinedHashTreeOffset;
+
+ /// Write the codegen data header to \c COS
+ Error writeHeader(CGDataOStream &COS);
+
+ /// Write the codegen data header in text to \c OS
+ Error writeHeaderText(raw_fd_ostream &OS);
+
+ Error writeImpl(CGDataOStream &COS);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_CODEGENDATA_CODEGENDATAWRITER_H
diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h
index 9b34cb0b651f7..b41b4b9ca22d2 100644
--- a/llvm/include/llvm/ProfileData/InstrProf.h
+++ b/llvm/include/llvm/ProfileData/InstrProf.h
@@ -414,7 +414,7 @@ class InstrProfError : public ErrorInfo<InstrProfError> {
/// contain a single InstrProfError.
static std::pair<instrprof_error, std::string> take(Error E) {
auto Err = instrprof_error::success;
- std::string Msg = "";
+ std::string Msg;
handleAllErrors(std::move(E), [&Err, &Msg](const InstrProfError &IPE) {
assert(Err == instrprof_error::success && "Multiple errors encountered");
Err = IPE.get();
diff --git a/llvm/lib/CodeGenData/CMakeLists.txt b/llvm/lib/CodeGenData/CMakeLists.txt
index 3ba90f96cc86d..1156d53afb2e0 100644
--- a/llvm/lib/CodeGenData/CMakeLists.txt
+++ b/llvm/lib/CodeGenData/CMakeLists.txt
@@ -1,4 +1,7 @@
add_llvm_component_library(LLVMCodeGenData
+ CodeGenData.cpp
+ CodeGenDataReader.cpp
+ CodeGenDataWriter.cpp
OutlinedHashTree.cpp
OutlinedHashTreeRecord.cpp
diff --git a/llvm/lib/CodeGenData/CodeGenData.cpp b/llvm/lib/CodeGenData/CodeGenData.cpp
new file mode 100644
index 0000000000000..3bd21c97c7de7
--- /dev/null
+++ b/llvm/lib/CodeGenData/CodeGenData.cpp
@@ -0,0 +1,197 @@
+//===-- CodeGenData.cpp ---------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file contains support for codegen data that has stable summary which
+// can be used to optimize the code in the subsequent codegen.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Bitcode/BitcodeWriter.h"
+#include "llvm/CodeGenData/CodeGenDataReader.h"
+#include "llvm/CodeGenData/OutlinedHashTreeRecord.h"
+#include "llvm/Object/ObjectFile.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/WithColor.h"
+
+#define DEBUG_TYPE "cg-data"
+
+using namespace llvm;
+using namespace cgdata;
+
+static std::string getCGDataErrString(cgdata_error Err,
+ const std::string &ErrMsg = "") {
+ std::string Msg;
+ raw_string_ostream OS(Msg);
+
+ switch (Err) {
+ case cgdata_error::success:
+ OS << "success";
+ break;
+ case cgdata_error::eof:
+ OS << "end of File";
+ break;
+ case cgdata_error::bad_magic:
+ OS << "invalid codegen data (bad magic)";
+ break;
+ case cgdata_error::bad_header:
+ OS << "invalid codegen data (file header is corrupt)";
+ break;
+ case cgdata_error::empty_cgdata:
+ OS << "empty codegen data";
+ break;
+ case cgdata_error::malformed:
+ OS << "malformed codegen data";
+ break;
+ case cgdata_error::unsupported_version:
+ OS << "unsupported codegen data version";
+ break;
+ }
+
+ // If optional error message is not empty, append it to the message.
+ if (!ErrMsg.empty())
+ OS << ": " << ErrMsg;
+
+ return OS.str();
+}
+
+namespace {
+
+// FIXME: This cl...
[truncated]
Do you plan to add documentation for the tool and file format?
llvm/lib/CodeGenData/CodeGenData.cpp
Outdated
auto *CGD = new CodeGenData();
Instance.reset(CGD);
Does this work as expected?
Suggested change:
- auto *CGD = new CodeGenData();
- Instance.reset(CGD);
+ Instance = std::make_unique<CodeGenData>();
Also, why not check if Instance is null before creating it instead of using std::call_once? Or does this need to work with multiple threads?
The CodeGenData constructor is private on purpose, so I can't use make_unique<CodeGenData>: its template expansion requires the constructor to be public. Instead, I combined these two lines by allocating and assigning together.
As for call_once, it is indeed needed to support multiple threads, since (in-process) ThinLTO backends operate in parallel.
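The pattern described in this reply can be sketched in isolation as follows. The class name Registry, the construction counter, and the thread helper are hypothetical stand-ins for CodeGenData and its callers, used only to show the private constructor, the direct allocation, and std::call_once working together.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for CodeGenData; the name and the counter are illustrative only.
class Registry {
  static std::unique_ptr<Registry> Instance;
  static std::once_flag OnceFlag;
  Registry() { ++ConstructCount; } // private on purpose

public:
  static int ConstructCount;

  static Registry &getInstance() {
    // std::make_unique<Registry>() would not compile here because the
    // constructor is private, so allocate directly inside the member function.
    std::call_once(OnceFlag, [] { Instance.reset(new Registry()); });
    return *Instance;
  }
};

std::unique_ptr<Registry> Registry::Instance;
std::once_flag Registry::OnceFlag;
int Registry::ConstructCount = 0;

// Call getInstance() from several threads and collect the addresses seen.
std::vector<Registry *> collectFromThreads(unsigned N) {
  std::vector<Registry *> Seen(N, nullptr);
  std::vector<std::thread> Threads;
  for (unsigned I = 0; I < N; ++I)
    Threads.emplace_back([&Seen, I] { Seen[I] = &Registry::getInstance(); });
  for (auto &T : Threads)
    T.join();
  return Seen;
}
```

A plain null check before creating the instance would race when two backends call getInstance() at the same time; std::call_once guarantees the initializer runs exactly once and that every caller observes the finished object.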
auto BufferOrErr = Filename.str() == "-" ? MemoryBuffer::getSTDIN()
                                         : FS.getBufferForFile(Filename);
Suggested change:
- auto BufferOrErr = Filename.str() == "-" ? MemoryBuffer::getSTDIN()
-                                          : FS.getBufferForFile(Filename);
+ auto BufferOrErr = MemoryBuffer::getFileOrSTDIN(Filename)
Or does this not work because you need to use FS?
Similar to the IRPGO case https://github.com/llvm/llvm-project/blob/main/llvm/lib/ProfileData/InstrProfReader.cpp#L73-L74, I want to keep FS in case a custom file system is needed.
RUN: split-file %s %t

# Synthesize two set of raw cgdata without the header (24 byte) from the indexed cgdata.
Suggested change:
- # Synthesize two set of raw cgdata without the header (24 byte) from the indexed cgdata.
+ # Synthesize two sets of raw cgdata without the header (24 byte) from the indexed cgdata.
Fixed.
namespace IndexedCGData {

const uint64_t Magic = 0x81617461646763ff; // "\xffcgdata\x81"
Could we have a comment explaining the use of Magic, something like "An identifier for XYZ..."? Also, what is the meaning of the hexadecimal in the comment?
It's useful to distinguish the file from an ascii text file by just looking at the header.
Added/expanded the comment.
As for the use of Magic, I keep its use only for the binary header, aligning with the IRPGO case. So, the ascii text file won't have this magic in the header.
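To connect the hexadecimal constant with the string in its comment: when the 64-bit value is serialized least-significant byte first, the bytes on disk spell 0xff, "cgdata", 0x81. The sketch below demonstrates this; the little-endian serialization is an assumption about how the header is written, and both helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Value copied from the patch; the comment there reads "\xffcgdata\x81".
constexpr uint64_t Magic = 0x81617461646763ffULL;

// Serialize a 64-bit value as little-endian bytes, independent of host order.
std::array<unsigned char, 8> toLittleEndian(uint64_t V) {
  std::array<unsigned char, 8> Bytes{};
  for (int I = 0; I < 8; ++I)
    Bytes[I] = static_cast<unsigned char>((V >> (8 * I)) & 0xff);
  return Bytes;
}

// Hypothetical helper: true if the buffer starts with the binary magic.
bool startsWithMagic(const std::string &Buffer) {
  if (Buffer.size() < 8)
    return false;
  auto Expected = toLittleEndian(Magic);
  return std::memcmp(Buffer.data(), Expected.data(), 8) == 0;
}
```

Because the first and last bytes (0xff and 0x81) are not printable ASCII, a file that starts with this sequence cannot be mistaken for a text file.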
The llvm-cgdata tool has been introduced to handle reading and writing of codegen data. This data includes an optimistic codegen summary that can be utilized to enhance subsequent codegen. Currently, the tool supports saving and restoring the outlined hash tree, facilitating machine function outlining across modules. Additional codegen summaries can be incorporated into separate sections as required. This patch primarily establishes basic support for the reader and writer, similar to llvm-profdata.
The high-level operations of llvm-cgdata are as follows:
1. It reads local raw codegen data from a custom section (for example, __llvm_outline) embedded in native binary files.
2. It merges local raw codegen data into an indexed codegen data, complete with a suitable header.
3. It handles reading and writing of the indexed codegen data into a standalone file.
This depends on #89792.
This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.