[CIR] Add initial support for bitfields in structs #142041

Open
wants to merge 4 commits into main

Conversation

Andres-Salamanca
Contributor

This change adds support for bitfields: CIR records can now contain bit fields.

I’ve updated the CIRGenBitFieldInfo comment, which originally came from the incubator and was identical to the one in OGCodeGen, to better reflect the current implementation.

Support for bitfields in unions, big-endian architectures, and get and set operations remains unimplemented and will be addressed in a future patch.
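
As a concrete illustration (taken from the new test, clang/test/CIR/CodeGen/bitfields.cpp), the three bitfields in the struct below end up sharing two byte-sized storage members in the resulting CIR record, followed by a padding array:

struct A {
  char a, b, c;
  unsigned bits : 3;
  unsigned more_bits : 4;
  unsigned still_more_bits : 7;
};

// CIR record generated for A:
// !rec_A = !cir.record<struct "A" padded {!s8i, !s8i, !s8i, !u8i, !u8i, !cir.array<!u8i x 3>}>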

@llvmbot llvmbot added clang Clang issues not falling into any other category ClangIR Anything related to the ClangIR project labels May 29, 2025
@llvmbot
Member

llvmbot commented May 29, 2025

@llvm/pr-subscribers-clangir

Author: None (Andres-Salamanca)

Changes

This change adds support for bitfields: CIR records can now contain bit fields.

I’ve updated the CIRGenBitFieldInfo comment, which originally came from the incubator and was identical to the one in OGCodeGen, to better reflect the current implementation.

Support for bitfields in unions, big-endian architectures, and get and set operations remains unimplemented and will be addressed in a future patch.


Full diff: https://github.com/llvm/llvm-project/pull/142041.diff

4 Files Affected:

  • (modified) clang/include/clang/CIR/MissingFeatures.h (+2)
  • (modified) clang/lib/CIR/CodeGen/CIRGenRecordLayout.h (+114)
  • (modified) clang/lib/CIR/CodeGen/CIRGenRecordLayoutBuilder.cpp (+150-6)
  • (added) clang/test/CIR/CodeGen/bitfields.cpp (+79)
diff --git a/clang/include/clang/CIR/MissingFeatures.h b/clang/include/clang/CIR/MissingFeatures.h
index fb205e9cd85d1..6d7e3998d2da8 100644
--- a/clang/include/clang/CIR/MissingFeatures.h
+++ b/clang/include/clang/CIR/MissingFeatures.h
@@ -134,6 +134,8 @@ struct MissingFeatures {
   static bool cxxSupport() { return false; }
   static bool recordZeroInit() { return false; }
   static bool zeroSizeRecordMembers() { return false; }
+  static bool isDiscreteBitFieldABI() { return false; }
+  static bool isBigEndian() { return false; }
 
   // Misc
   static bool cxxABI() { return false; }
diff --git a/clang/lib/CIR/CodeGen/CIRGenRecordLayout.h b/clang/lib/CIR/CodeGen/CIRGenRecordLayout.h
index 2ece85b8aa0a3..023f6bbaa47bb 100644
--- a/clang/lib/CIR/CodeGen/CIRGenRecordLayout.h
+++ b/clang/lib/CIR/CodeGen/CIRGenRecordLayout.h
@@ -14,6 +14,106 @@
 
 namespace clang::CIRGen {
 
+/// Record with information about how a bitfield should be accessed. This is
+/// very similar to what LLVM codegen does; once CIR evolves, it may be
+/// possible to use a higher-level representation.
+///
+/// Often we lay out a sequence of bitfields as a contiguous sequence of bits.
+/// When the AST record layout does this, we represent it in CIR as a
+/// `!cir.record` type, which directly reflects the structure's layout,
+/// including bitfield packing and padding, using CIR types such as
+/// `!cir.bool`, `!s8i`, `!u16i`.
+///
+/// To access a particular bitfield in CIR, we use the operations
+/// `cir.get_bitfield` (`GetBitfieldOp`) or `cir.set_bitfield`
+/// (`SetBitfieldOp`). These operations rely on the `bitfield_info`
+/// attribute, which provides detailed metadata required for access,
+/// such as the size and offset of the bitfield, the type and size of
+/// the underlying storage, and whether the value is signed.
+/// The CIRGenRecordLayout also has a bitFields map which encodes which
+/// byte-sequence this bitfield falls within. Let's assume the following C
+/// struct:
+///
+///   struct S {
+///     char a, b, c;
+///     unsigned bits : 3;
+///     unsigned more_bits : 4;
+///     unsigned still_more_bits : 7;
+///   };
+///
+/// This will end up as the following cir.record. The first array is the
+/// bitfield, and the second is the padding out to a 4-byte alignment.
+///
+///   !rec_S = !cir.record<struct "S" padded {!s8i, !s8i, !s8i, !u8i, !u8i,
+///   !cir.array<!u8i x 3>}>
+///
+/// When generating code to access more_bits, we'll generate something
+/// essentially like this:
+///
+///   #bfi_more_bits = #cir.bitfield_info<name = "more_bits", storage_type =
+///   !u16i, size = 4, offset = 3, is_signed = false>
+///
+///   cir.func @store_field() {
+///     %0 = cir.alloca !rec_S, !cir.ptr<!rec_S>, ["s"] {alignment = 4 : i64}
+///     %1 = cir.const #cir.int<2> : !s32i
+///     %2 = cir.cast(integral, %1 : !s32i), !u32i
+///     %3 = cir.get_member %0[4] {name = "more_bits"} : !cir.ptr<!rec_S> ->
+///     !cir.ptr<!u16i>
+///     %4 = cir.set_bitfield(#bfi_more_bits, %3 :
+///     !cir.ptr<!u16i>, %2 : !u32i) -> !u32i cir.return
+///   }
+////
+struct CIRGenBitFieldInfo {
+  /// The offset within a contiguous run of bitfields that are represented as
+  /// a single "field" within the cir.record type. This offset is in bits.
+  unsigned offset : 16;
+
+  /// The total size of the bit-field, in bits.
+  unsigned size : 15;
+
+  /// Whether the bit-field is signed.
+  unsigned isSigned : 1;
+
+  /// The storage size in bits which should be used when accessing this
+  /// bitfield.
+  unsigned storageSize;
+
+  /// The offset of the bitfield storage from the start of the record.
+  clang::CharUnits storageOffset;
+
+  /// The offset within a contiguous run of bitfields that are represented as a
+  /// single "field" within the cir.record type, taking into account the AAPCS
+  /// rules for volatile bitfields. This offset is in bits.
+  unsigned volatileOffset : 16;
+
+  /// The storage size in bits which should be used when accessing this
+  /// bitfield.
+  unsigned volatileStorageSize;
+
+  /// The offset of the bitfield storage from the start of the record.
+  clang::CharUnits volatileStorageOffset;
+
+  /// The name of a bitfield
+  llvm::StringRef name;
+
+  // The actual storage type for the bitfield
+  mlir::Type storageType;
+
+  CIRGenBitFieldInfo()
+      : offset(), size(), isSigned(), storageSize(), volatileOffset(),
+        volatileStorageSize() {}
+
+  CIRGenBitFieldInfo(unsigned offset, unsigned size, bool isSigned,
+                     unsigned storageSize, clang::CharUnits storageOffset)
+      : offset(offset), size(size), isSigned(isSigned),
+        storageSize(storageSize), storageOffset(storageOffset) {}
+
+  void print(llvm::raw_ostream &os) const;
+  void dump() const;
+};
+
 /// This class handles record and union layout info while lowering AST types
 /// to CIR types.
 ///
@@ -33,6 +133,10 @@ class CIRGenRecordLayout {
   /// field no. This info is populated by the record builder.
   llvm::DenseMap<const clang::FieldDecl *, unsigned> fieldIdxMap;
 
+  /// Map from (bit-field) record field to the corresponding bit-field access
+  /// info. This info is populated by the record builder.
+  llvm::DenseMap<const clang::FieldDecl *, CIRGenBitFieldInfo> bitFields;
+
   /// False if any direct or indirect subobject of this class, when considered
   /// as a complete object, requires a non-zero bitpattern when
   /// zero-initialized.
@@ -69,6 +173,16 @@ class CIRGenRecordLayout {
   /// Check whether this struct can be C++ zero-initialized
   /// with a zeroinitializer when considered as a base subobject.
   bool isZeroInitializableAsBase() const { return zeroInitializableAsBase; }
+
+  /// Return the BitFieldInfo that corresponds to the field FD.
+  const CIRGenBitFieldInfo &getBitFieldInfo(const clang::FieldDecl *fd) const {
+    fd = fd->getCanonicalDecl();
+    assert(fd->isBitField() && "Invalid call for non-bit-field decl!");
+    llvm::DenseMap<const clang::FieldDecl *, CIRGenBitFieldInfo>::const_iterator
+        it = bitFields.find(fd);
+    assert(it != bitFields.end() && "Unable to find bitfield info");
+    return it->second;
+  }
 };
 
 } // namespace clang::CIRGen
diff --git a/clang/lib/CIR/CodeGen/CIRGenRecordLayoutBuilder.cpp b/clang/lib/CIR/CodeGen/CIRGenRecordLayoutBuilder.cpp
index 53aa0aee36fc3..76933952a9dee 100644
--- a/clang/lib/CIR/CodeGen/CIRGenRecordLayoutBuilder.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenRecordLayoutBuilder.cpp
@@ -20,6 +20,7 @@
 #include "clang/AST/RecordLayout.h"
 #include "clang/CIR/Dialect/IR/CIRAttrs.h"
 #include "clang/CIR/Dialect/IR/CIRDataLayout.h"
+#include "clang/CIR/MissingFeatures.h"
 #include "llvm/Support/Casting.h"
 
 #include <memory>
@@ -63,6 +64,10 @@ struct CIRRecordLowering final {
     return MemberInfo(offset, MemberInfo::InfoKind::Field, data);
   }
 
+  // Layout routines.
+  void setBitFieldInfo(const FieldDecl *fd, CharUnits startOffset,
+                       mlir::Type storageType);
+
   void lower();
   void lowerUnion();
 
@@ -72,6 +77,8 @@ struct CIRRecordLowering final {
   void insertPadding();
 
   void accumulateFields();
+  void accumulateBitFields(RecordDecl::field_iterator field,
+                           RecordDecl::field_iterator fieldEnd);
 
   CharUnits bitsToCharUnits(uint64_t bitOffset) {
     return astContext.toCharUnitsFromBits(bitOffset);
@@ -82,6 +89,9 @@ struct CIRRecordLowering final {
   CharUnits getSize(mlir::Type Ty) {
     return CharUnits::fromQuantity(dataLayout.layout.getTypeSize(Ty));
   }
+  CharUnits getSizeInBits(mlir::Type ty) {
+    return CharUnits::fromQuantity(dataLayout.layout.getTypeSizeInBits(ty));
+  }
   CharUnits getAlignment(mlir::Type Ty) {
     return CharUnits::fromQuantity(dataLayout.layout.getTypeABIAlignment(Ty));
   }
@@ -112,6 +122,18 @@ struct CIRRecordLowering final {
                : cir::ArrayType::get(type, numberOfChars.getQuantity());
   }
 
+  // This is different from LLVM traditional codegen because CIRGen uses arrays
+  // of bytes instead of arbitrary-sized integers. This is important for packed
+  // structures support.
+  mlir::Type getBitfieldStorageType(unsigned numBits) {
+    unsigned alignedBits = llvm::alignTo(numBits, astContext.getCharWidth());
+    if (cir::isValidFundamentalIntWidth(alignedBits))
+      return builder.getUIntNTy(alignedBits);
+
+    mlir::Type type = getCharType();
+    return cir::ArrayType::get(type, alignedBits / astContext.getCharWidth());
+  }
+
   mlir::Type getStorageType(const FieldDecl *fieldDecl) {
     mlir::Type type = cirGenTypes.convertTypeForMem(fieldDecl->getType());
     if (fieldDecl->isBitField()) {
@@ -144,6 +166,7 @@ struct CIRRecordLowering final {
   std::vector<MemberInfo> members;
   // Output fields, consumed by CIRGenTypes::computeRecordLayout
   llvm::SmallVector<mlir::Type, 16> fieldTypes;
+  llvm::DenseMap<const FieldDecl *, CIRGenBitFieldInfo> bitFields;
   llvm::DenseMap<const FieldDecl *, unsigned> fieldIdxMap;
   cir::CIRDataLayout dataLayout;
 
@@ -172,6 +195,32 @@ CIRRecordLowering::CIRRecordLowering(CIRGenTypes &cirGenTypes,
       zeroInitializable(true), zeroInitializableAsBase(true), packed(packed),
       padded(false) {}
 
+void CIRRecordLowering::setBitFieldInfo(const FieldDecl *fd,
+                                        CharUnits startOffset,
+                                        mlir::Type storageType) {
+  CIRGenBitFieldInfo &info = bitFields[fd->getCanonicalDecl()];
+  info.isSigned = fd->getType()->isSignedIntegerOrEnumerationType();
+  info.offset =
+      (unsigned)(getFieldBitOffset(fd) - astContext.toBits(startOffset));
+  info.size = fd->getBitWidthValue();
+  info.storageSize = getSizeInBits(storageType).getQuantity();
+  info.storageOffset = startOffset;
+  info.storageType = storageType;
+  info.name = fd->getName();
+
+  if (info.size > info.storageSize)
+    info.size = info.storageSize;
+  // Reverse the bit offsets for big endian machines. Because we represent
+  // a bitfield as a single large integer load, we can imagine the bits
+  // counting from the most-significant-bit instead of the
+  // least-significant-bit.
+  assert(!cir::MissingFeatures::isBigEndian());
+
+  info.volatileStorageSize = 0;
+  info.volatileOffset = 0;
+  info.volatileStorageOffset = CharUnits::Zero();
+}
+
 void CIRRecordLowering::lower() {
   if (recordDecl->isUnion()) {
     lowerUnion();
@@ -223,21 +272,114 @@ void CIRRecordLowering::fillOutputFields() {
             fieldTypes.size() - 1;
       // A field without storage must be a bitfield.
       assert(!cir::MissingFeatures::bitfields());
+      if (!member.data)
+        setBitFieldInfo(member.fieldDecl, member.offset, fieldTypes.back());
     }
     assert(!cir::MissingFeatures::cxxSupport());
   }
 }
 
+void CIRRecordLowering::accumulateBitFields(
+    RecordDecl::field_iterator field, RecordDecl::field_iterator fieldEnd) {
+  // Run stores the first element of the current run of bitfields.  FieldEnd is
+  // used as a special value to note that we don't have a current run.  A
+  // bitfield run is a contiguous collection of bitfields that can be stored in
+  // the same storage block.  Zero-sized bitfields and bitfields that would
+  // cross an alignment boundary break a run and start a new one.
+  RecordDecl::field_iterator run = fieldEnd;
+  // Tail is the offset of the first bit off the end of the current run.  It's
+  // used to determine if the ASTRecordLayout is treating these two bitfields as
+  // contiguous.  StartBitOffset is offset of the beginning of the Run.
+  uint64_t startBitOffset, tail = 0;
+  assert(!cir::MissingFeatures::isDiscreteBitFieldABI());
+
+  // Check if OffsetInRecord (the size in bits of the current run) is better
+  // as a single field run. When OffsetInRecord has legal integer width, and
+  // its bitfield offset is naturally aligned, it is better to make the
+  // bitfield a separate storage component so as it can be accessed directly
+  // with lower cost.
+  auto isBetterAsSingleFieldRun = [&](uint64_t offsetInRecord,
+                                      uint64_t startBitOffset,
+                                      uint64_t nextTail = 0) {
+    if (!cirGenTypes.getCGModule().getCodeGenOpts().FineGrainedBitfieldAccesses)
+      return false;
+    cirGenTypes.getCGModule().errorNYI(field->getSourceRange(),
+                                       "NYI FineGrainedBitfield");
+    return true;
+  };
+
+  // The start field is better as a single field run.
+  bool startFieldAsSingleRun = false;
+  for (;;) {
+    // Check to see if we need to start a new run.
+    if (run == fieldEnd) {
+      // If we're out of fields, return.
+      if (field == fieldEnd)
+        break;
+      // Any non-zero-length bitfield can start a new run.
+      if (!field->isZeroLengthBitField()) {
+        run = field;
+        startBitOffset = getFieldBitOffset(*field);
+        tail = startBitOffset + field->getBitWidthValue();
+        startFieldAsSingleRun =
+            isBetterAsSingleFieldRun(tail - startBitOffset, startBitOffset);
+      }
+      ++field;
+      continue;
+    }
+
+    // If the start field of a new run is better as a single run, or if current
+    // field (or consecutive fields) is better as a single run, or if current
+    // field has zero width bitfield and either UseZeroLengthBitfieldAlignment
+    // or UseBitFieldTypeAlignment is set to true, or if the offset of current
+    // field is inconsistent with the offset of previous field plus its offset,
+    // skip the block below and go ahead to emit the storage. Otherwise, try to
+    // add bitfields to the run.
+    uint64_t nextTail = tail;
+    if (field != fieldEnd)
+      nextTail += field->getBitWidthValue();
+
+    if (!startFieldAsSingleRun && field != fieldEnd &&
+        !isBetterAsSingleFieldRun(tail - startBitOffset, startBitOffset,
+                                  nextTail) &&
+        (!field->isZeroLengthBitField() ||
+         (!astContext.getTargetInfo().useZeroLengthBitfieldAlignment() &&
+          !astContext.getTargetInfo().useBitFieldTypeAlignment())) &&
+        tail == getFieldBitOffset(*field)) {
+      tail = nextTail;
+      ++field;
+      continue;
+    }
+
+    // We've hit a break-point in the run and need to emit a storage field.
+    mlir::Type type = getBitfieldStorageType(tail - startBitOffset);
+
+    // Add the storage member to the record and set the bitfield info for all of
+    // the bitfields in the run. Bitfields get the offset of their storage but
+    // come afterward and remain there after a stable sort.
+    members.push_back(makeStorageInfo(bitsToCharUnits(startBitOffset), type));
+    for (; run != field; ++run)
+      members.push_back(MemberInfo(bitsToCharUnits(startBitOffset),
+                                   MemberInfo::InfoKind::Field, nullptr, *run));
+    run = fieldEnd;
+    startFieldAsSingleRun = false;
+  }
+}
+
 void CIRRecordLowering::accumulateFields() {
-  for (const FieldDecl *field : recordDecl->fields()) {
+  for (RecordDecl::field_iterator field = recordDecl->field_begin(),
+                                  fieldEnd = recordDecl->field_end();
+       field != fieldEnd;) {
     if (field->isBitField()) {
-      cirGenTypes.getCGModule().errorNYI(recordDecl->getSourceRange(),
-                                         "accumulate bitfields");
-      ++field;
+      RecordDecl::field_iterator start = field;
+      // Iterate to gather the list of bitfields.
+      for (++field; field != fieldEnd && field->isBitField(); ++field)
+        ;
+      accumulateBitFields(start, field);
     } else if (!field->isZeroSize(astContext)) {
-      members.push_back(MemberInfo(bitsToCharUnits(getFieldBitOffset(field)),
+      members.push_back(MemberInfo(bitsToCharUnits(getFieldBitOffset(*field)),
                                    MemberInfo::InfoKind::Field,
-                                   getStorageType(field), field));
+                                   getStorageType(*field), *field));
       ++field;
     } else {
       // TODO(cir): do we want to do anything special about zero size members?
@@ -342,6 +484,8 @@ CIRGenTypes::computeRecordLayout(const RecordDecl *rd, cir::RecordType *ty) {
   // Add all the field numbers.
   rl->fieldIdxMap.swap(lowering.fieldIdxMap);
 
+  rl->bitFields.swap(lowering.bitFields);
+
   // Dump the layout, if requested.
   if (getASTContext().getLangOpts().DumpRecordLayouts) {
     cgm.errorNYI(rd->getSourceRange(), "computeRecordLayout: dump layout");
diff --git a/clang/test/CIR/CodeGen/bitfields.cpp b/clang/test/CIR/CodeGen/bitfields.cpp
new file mode 100644
index 0000000000000..2eedbcd4377e6
--- /dev/null
+++ b/clang/test/CIR/CodeGen/bitfields.cpp
@@ -0,0 +1,79 @@
+// RUN: %clang_cc1 -std=c++17 -triple x86_64-unknown-linux-gnu -fclangir -emit-cir %s -o %t.cir
+// RUN: FileCheck --input-file=%t.cir %s --check-prefix=CIR
+// RUN: %clang_cc1 -std=c++17 -triple x86_64-unknown-linux-gnu -fclangir -emit-llvm %s -o %t-cir.ll
+// RUN: FileCheck --input-file=%t-cir.ll %s --check-prefix=LLVM
+// RUN: %clang_cc1 -std=c++17 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o %t.ll
+// RUN: FileCheck --input-file=%t.ll %s --check-prefix=OGCG
+
+struct A {
+  char a, b, c;
+  unsigned bits : 3;
+  unsigned more_bits : 4;
+  unsigned still_more_bits : 7;
+};
+
+typedef struct {
+  int a : 4;
+  int b : 5;
+  int c;
+} D;
+
+typedef struct {
+  int a : 4;
+  int b : 27;
+  int c : 17;
+  int d : 2;
+  int e : 15;
+  unsigned f; // type other than int above, not a bitfield
+} S;
+
+typedef struct {
+  int a : 3;  // one bitfield with size < 8
+  unsigned b;
+} T;
+
+typedef struct {
+    char a;
+    char b;
+    char c;
+
+    // startOffset 24 bits, new storage from here
+    int d: 2;
+    int e: 2;
+    int f: 4;
+    int g: 25;
+    int h: 3;
+    int i: 4;
+    int j: 3;
+    int k: 8;
+
+    int l: 14; // needs to be a part of the new storage
+               // because (tail - startOffset) is 65 after 'l' field
+} U;
+
+// CIR-DAG:  !rec_D = !cir.record<struct "D" {!u16i, !s32i}>
+// CIR-DAG:  !rec_T = !cir.record<struct "T" {!u8i, !u32i}>
+// CIR-DAG:  !rec_U = !cir.record<struct "U" {!s8i, !s8i, !s8i, !cir.array<!u8i x 9>}>
+// CIR-DAG:  !rec_A = !cir.record<struct "A" padded {!s8i, !s8i, !s8i, !u8i, !u8i, !cir.array<!u8i x 3>}>
+// CIR-DAG:  !rec_S = !cir.record<struct "S" {!u32i, !cir.array<!u8i x 3>, !u16i, !u32i}>
+
+
+// LLVM-DAG: %struct.D = type { i16, i32 }
+// LLVM-DAG: %struct.U = type { i8, i8, i8, [9 x i8] }
+// LLVM-DAG: %struct.A = type { i8, i8, i8, i8, i8, [3 x i8] }
+// LLVM-DAG: %struct.S = type { i32, [3 x i8], i16, i32 }
+// LLVM-DAG: %struct.T = type { i8, i32 }
+
+// OGCG-DAG: %struct.D = type { i16, i32 }
+// OGCG-DAG: %struct.U = type <{ i8, i8, i8, i8, i64 }>
+// OGCG-DAG: %struct.A = type <{ i8, i8, i8, i16, [3 x i8] }>
+// OGCG-DAG: %struct.S = type { i64, i16, i32 }
+// OGCG-DAG: %struct.T = type { i8, i32 }
+
+void def() {
+  A a;
+  D d;
+  S s;
+  T t;
+  U u;
+}

@llvmbot
Member

llvmbot commented May 29, 2025

@llvm/pr-subscribers-clang

// because (tail - startOffset) is 65 after 'l' field
} U;

// CIR-DAG: !rec_D = !cir.record<struct "D" {!u16i, !s32i}>
Contributor

It would be helpful to move the checks for each type to just below the source declaration.
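
For example (an illustrative rearrangement of lines already in the test, not a committed change), the check for D would then sit right below its declaration:

typedef struct {
  int a : 4;
  int b : 5;
  int c;
} D;
// CIR-DAG:  !rec_D = !cir.record<struct "D" {!u16i, !s32i}>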

/// unsigned still_more_bits : 7;
/// };
///
/// This will end up as the following cir.record. The first array is the
Contributor

This comment appears to be stale. There's only one array generated, and the bitfield members are represented by two !u8i variables. This is what the incubator generates, but classic codegen generates an i16 for the bitfield members.
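
The test checks in this PR show the difference for the equivalent struct: ClangIR currently emits two !u8i members plus a padding array, while classic codegen packs the bitfield members into an i16:

// CIR:  !rec_A = !cir.record<struct "A" padded {!s8i, !s8i, !s8i, !u8i, !u8i, !cir.array<!u8i x 3>}>
// OGCG: %struct.A = type <{ i8, i8, i8, i16, [3 x i8] }>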

/// %1 = cir.const #cir.int<2> : !s32i
/// %2 = cir.cast(integral, %1 : !s32i), !u32i
/// %3 = cir.get_member %0[4] {name = "more_bits"} : !cir.ptr<!rec_S> ->
/// !cir.ptr<!u16i>
Contributor

This is also stale, but differently so. I believe we generate a !u8i for this case.

Contributor Author

Sorry, when I was writing this, I was running some tests and forgot to revert the struct back to match what the comment was describing. Thanks for catching that.

storageSize(storageSize), storageOffset(storageOffset) {}

void print(llvm::raw_ostream &os) const;
void dump() const;
Contributor

Suggested change
void dump() const;
LLVM_DUMP_METHOD void dump() const;

This tells the linker not to discard this even though it isn't called.

// or UseBitFieldTypeAlignment is set to true, or if the offset of current
// field is inconsistent with the offset of previous field plus its offset,
// skip the block below and go ahead to emit the storage. Otherwise, try to
// add bitfields to the run.
Contributor

This comment is really hard to follow. I would suggest an alternative, but I'm really not sure what it's saying. It seems to be a literal rendering of the compound condition on lines 342-348 without explaining what the goal of the conditions is.

RecordDecl::field_iterator run = fieldEnd;
// Tail is the offset of the first bit off the end of the current run. It's
// used to determine if the ASTRecordLayout is treating these two bitfields as
// contiguous. StartBitOffset is offset of the beginning of the Run.
Contributor

Suggested change
// contiguous. StartBitOffset is offset of the beginning of the Run.
// contiguous. 'startBitOffset' is offset of the beginning of the run.

uint64_t startBitOffset, tail = 0;
assert(!cir::MissingFeatures::isDiscreteBitFieldABI());

// Check if OffsetInRecord (the size in bits of the current run) is better
Contributor

Suggested change
// Check if OffsetInRecord (the size in bits of the current run) is better
// Check if 'offsetInRecord' (the size in bits of the current run) is better

auto isBetterAsSingleFieldRun = [&](uint64_t offsetInRecord,
uint64_t startBitOffset,
uint64_t nextTail = 0) {
if (!cirGenTypes.getCGModule().getCodeGenOpts().FineGrainedBitfieldAccesses)
Contributor

I'm suspicious about what's happening here. We don't support fine-grained bitfield access yet in the incubator, but the structures we generate match what classic codegen generates when fine-grained bitfield access is enabled.

https://godbolt.org/z/Eh6bjWefh

Contributor Author

The reason we're not emitting the same layout as CodeGen, or one that looks like FineGrainedBitfieldAccesses, is this condition:
tail == getFieldBitOffset(*field)

For example, given the following struct:

struct S {
  char a, b, c;
  unsigned bits : 2;
  unsigned more_bits : 5;
  unsigned still_more_bits : 7;
} a;

The data layout looks something like this:
[image: data layout diagram]

Because of the way the layout is defined, the bit-fields are not contiguous in the ASTRecordLayout. As a result, CIR is not packing the three bit-fields into a single integer like CodeGen does.
CodeGen emits:

%struct.S = type <{ i8, i8, i8, i16, [3 x i8] }>

While CIR emits:

!rec_S = !cir.record<struct "S" padded {!s8i, !s8i, !s8i, !u8i, !u8i, !cir.array<!u8i x 3>}>

Should we consider changing the implementation here?
Alternatively, @andykaylor's idea of representing it like this:

!cir.record<struct "S" {!s32i, !cir.bitfield<2>, !cir.bitfield<5>, !cir.bitfield<7>}>

We could then defer the details to the lowering phase.

@bcardosolopes

return true;
};

// The start field is better as a single field run.
Contributor

This comment is more confusing than it is helpful. I'd delete it.

};

// The start field is better as a single field run.
bool startFieldAsSingleRun = false;
Contributor

Given that isBetterAsSingleFieldRun always returns false, this variable can be eliminated and a lot of the code here can be simplified. I suggest removing all that and replacing it with a MissingFeatures::nonFineGrainedBitfields assert.
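
A minimal sketch of that suggestion, following the MissingFeatures pattern already used in this patch (nonFineGrainedBitfields is the reviewer's proposed name, not an existing entry):

// clang/include/clang/CIR/MissingFeatures.h
static bool nonFineGrainedBitfields() { return false; }

// In accumulateBitFields, replacing the isBetterAsSingleFieldRun lambda and
// the startFieldAsSingleRun flag:
assert(!cir::MissingFeatures::nonFineGrainedBitfields());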

@andykaylor
Contributor

@bcardosolopes, I have some general concerns about the way that we're representing bitfields and bitfield accesses. It seems fundamentally inconsistent with the way we represent unions. For example:

union U {
  char a;
  int b;
  long c;
};

becomes

!cir.record<union "U" {!s8i, !s32i, !s64i}>

but

struct S {
  int x;
  unsigned a:2;
  unsigned b:5;
  unsigned c:7;
} s;

becomes

!cir.record<struct "S" {!s32i, !u16i}>

And all three bitfield members are hidden away inside the second member, leading to the need for two operations to access a bitfield member:

    %3 = cir.get_member %2[1] {name = "b"} : !cir.ptr<!rec_S> -> !cir.ptr<!u16i> loc(#loc11)
    %4 = cir.get_bitfield(#bfi_b, %3 : !cir.ptr<!u16i>) -> !u32i loc(#loc12)

Why are we representing each member of a union individually but packing bitfields away in integer fields?

What would you think of something like this?

!cir.record<struct "S" {!s32i, !cir.bitfield<2>, !cir.bitfield<5>, !cir.bitfield<7>}>

That puts a bit more burden on the RecordType::getElementOffset implementation, but I think it would let us defer some of the details to lowering.

@bcardosolopes
Member

bcardosolopes commented Jun 4, 2025

Why are we representing each member of a union individually but packing bitfields away in integer fields?

As you noticed, we went higher level for unions (kept the other types around) but didn't for bitfields. I don't see it as inconsistent because in my point of view they are orthogonal (even though you can compose them). Bitfields can get tricky, so we decided to go the "fast path" that would minimize differences between OG and CIRGen and be more like LLVM until we get a good use case for a richer representation (so that the design could incorporate real needs). Note that I'm not against improving it, but we just decided not to spend time there with the resources we had.

What would you think of something like this?

!cir.record<struct "S" {!s32i, !cir.bitfield<2>, !cir.bitfield<5>, !cir.bitfield<7>}>

That puts a bit more burden on the RecordType::getElementOffset implementation, but I think it would let us defer some of the details to lowering.

I'm fine with something like this, thanks for pushing it. Good thing we already have good tests for bitfields, which might help a bunch with bulletproofing the new representation - so my suggestion would be to do it first in the incubator and leverage the existing LLVM output checks.

@gitoleg you have worked a lot on bitfields, do you have any feedback here or extra suggestions?

@gitoleg
Contributor

gitoleg commented Jun 5, 2025

@bcardosolopes @andykaylor

Well, I don't have too much to say, you highlighted almost everything - my intention was to generate LLVM IR closer to the original one.

The approach above is good - and maybe it's indeed what we need to end up with at the end of the day.

But there are several problems I think we'll face:

  1. we need 64 types for bitfields. It's not a big problem, but better to keep it in mind
  2. we'll need to generate a correct record layout during the lowering - with all the paddings, merging several bitfields into a single member, and whatever else currently happens in CIRGenRecordLayout. That might be a real problem - since I bet we don't have all the information in the lowering.

Andres-Salamanca added a commit that referenced this pull request Jun 5, 2025
This PR adds support for the DLTI dialect by attaching it to the module
attributes and introduces a utility function to determine if the target
is big-endian, which is required for
#142041. Some tests were
updated because we now use `mlir::translateDataLayout`, which "updates"
the `DataLayout` where the alignment for `long` is 8 instead of the
previous 4. This update is consistent with the incubator.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jun 5, 2025