Skip to content

[SYCL] Avoid strict aliasing violation when reading from byte arrays #5537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 15 additions & 11 deletions sycl/source/detail/device_image_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -272,27 +272,31 @@ class device_image_impl {
NumElements ==
0 &&
"unexpected layout of composite spec const descriptors");
auto *It = reinterpret_cast<const std::uint32_t *>(&Descriptors[8]);
auto *End = reinterpret_cast<const std::uint32_t *>(&Descriptors[0] +
Descriptors.size());
const uint8_t *It = &Descriptors[8];
Copy link
Contributor

@cperkinsintel cperkinsintel Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeviceBinaryProperty has a .asUint32() method. Couldn't we call that above, instead of .asByteArray() and then avoid the reinterpret cast on that front? (A: no. that's for a single uint32, not an array like the other )

Alternately, instead of inserting std::memcpy down below, I wonder if we could use sycl::bit_cast ?

I know the reinterpret cast for pointers means we potentially have UB, but given that we are getting raw binary data from the PI, are we really avoiding any issues by switching from reinterpet_cast to memcpy into local vars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternately, instead of inserting std::memcpy down below, I wonder if we could use sycl::bit_cast ?

I believe sycl::bit_cast asserts that To and From have the same size, so we can't use it here sadly.

I know the reinterpret cast for pointers means we potentially have UB, but given that we are getting raw binary data from the PI, are we really avoiding any issues by switching from reinterpet_cast to memcpy into local vars?

Currently there isn't must difference between the two, except we assume that the data sycl-post-link generates has the same structure as the types we use. If this starts failing it is on us. However, relying on UB means we are at the compiler's mercy and any day it could change. Do I think it's likely to happen? No, but it would likely be a huge headache if it ever does and making this change seems like a small price to avoid future pains. Better safe than sorry!

const uint8_t *End = &Descriptors[0] + Descriptors.size();
unsigned LocalOffset = 0;
while (It != End) {
// Read this descriptor.
std::uint32_t CurrentDesc[NumElements];
std::memcpy(CurrentDesc, It, NumElements * sizeof(std::uint32_t));

// Make sure that alignment is correct in blob.
const unsigned OffsetFromLast = /*Offset*/ It[1] - LocalOffset;
const unsigned OffsetFromLast =
/*Offset*/ CurrentDesc[1] - LocalOffset;
BlobOffset += OffsetFromLast;
// Composites may have a special padding element at the end which
// should not have a descriptor. These padding elements all have max
// ID value.
if (It[0] != std::numeric_limits<std::uint32_t>::max()) {
if (CurrentDesc[0] != std::numeric_limits<std::uint32_t>::max()) {
// The map is not locked here because updateSpecConstSymMap() is
// only supposed to be called from c'tor.
MSpecConstSymMap[std::string{SCName}].push_back(
SpecConstDescT{/*ID*/ It[0], /*CompositeOffset*/ It[1],
/*Size*/ It[2], BlobOffset});
MSpecConstSymMap[std::string{SCName}].push_back(SpecConstDescT{
/*ID*/ CurrentDesc[0], /*CompositeOffset*/ CurrentDesc[1],
/*Size*/ CurrentDesc[2], BlobOffset});
}
LocalOffset += OffsetFromLast + /*Size*/ It[2];
BlobOffset += /*Size*/ It[2];
It += NumElements;
LocalOffset += OffsetFromLast + /*Size*/ CurrentDesc[2];
BlobOffset += /*Size*/ CurrentDesc[2];
It += NumElements * sizeof(std::uint32_t);
}
}
MSpecConstsBlob.resize(BlobOffset);
Expand Down
17 changes: 10 additions & 7 deletions sycl/source/detail/program_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -548,16 +548,19 @@ void program_impl::flush_spec_constants(const RTDeviceBinaryImage &Img,
// (which might be a member of the composite); offset, which is used to
// calculate location of scalar member within the composite or zero for
// scalar spec constants; size of a spec constant
assert(((Descriptors.size() - 8) / sizeof(std::uint32_t)) % 3 == 0 &&
constexpr size_t NumElements = 3;
assert(((Descriptors.size() - 8) / sizeof(std::uint32_t)) % NumElements ==
0 &&
"unexpected layout of composite spec const descriptors");
auto *It = reinterpret_cast<const std::uint32_t *>(&Descriptors[8]);
auto *End = reinterpret_cast<const std::uint32_t *>(&Descriptors[0] +
Descriptors.size());
const uint8_t *It = &Descriptors[8];
const uint8_t *End = &Descriptors[0] + Descriptors.size();
while (It != End) {
std::uint32_t CurrentDesc[NumElements];
std::memcpy(CurrentDesc, It, NumElements * sizeof(std::uint32_t));
Ctx->getPlugin().call<PiApiKind::piextProgramSetSpecializationConstant>(
NativePrg, /* ID */ It[0], /* Size */ It[2],
SC.getValuePtr() + /* Offset */ It[1]);
It += 3;
NativePrg, /* ID */ CurrentDesc[0], /* Size */ CurrentDesc[2],
SC.getValuePtr() + /* Offset */ CurrentDesc[1]);
It += NumElements * sizeof(std::uint32_t);
}
}
}
Expand Down
16 changes: 8 additions & 8 deletions sycl/source/detail/program_manager/program_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -580,11 +580,11 @@ RT::PiProgram ProgramManager::getBuiltPIProgram(
continue;
ByteArray Aspects = DeviceBinaryProperty(*It).asByteArray();
// 8 because we need to skip 64-bits of size of the byte array
auto *AIt = reinterpret_cast<const std::uint32_t *>(&Aspects[8]);
auto *AEnd =
reinterpret_cast<const std::uint32_t *>(&Aspects[0] + Aspects.size());
const uint8_t *AIt = &Aspects[8];
const uint8_t *AEnd = &Aspects[0] + Aspects.size();
while (AIt != AEnd) {
auto Aspect = static_cast<aspect>(*AIt);
aspect Aspect;
std::memcpy(&Aspect, AIt, sizeof(aspect));
if (!Dev->has(Aspect))
throw sycl::exception(errc::kernel_not_supported,
"Required aspect " + getAspectNameStr(Aspect) +
Expand Down Expand Up @@ -1279,10 +1279,10 @@ void ProgramManager::addImages(pi_device_binaries DeviceBinary) {
// * 4 bytes - 0 if device_global has device_image_scope and any value
// otherwise.
assert(DeviceGlobalInfo.size() == 16 && "Unexpected property size");
const std::uint32_t TypeSize =
*reinterpret_cast<const std::uint32_t *>(&DeviceGlobalInfo[8]);
const std::uint32_t DeviceImageScopeDecorated =
*reinterpret_cast<const std::uint32_t *>(&DeviceGlobalInfo[12]);
std::uint32_t TypeSize, DeviceImageScopeDecorated;
std::memcpy(&TypeSize, &DeviceGlobalInfo[8], sizeof(std::uint32_t));
std::memcpy(&DeviceImageScopeDecorated, &DeviceGlobalInfo[12],
sizeof(std::uint32_t));

auto ExistingDeviceGlobal = m_DeviceGlobals.find(DeviceGlobal->Name);
if (ExistingDeviceGlobal != m_DeviceGlobals.end()) {
Expand Down