-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[String] Last-minute ABI adjustment: 4-bit discriminator #21310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@swift-ci please test |
@swift-ci please benchmark |
Build failed |
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
df24747
to
0f12e0f
Compare
@swift-ci please test |
Build failed |
Build failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simplifications!
Trying an asserts build locally to repro. Seems to just be a few |
0f12e0f
to
1096140
Compare
@swift-ci please test |
Build failed |
Build failed |
@swift-ci please benchmark |
@swift-ci please test |
@lorentey can you take a look? In addition to the refactoring, this has ABI changes to only use 4-bit discriminators. |
Build failed |
Build failed |
stdlib/public/core/SmallString.swift
Outdated
self._storage.0, | ||
self._storage.1 & _StringObject.Nibbles.largeAddressMask) | ||
} | ||
return (self._storage.0, self._storage.1 & 0x00FF_FFFF_FFFF_FFFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Losing the nice explanatory name on the mask here makes me a bit sad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but that name was misleading and conflating. This is a mask only for the small-string contents, which no longer aligns with addresses.
I thought about adding a static property for this, which would be inlinable and in our ABI, and only used in this one place. I'm not sure how to weigh the "at least it has an name" with "one level of cognitive indirection" if there's only one use site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A local name could help readability here. let smallStringCodeUnitsMask = 0x00FF_FFFF_FFFF_FFFF
or whatever. An actual static property is probably overkill.
@@ -507,7 +510,7 @@ extension _StringStorage { | |||
@_effects(releasenone) | |||
private func _postRRCAdjust(newCount: Int, newIsASCII: Bool) { | |||
let countAndFlags = CountAndFlags( | |||
count: newCount, isASCII: newIsASCII) | |||
mortalCount: newCount, isASCII: newIsASCII) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"mortalCount" is the most metal variable name I've seen in a while
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤘
@@ -281,9 +281,13 @@ extension String { | |||
} | |||
} | |||
if _guts._object.isImmortal { | |||
// TODO: We'd rather emit a valid ObjC object statically than create a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a bug tracking this? I should either CC on it or steal it if so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do, then I assigned it to you ;-)
return _SharedStringStorage( | ||
immortal: _guts._object.fastUTF8.baseAddress!, | ||
countAndFlags: _guts._object._countAndFlags) | ||
countAndFlags: _StringObject.CountAndFlags( | ||
sharedCount: _guts.count, isASCII: gutsCountAndFlags.isASCII)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the reasoning for making a new CountAndFlags here rather than using gutsCountAndFlags? Just Shared Strings needing some different flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shared strings do not have tail-allocated contents, which is one of the new perf flags, so we need to strip it. I very slightly prefer using initializers than mutation, because it makes coding style more uniform for things like _invariantCheck()
not being called on temporarily-invalid states.
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
internal init(leading: UInt64, trailing: UInt64, count: Int) { | ||
_internalInvariant(count <= _SmallString.capacity) | ||
|
||
let isASCII = (leading | trailing) & 0x8080_8080_8080_8080 == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, nice. (I know it's not new; I just hadn't seen it before.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, given how fast this is, I wonder if the dedicated isASCII
bit is even needed, or if it's worth saving for something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that fast; we have to mask off the discriminator too. While ARM can encode the constant as an immediate, but it's still more code. Specifically for small strings, ASCII-ness is by far the most common and important characteristic to know.
The other contender for this bit would be remembering both isASCII and doesn't have CR+LF, i.e. single-scalar-grapheme. But, I think it's better to check that on Character access.
public func _forCodeGen_1() { if isASCII { foo() } else { bar() } }
public func _forCodeGen_2() { if computeIsASCII() { foo() } else { bar() } }
_$ss12_SmallStringV13_forCodeGen_1yyF:
tbnz x1, 0x3e, loc_15c474
b _$ss12_SmallStringV3baryyFTf4d_n
loc_15c474:
b _$ss12_SmallStringV3fooyyFTf4d_n
_$ss12_SmallStringV13_forCodeGen_2yyF:
orr x8, xzr, #0x8080808080808080
movk x8, #0x80, lsl #48
and x8, x1, x8
and x9, x0, #0x8080808080808080
orr x8, x8, x9
cbz x8, loc_15c494
b _$ss12_SmallStringV3baryyFTf4d_n
loc_15c494:
b _$ss12_SmallStringV3fooyyFTf4d_n
Considering that this check should probably be at the top of every String API inlined into user code, it's probably worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory that could be
orr x8, x1, x0, lsl #8
and x8, x8, #0x8080808080808080
cbz x8, loc
but I take your point.
@@ -64,17 +60,22 @@ internal struct _StringObject { | |||
as above, except bit 7 is omitted from storage -- it is left free, to supply | |||
extra inhabitants in the StringObject structure. The missing bit can be | |||
recovered by looking at `_variant.isImmortal`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the comment is no longer up to date, since there are not 7 bits.
NOTE: isNativelyStored is *specifically* allocated to b61 to align with the | ||
bit-position of isSmall, which allows us to check for native storage without | ||
an extra branch checking small-ness. Otherwise, these bits are invalid to read | ||
from small strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand this bit. Aren't they in different words?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Flags are invalid to check on small strings, since small strings are using the bits for their stored code units. Before this change, we were able to tell if we had a natively managed string with a simple comparison of the top byte. But, moving this bit to the flags means we have to check whether we are small before checking the flag bit.
I.e. !isSmall && _countAndFlags.isNativelyStored
. This is another way of saying something like !bridgeObject[bit: 61] && _countAndFlags[bit: 61]
. Since both words are checking the same bit, we can do: (~bridgeObject & _countAndFlags)[bit: 61]
.
So, something like if _object.hasNativeStorage
compiles to a simple orn
and tbnz
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! Clever.
- `largeAddressBits` holds an instance of `_StringStorage`. | ||
isTailAllocated: start of the code units is at the stored address + `nativeBias` | ||
- `isNativelyStored` always implies `isTailAllocated`, but not vice versa | ||
(e.g. literals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not have a canonical form for "I point directly to unbiased string data" yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have anything that directly (unbiased) points to contents.
Timeouts and package failures. Going to try to repro locally, but in the mean time kick off another run. |
@swift-ci please test |
@swift-ci please benchmark |
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
f0aa721
to
a77291a
Compare
@swift-ci please test |
@swift-ci please benchmark |
Build failed |
Build failed |
a77291a
to
26df40c
Compare
@swift-ci please test |
@swift-ci please benchmark |
Build failed |
Build failed |
Build comment file:Performance: -O
Code size: -O
Performance: -Osize
Code size: -Osize
Performance: -Onone
Code size: -swiftlibs
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the Noise: Sometimes the performance results (not code size!) contain false Hardware Overview
|
Build failed |
So yeah, need to update LLDB |
Remove Discriminator, Flags, etc., abstractions from StringObject. These cause code divergence between 32-bit and 64-bit ABI, complicate ABI changes, and otherwise contribute to bloat.
In anticipation of potential future HW features, e.g. armv8.5 memory tagging, only use the high 4 bytes as discriminator bits in _BridgeObject rather than the top 8 bits. Utilize two perf flags to cover this instead. This requires shifting around a fair amount of internal complexity.
26df40c
to
5a6d2df
Compare
Please test with following pull request: @swift-ci please test |
Build failed |
Build failed |
Switch ABI to only use 4 discriminator bits.
In anticipation of potential future HW features, e.g. armv8.5 memory
tagging, only use the high 4 bytes as discriminator bits in
_BridgeObject rather than the top 8 bits. Utilize two perf flags to
cover this instead. This requires shifting around a fair amount of
internal complexity.
This includes a sizable refactoring of 32-bit support, making this kind of adjustment possible.