Skip to content

feat: Fast buffer reader and fast buffer writer #1082

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Sep 16, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
2fa0f6d
FastBufferWriter implemented and tested (still need to add and update…
ShadauxCat Aug 17, 2021
c36b3af
A few additional tests to cover the last missed cases I can think of …
ShadauxCat Aug 17, 2021
5ac5ef9
FastBufferReader + tests
ShadauxCat Aug 18, 2021
4776737
- More tests
ShadauxCat Aug 19, 2021
f245150
- Removed NativeArray from FastBufferReader and FastBufferWriter, rep…
ShadauxCat Aug 19, 2021
fc944c4
Added utility ref struct "Ref<T>" to more generically support wrappin…
ShadauxCat Aug 19, 2021
cb95862
BufferSerializer and tests.
ShadauxCat Aug 23, 2021
a3e0abc
Removed unnecessary comment.
ShadauxCat Aug 23, 2021
ea31f10
XMLDocs + cleanup for PR
ShadauxCat Aug 24, 2021
057c1fc
Merge branch 'develop' into feature/fast_buffer_reader_writer
ShadauxCat Aug 24, 2021
63309e2
Replaced possibly unaligned memory access with UnsafeUtility.MemCpy..…
ShadauxCat Aug 24, 2021
e9b9305
Resurrected BytewiseUtil.FastCopyBytes as a faster alternative to Uns…
ShadauxCat Aug 24, 2021
aa56440
Reverting an accidental change.
ShadauxCat Aug 24, 2021
fabb3b8
Removed files that got accidentally duplicated from before the rename.
ShadauxCat Aug 24, 2021
3a2a50b
Standards fixes
ShadauxCat Aug 24, 2021
58db1bd
Removed accidentally added files.
ShadauxCat Aug 24, 2021
c827339
Added BuildInfo.json to the .gitignore so I stop accidentally checkin…
ShadauxCat Aug 24, 2021
f8bfb2f
Addressed most of the review feedback. Still need to do a little more…
ShadauxCat Aug 26, 2021
9816b50
standards.py --fix
ShadauxCat Aug 26, 2021
a576552
standards.py --fix
ShadauxCat Aug 26, 2021
5193e39
Fixed incorrect namespaces.
ShadauxCat Aug 26, 2021
119ee2f
-Fixed a couple of issues where growing a FastBufferWriter wouldn't w…
ShadauxCat Aug 26, 2021
7fb2d53
Fix a test failure and better implementation of large growths
ShadauxCat Aug 27, 2021
8ee4610
- Removed RefArray
ShadauxCat Aug 31, 2021
f4b7ea9
Removed RefArray meta file that stuck around.
ShadauxCat Aug 31, 2021
d983f07
Review feedback: Used nameof() instead of string literal.
ShadauxCat Aug 31, 2021
bead9f5
-Removed BytewiseUtility.FastCopyBytes
ShadauxCat Sep 8, 2021
5063260
removed .gitignore.
ShadauxCat Sep 15, 2021
ade5d7c
Merge branch 'develop' into feature/fast_buffer_reader_writer
ShadauxCat Sep 15, 2021
2f01798
Fixed compile errors that somehow didn't happen on my machine until I…
ShadauxCat Sep 16, 2021
3da5b38
standards.py --fix ... again ... despite no changes since the last su…
ShadauxCat Sep 16, 2021
409d6fe
Merge branch 'develop' into feature/fast_buffer_reader_writer
ShadauxCat Sep 16, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
126 changes: 126 additions & 0 deletions com.unity.netcode.gameobjects/Runtime/Serialization/BitCounter.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
using System.Runtime.CompilerServices;

namespace Unity.Netcode
{
public static class BitCounter
{
// Since we don't have access to BitOperations.LeadingZeroCount() (which would have been the fastest)
// we use the De Bruijn sequence to do this calculation
// See https://en.wikipedia.org/wiki/De_Bruijn_sequence and https://www.chessprogramming.org/De_Bruijn_Sequence
private const ulong k_DeBruijnMagic64 = 0x37E84A99DAE458F;
private const uint k_DeBruijnMagic32 = 0x06EB14F9;

// We're counting bytes, not bits, so these have all had the operation x/8 + 1 applied
private static readonly int[] k_DeBruijnTableBytes64 =
{
0/8+1, 1/8+1, 17/8+1, 2/8+1, 18/8+1, 50/8+1, 3/8+1, 57/8+1,
47/8+1, 19/8+1, 22/8+1, 51/8+1, 29/8+1, 4/8+1, 33/8+1, 58/8+1,
15/8+1, 48/8+1, 20/8+1, 27/8+1, 25/8+1, 23/8+1, 52/8+1, 41/8+1,
54/8+1, 30/8+1, 38/8+1, 5/8+1, 43/8+1, 34/8+1, 59/8+1, 8/8+1,
63/8+1, 16/8+1, 49/8+1, 56/8+1, 46/8+1, 21/8+1, 28/8+1, 32/8+1,
14/8+1, 26/8+1, 24/8+1, 40/8+1, 53/8+1, 37/8+1, 42/8+1, 7/8+1,
62/8+1, 55/8+1, 45/8+1, 31/8+1, 13/8+1, 39/8+1, 36/8+1, 6/8+1,
61/8+1, 44/8+1, 12/8+1, 35/8+1, 60/8+1, 11/8+1, 10/8+1, 9/8+1,
};

private static readonly int[] k_DeBruijnTableBytes32 =
{
0/8+1, 1/8+1, 16/8+1, 2/8+1, 29/8+1, 17/8+1, 3/8+1, 22/8+1,
30/8+1, 20/8+1, 18/8+1, 11/8+1, 13/8+1, 4/8+1, 7/8+1, 23/8+1,
31/8+1, 15/8+1, 28/8+1, 21/8+1, 19/8+1, 10/8+1, 12/8+1, 6/8+1,
14/8+1, 27/8+1, 9/8+1, 5/8+1, 26/8+1, 8/8+1, 25/8+1, 24/8+1,
};

// And here we're counting the number of set bits, not the position of the highest set,
// so these still have +1 applied - unfortunately 0 and 1 both return the same value.
private static readonly int[] k_DeBruijnTableBits64 =
{
0+1, 1+1, 17+1, 2+1, 18+1, 50+1, 3+1, 57+1,
47+1, 19+1, 22+1, 51+1, 29+1, 4+1, 33+1, 58+1,
15+1, 48+1, 20+1, 27+1, 25+1, 23+1, 52+1, 41+1,
54+1, 30+1, 38+1, 5+1, 43+1, 34+1, 59+1, 8+1,
63+1, 16+1, 49+1, 56+1, 46+1, 21+1, 28+1, 32+1,
14+1, 26+1, 24+1, 40+1, 53+1, 37+1, 42+1, 7+1,
62+1, 55+1, 45+1, 31+1, 13+1, 39+1, 36+1, 6+1,
61+1, 44+1, 12+1, 35+1, 60+1, 11+1, 10+1, 9+1,
};

private static readonly int[] k_DeBruijnTableBits32 =
{
0+1, 1+1, 16+1, 2+1, 29+1, 17+1, 3+1, 22+1,
30+1, 20+1, 18+1, 11+1, 13+1, 4+1, 7+1, 23+1,
31+1, 15+1, 28+1, 21+1, 19+1, 10+1, 12+1, 6+1,
14+1, 27+1, 9+1, 5+1, 26+1, 8+1, 25+1, 24+1,
};

/// <summary>
/// Get the minimum number of bytes required to represent the given value
/// </summary>
/// <param name="value">The value</param>
/// <returns>The number of bytes required</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int GetUsedByteCount(uint value)
{
value |= value >> 1;
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
value = value & ~(value >> 1);
return k_DeBruijnTableBytes32[value * k_DeBruijnMagic32 >> 27];
}

/// <summary>
/// Get the minimum number of bytes required to represent the given value
/// </summary>
/// <param name="value">The value</param>
/// <returns>The number of bytes required</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int GetUsedByteCount(ulong value)
{
value |= value >> 1;
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
value |= value >> 32;
value = value & ~(value >> 1);
return k_DeBruijnTableBytes64[value * k_DeBruijnMagic64 >> 58];
}

/// <summary>
/// Get the minimum number of bits required to represent the given value
/// </summary>
/// <param name="value">The value</param>
/// <returns>The number of bits required</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int GetUsedBitCount(uint value)
{
value |= value >> 1;
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
value = value & ~(value >> 1);
return k_DeBruijnTableBits32[value * k_DeBruijnMagic32 >> 27];
}

/// <summary>
/// Get the minimum number of bits required to represent the given value
/// </summary>
/// <param name="value">The value</param>
/// <returns>The number of bits required</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int GetUsedBitCount(ulong value)
{
value |= value >> 1;
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
value |= value >> 32;
value = value & ~(value >> 1);
return k_DeBruijnTableBits64[value * k_DeBruijnMagic64 >> 58];
}
}
}

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

218 changes: 218 additions & 0 deletions com.unity.netcode.gameobjects/Runtime/Serialization/BitReader.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,218 @@
using System;
using System.Runtime.CompilerServices;
using Unity.Collections.LowLevel.Unsafe;

namespace Unity.Netcode
{
/// <summary>
/// Helper class for doing bitwise reads for a FastBufferReader.
/// Ensures all bitwise reads end on proper byte alignment so FastBufferReader doesn't have to be concerned
/// with misaligned reads.
/// </summary>
public ref struct BitReader
{
private Ref<FastBufferReader> m_Reader;
private readonly unsafe byte* m_BufferPointer;
private readonly int m_Position;
private int m_BitPosition;
#if DEVELOPMENT_BUILD || UNITY_EDITOR
private int m_AllowedBitwiseReadMark;
#endif

private const int k_BitsPerByte = 8;

/// <summary>
/// Whether or not the current BitPosition is evenly divisible by 8. I.e. whether or not the BitPosition is at a byte boundary.
/// </summary>
public bool BitAligned
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
get => (m_BitPosition & 7) == 0;
}

internal unsafe BitReader(ref FastBufferReader reader)
{
m_Reader = new Ref<FastBufferReader>(ref reader);

m_BufferPointer = m_Reader.Value.BufferPointer + m_Reader.Value.Position;
m_Position = m_Reader.Value.Position;
m_BitPosition = 0;
#if DEVELOPMENT_BUILD || UNITY_EDITOR
m_AllowedBitwiseReadMark = (m_Reader.Value.AllowedReadMark - m_Position) * k_BitsPerByte;
#endif
}

/// <summary>
/// Pads the read bit count to byte alignment and commits the read back to the reader
/// </summary>
public void Dispose()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to implement IDisposable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref structs can't implement interfaces, but c# has special behavior to support using() on ref structs that have a public void Dispose()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, neat. ref structs are pretty new to me. I'm a little uneasy about some of your changes in general because I haven't dipped into some of the new ref syntax - especially the Ref type you're adding. I hope you're seeking review from some folks that have a deep experience with this since this type of code is very risky

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(To be clear, not saying you shouldn't be doing it, serialization is exactly where I'd expect to be writing this type of code)

{
var bytesWritten = m_BitPosition >> 3;
if (!BitAligned)
{
// Accounting for the partial read
++bytesWritten;
}

m_Reader.Value.CommitBitwiseReads(bytesWritten);
}

/// <summary>
/// Verifies the requested bit count can be read from the buffer.
/// This exists as a separate method to allow multiple bit reads to be bounds checked with a single call.
/// If it returns false, you may not read, and in editor and development builds, attempting to do so will
/// throw an exception. In release builds, attempting to do so will read junk memory.
/// </summary>
/// <param name="bitCount">Number of bits you want to read, in total</param>
/// <returns>True if you can read, false if that would exceed buffer bounds</returns>
public bool TryBeginReadBits(uint bitCount)
{
var newBitPosition = m_BitPosition + bitCount;
var totalBytesWrittenInBitwiseContext = newBitPosition >> 3;
if ((newBitPosition & 7) != 0)
{
// Accounting for the partial read
++totalBytesWrittenInBitwiseContext;
}

if (m_Reader.Value.PositionInternal + totalBytesWrittenInBitwiseContext > m_Reader.Value.LengthInternal)
{
return false;
}
#if DEVELOPMENT_BUILD || UNITY_EDITOR
m_AllowedBitwiseReadMark = (int)newBitPosition;
#endif
return true;
}

/// <summary>
/// Read a certain amount of bits from the stream.
/// </summary>
/// <param name="value">Value to store bits into.</param>
/// <param name="bitCount">Amount of bits to read</param>
public unsafe void ReadBits(out ulong value, uint bitCount)
{
#if DEVELOPMENT_BUILD || UNITY_EDITOR
if (bitCount > 64)
{
throw new ArgumentOutOfRangeException(nameof(bitCount), "Cannot read more than 64 bits from a 64-bit value!");
}

if (bitCount < 0)
{
throw new ArgumentOutOfRangeException(nameof(bitCount), "Cannot read fewer than 0 bits!");
}

int checkPos = (int)(m_BitPosition + bitCount);
if (checkPos > m_AllowedBitwiseReadMark)
{
throw new OverflowException($"Attempted to read without first calling {nameof(TryBeginReadBits)}()");
}
#endif
ulong val = 0;

int wholeBytes = (int)bitCount / k_BitsPerByte;
byte* asBytes = (byte*)&val;
if (BitAligned)
{
if (wholeBytes != 0)
{
ReadPartialValue(out val, wholeBytes);
}
}
else
{
for (var i = 0; i < wholeBytes; ++i)
{
ReadMisaligned(out asBytes[i]);
}
}

val |= (ulong)ReadByteBits((int)bitCount & 7) << ((int)bitCount & ~7);
value = val;
}

/// <summary>
/// Read bits from stream.
/// </summary>
/// <param name="value">Value to store bits into.</param>
/// <param name="bitCount">Amount of bits to read.</param>
public void ReadBits(out byte value, uint bitCount)
{
#if DEVELOPMENT_BUILD || UNITY_EDITOR
int checkPos = (int)(m_BitPosition + bitCount);
if (checkPos > m_AllowedBitwiseReadMark)
{
throw new OverflowException($"Attempted to read without first calling {nameof(TryBeginReadBits)}()");
}
#endif
value = ReadByteBits((int)bitCount);
}

/// <summary>
/// Read a single bit from the buffer
/// </summary>
/// <param name="bit">Out value of the bit. True represents 1, False represents 0</param>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public unsafe void ReadBit(out bool bit)
{
#if DEVELOPMENT_BUILD || UNITY_EDITOR
int checkPos = (m_BitPosition + 1);
if (checkPos > m_AllowedBitwiseReadMark)
{
throw new OverflowException($"Attempted to read without first calling {nameof(TryBeginReadBits)}()");
}
#endif

int offset = m_BitPosition & 7;
int pos = m_BitPosition >> 3;
bit = (m_BufferPointer[pos] & (1 << offset)) != 0;
++m_BitPosition;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private unsafe void ReadPartialValue<T>(out T value, int bytesToRead, int offsetBytes = 0) where T : unmanaged
{
var val = new T();
byte* ptr = ((byte*)&val) + offsetBytes;
byte* bufferPointer = m_BufferPointer + m_Position;
UnsafeUtility.MemCpy(ptr, bufferPointer, bytesToRead);

m_BitPosition += bytesToRead * k_BitsPerByte;
value = val;
}

private byte ReadByteBits(int bitCount)
{
if (bitCount > 8)
{
throw new ArgumentOutOfRangeException(nameof(bitCount), "Cannot read more than 8 bits into an 8-bit value!");
}

if (bitCount < 0)
{
throw new ArgumentOutOfRangeException(nameof(bitCount), "Cannot read fewer than 0 bits!");
}

int result = 0;
var convert = new ByteBool();
for (int i = 0; i < bitCount; ++i)
{
ReadBit(out bool bit);
result |= convert.Collapse(bit) << i;
}

return (byte)result;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure how much we gain from aggressive inlining attributes like these — do you have any metrics or comparison?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't earth-shattering but it was significant. Like 20-30% gain IIRC. But I'll rerun those tests tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

20-30%

:O

Copy link
Collaborator Author

@ShadauxCat ShadauxCat Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the tests. Tested both aggressive inlining and branching.

Editor:

Inlined/No Branch: 18,707,485 ticks
Not Inlined/No Branch: 19,292,008 ticks (+3.12%)
Inlined/With Branch: 24,172,988 ticks (+29.21%)
Not Inlined/With Branch: 24,214,432 ticks (+29.43%)

Player:

Inlined/No Branch: 3,940,828 ticks
Not Inlined/No Branch: 5,748,679 ticks (+45.87%)
Inlined/With Branch: 4,384,911 ticks (+11.2%)
Not Inlined/With Branch: 7,607,113 ticks (+93.03%)

IL2CPP:

Inlined/No Branch: 2,144,231 ticks
Not Inlined/No Branch: 3,316,924 ticks (+54.69%)
Inlined/With Branch: 3,770,607 ticks (+75.84%)
Not Inlined/With Branch: 3,719,728 ticks (73.47%)

I think player and il2cpp numbers matter more than editor, and I'd call them significant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the same of completeness can we add the code used to get these perf numbers into the test project ? Might just be a good reference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where and how should this kind of code be added to the test project, given that it's not really an automated test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have code in the test project that are manual tests... I mean its a basic unity project we have all sort of things in there its not a project restricted to just automated testing. We do also use it as the project for that when running package tests because it just makes the package ci process easier.

private unsafe void ReadMisaligned(out byte value)
{
int off = m_BitPosition & 7;
int pos = m_BitPosition >> 3;
int shift1 = 8 - off;

value = (byte)((m_BufferPointer[pos] >> shift1) | (m_BufferPointer[(m_BitPosition += 8) >> 3] << shift1));
}
}
}

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading