[API Proposal]: Simple one-shot hashing methods that produce the relevant type #76279

Closed
@stephentoub

Background and motivation

System.IO.Hashing contains types for Crc32, Crc64, XxHash32, and XxHash64, and XxHash3 is being added. These types all have one-shot static methods for hashing a ReadOnlySpan<byte>, but every one of these methods either allocates a resulting byte[] or writes the result into a Span<byte>. However, the natural data type expected in many use cases is just an int/uint for 32-bit hashes or a long/ulong for 64-bit hashes. We should add helpers for this.

API Proposal

namespace System.IO.Hashing;

public sealed class Crc32 : NonCryptographicHashAlgorithm
{
+   public static int HashToInt32(ReadOnlySpan<byte> source);
+   public int GetCurrentHashAsInt32();
    ...
}

public sealed class XxHash32 : NonCryptographicHashAlgorithm
{
+   public static int HashToInt32(ReadOnlySpan<byte> source, int seed = 0);
+   public int GetCurrentHashAsInt32();
    ...
}

public sealed class Crc64 : NonCryptographicHashAlgorithm
{
+   public static long HashToInt64(ReadOnlySpan<byte> source);
+   public long GetCurrentHashAsInt64();
    ...
}

public sealed class XxHash64 : NonCryptographicHashAlgorithm
{
+   public static long HashToInt64(ReadOnlySpan<byte> source, long seed = 0);
+   public long GetCurrentHashAsInt64();
    ...
}

public sealed class XxHash3 : NonCryptographicHashAlgorithm
{
+   public static long HashToInt64(ReadOnlySpan<byte> source, long seed = 0);
+   public long GetCurrentHashAsInt64();
    ...
}

public sealed class XxHash128 : NonCryptographicHashAlgorithm // assuming https://github.com/dotnet/runtime/issues/77885
{
+   public static Int128 HashToInt128(ReadOnlySpan<byte> source, long seed = 0);
+   public Int128 GetCurrentHashAsInt128();
    ...
}

  • All of these implementations already compute the relevant int/long and then write it to either a newly-allocated array or the destination span. The new methods would just return that value instead.
  • The existing methods write the value to the destination as big-endian. The int/long-returning methods would simply return the value as a native integer, in machine-endianness, with no byte-order conversion.
  • Should we use unsigned types instead? All of the algorithms internally operate in unsigned land.
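
For comparison, here is a minimal sketch of what callers have to write today to get an integer from the span-based one-shot API. It assumes, as the note above states, that XxHash32 writes its result big-endian. `HashWorkaround` and `XxHash32ToUInt32` are hypothetical names used only for illustration and are not part of System.IO.Hashing.

```csharp
using System;
using System.Buffers.Binary;
using System.IO.Hashing;

static class HashWorkaround
{
    // Hypothetical helper: hashes into a stack buffer, then decodes the bytes.
    // Assumes XxHash32 emits its 4-byte result in big-endian order.
    public static uint XxHash32ToUInt32(ReadOnlySpan<byte> source, int seed = 0)
    {
        Span<byte> buffer = stackalloc byte[sizeof(uint)];

        // Existing one-shot overload: writes the hash bytes into the destination span.
        XxHash32.Hash(source, buffer, seed);

        // Undo the big-endian serialization to recover the numeric value.
        return BinaryPrimitives.ReadUInt32BigEndian(buffer);
    }
}
```

The proposed methods would fold the buffer and the byte-order round trip into a single call. The sketch returns uint only because that matches the unsigned arithmetic mentioned above; the proposal itself leaves signed versus unsigned open.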

API Usage

ReadOnlySpan<byte> data = ...;
int hash = XxHash32.HashToInt32(data);

Alternative Designs

No response

Risks

No response
