Support Crc32C Hardware Intrinsics on .NET Core 3 #16

Open
nitinag opened this issue Jul 31, 2019 · 17 comments · May be fixed by #23

Comments

@nitinag

nitinag commented Jul 31, 2019

It looks like .NET Core 3 supports hardware intrinsics and has support for CRC32C:
https://github.com/dotnet/designs/blob/master/accepted/platform-intrinsics.md

_mm_crc32_u64
https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Sse42.cs

Would be great if the library could use this eventually when the instructions are present on the right platform.

@force-net
Owner

force-net commented Aug 1, 2019

Yep, I'll try to implement this. It is a good idea, thank you.

@Agagamand

This implementation declares hardware-accelerated support on .NET Core 3.0, but I have not tested it: https://github.com/differentrain/Crc32cSharp

@brantburnett

I've got this working for CRC32C and ready to put in a PR as soon as #19 gets merged. In my testing, I'm seeing about a 6x performance improvement for 64-bit processes on a modern Intel processor (this is above and beyond the perf improvements in PR #19). It takes about 6 microseconds to compute on a 64KB buffer.

| Method | Runtime | Mean | Error | StdDev | Ratio | Rank |
|---|---|---|---|---|---|---|
| Default | .NET Core 2.1 | 39.789 us | 0.6658 us | 0.9333 us | 1.00 | 2 |
| Default | .NET Core 3.1 | 6.017 us | 0.0502 us | 0.0469 us | 0.15 | 1 |

@Skyppid

Skyppid commented Nov 10, 2020

@brantburnett Do you also have an implementation for CRC32?

@brantburnett

@Skyppid Unfortunately, the Intel intrinsic operation is specific to CRC32C, which is based on the polynomial 0x11EDC6F41:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=crc&expand=1288

There may be other ways to use intrinsics to optimize CRC32, but if so I haven't found them yet.
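
To illustrate why a polynomial-specific instruction cannot simply be reused for CRC32, here is a minimal table-driven software sketch. It is written in Python for brevity and is not this library's code; the names `make_table` and `crc` are hypothetical. The two algorithms share the same structure and differ only in the polynomial, given here in the bit-reflected form that software implementations usually use.

```python
def make_table(poly_reflected):
    """Build a 256-entry lookup table for a reflected 32-bit CRC polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when a 1 bit falls out.
            c = (c >> 1) ^ poly_reflected if c & 1 else c >> 1
        table.append(c)
    return table


CRC32_TABLE = make_table(0xEDB88320)   # reflected form of 0x104C11DB7 (CRC32)
CRC32C_TABLE = make_table(0x82F63B78)  # reflected form of 0x11EDC6F41 (CRC32C)


def crc(data, table, value=0):
    """Update a running CRC with `data`; pass the previous result as `value`."""
    c = value ^ 0xFFFFFFFF
    for b in data:
        c = table[(c ^ b) & 0xFF] ^ (c >> 8)
    return c ^ 0xFFFFFFFF


# Standard check values for the ASCII string "123456789".
print(hex(crc(b"123456789", CRC32_TABLE)))   # 0xcbf43926
print(hex(crc(b"123456789", CRC32C_TABLE)))  # 0xe3069283
```

Because the same input yields different checksums under the two polynomials, an instruction hard-wired to the CRC32C polynomial cannot directly produce CRC32 values.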

@force-net
Owner

@Skyppid Yep, only CRC32C can be hardware accelerated.

@Skyppid

Skyppid commented Nov 10, 2020

Ah, okay didn't know. Well I saw that CRC32C is the recommended one anyway, so I switched to that.

@brantburnett

I did some digging, and I was able to find and implement a CRC32 algorithm using carryless multiplication intrinsics. It's not as big a difference as CRC32C, only about a 25% reduction in runtime instead of 85%, but it's still something. I may get it implemented more completely once the other stuff is merged.

@benwmills

I'm using the CRC32 algorithm right now and it definitely takes a while for my use case (63,000 files, 25GB). I'll be switching to (software) CRC32C to see how much faster that is. I'm fascinated to see if I'll be able to switch to this hardware accelerated version and how much faster that will be.

I'm using .NET 5, so I presume this will apply the same as .NET Core 3.

I just got a new PC with a Core i9 chip, so I'm presuming that will have the Intel CRC32C hardware acceleration. If I wanted to run the same code on older PCs, is there any way to find out what chips support this hardware acceleration? Would a 3 year old i5 work? What about something like an Atom or Celeron?

@brantburnett

@benwmills

Yes, it will apply the same to .NET 5 as Core 3.1. As to processor support, the instructions are included in SSE4.2. This was first introduced on i7 chips starting around 2008.

I don't have hard data, but it seems like most Intel chips from the last few years support it. This is a list of processors: https://ark.intel.com/content/www/us/en/ark/search/featurefilter.html?productType=873&1_Filter-InstructionSetExtensions=3540

@benwmills

Looks like AMD chips also support SSE4.2. Does that mean that they support hardware accelerated CRC32C too? Everything I've read just talks about Intel chips.

Sorry for the naive question, but what happens when you calculate the CRC32C of a file on a network share? Does the file have to be copied to the local machine to calculate the CRC? I don't imagine this is possible, but any chance the chip on the machine hosting the network share (e.g. a Synology) can calculate the CRC?

@brantburnett

I believe that AMD chips which have SSE4.2 will also automatically get the performance improvement, yes. But I'm not an expert.

To my knowledge, to calculate the CRC over a network share you'd either need to run a service on that server and use HTTP or something similar to request the CRC or stream the whole file to your machine to calculate yourself. The only exception would be if there is some built-in support in the SMB protocol Windows uses for file shares. I have no clue on that front, though.

@benwmills

I switched from CRC32 to CRC32C and the speed is the same. For me, about 58,000 milliseconds for 63,000 files.

I'm really interested to see what I'll get with hardware acceleration when this is ready. A 6-fold increase in speed would be amazing.

@force-net
Owner

@benwmills It seems that reading the files is slower than the CRC32 calculation itself.
Also, it is not a good idea to use CRC32 for checking file integrity; SHA1 or something similar is better. CRC checksums are good for relatively small blocks of data, primarily for network transfer.
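
One way to check whether reading or checksumming dominates is to time the two steps separately while streaming the file in chunks. The sketch below is only an illustration in Python, not this library's API: `crc32_file` is a hypothetical helper, and it uses the standard library's `zlib.crc32` (plain CRC32, not CRC32C).

```python
import time
import zlib


def crc32_file(path, chunk_size=1 << 20):
    """Return (crc, seconds_spent_reading, seconds_spent_hashing) for one file."""
    crc = 0
    read_s = 0.0
    hash_s = 0.0
    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(chunk_size)
            t1 = time.perf_counter()
            read_s += t1 - t0
            if not chunk:
                break
            # zlib.crc32 accepts the running value, so the file never has
            # to be held in memory as a whole.
            crc = zlib.crc32(chunk, crc)
            hash_s += time.perf_counter() - t1
    return crc, read_s, hash_s
```

If `seconds_spent_reading` is much larger than `seconds_spent_hashing` for a typical file, a faster checksum will not noticeably change the total time. Note that operating system file caching can make repeated runs over the same files look faster than a first run.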

@benwmills

I'm writing some code to sync a folder to another folder. I can't always rely on file size and modification date, so I wanted to compare via some kind of file hash.

I've previously used software (ViceVersa Pro) that uses CRC in these cases, but I'm also aware of other hashes like MD5 or SHA1. This is fairly new to me, so I don't know the pros and cons of CRC vs MD5 vs SHA1. I'm obviously looking for reliability, but speed is huge too. I took the lead from Vice Versa Pro to use CRC and it's working well. To be able to calculate the hash on 63,000 files in less than a minute is pretty impressive to me, but maybe there are better options.

Sorry, I know this is getting off topic in this thread. The software CRC is working great for me. Just really intrigued by the possibility of much faster hardware CRC.

@force-net
Owner

Yeah, it is off topic, but it is OK to discuss it here :)

As I understand it, you check files for differences by comparing their hashes. In normal situations that is fine, because you also check size and time. But for specially crafted files, collisions can exist, which leads to 'false file equality'. For example, an attacker changes the file content and then adjusts the last bytes to produce the required hash.
In practice, though, a lot of file-syncing utilities do not look at the content at all, just the date, size, and name, and that works well.

Also, for file-syncing operations, network speed and latency can be important. You can look at the rsync algorithm, which uses two hashes: one fast and cheap (a rolling checksum) and one slow and accurate. This allows it to find insertions and deletions in files and sync only a small part of the data.
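
As a rough illustration of the "fast" hash, here is a minimal Python sketch of an rsync-style weak rolling checksum. The function names `weak_checksum` and `roll` are hypothetical, and this is a simplified sketch of the idea rather than rsync's actual code. The point is that sliding the window by one byte costs a constant amount of work instead of rehashing the whole block.

```python
M = 1 << 16  # both sums are kept modulo 2^16


def weak_checksum(block):
    """Compute the two running sums (a, b) for a block of bytes from scratch."""
    n = len(block)
    a = sum(block) % M
    # Earlier bytes get larger weights: the first byte counts n times.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte at the front, add in_byte at the end."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


data = b"the quick brown fox jumps over the lazy dog"
n = 8
a, b = weak_checksum(data[:n])
for k in range(len(data) - n):
    a, b = roll(a, b, data[k], data[k + n], n)
    # The rolled value matches a full recomputation of the shifted window.
    assert (a, b) == weak_checksum(data[k + 1:k + 1 + n])
```

A match on the weak checksum is only a candidate; the slow, accurate hash is then used to confirm that the blocks are really identical.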

@EduardSergeev

> Yep, only CRC32C can be hardware accelerated.

It is probably the case for x86/SSE4.2. But it looks like Arm does support both CRC32 and CRC32C (since ARMv8.1) as does System.Runtime.Intrinsics.Arm (since .NET 5): CRC32 and CRC32C.

@EduardSergeev EduardSergeev linked a pull request Aug 27, 2023 that will close this issue