Support Crc32C Hardware Intrinsics on .NET Core 3 #16
Yep, I'll try to implement this. It is a good idea, thank you. |
This implementation declares support for hardware acceleration on .NET Core 3.0, but I have not tested it: https://github.com/differentrain/Crc32cSharp |
I've got this working for CRC32C and ready to put in a PR as soon as #19 gets merged. In my testing, I'm seeing about a 6x performance improvement for 64-bit processes on a modern Intel processor (this is above and beyond the perf improvements in PR #19). It takes about 6 microseconds to compute the CRC of a 64KB buffer. |
@brantburnett Do you also have an implementation for CRC32? |
@Skyppid Unfortunately, the Intel intrinsic operation is specific to CRC32C, which is based on the polynomial 0x11EDC6F41: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=crc&expand=1288 There may be other ways to use intrinsics to optimize CRC32, but if so I haven't found them yet. |
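To make the polynomial mentioned above concrete, here is a minimal software sketch of CRC32C in Python. It is a slow bit-by-bit reference, not the library's implementation and not the hardware path. The names `crc32c` and `CRC32C_POLY_REFLECTED` are made up for illustration; 0x82F63B78 is the bit-reversed form of 0x1EDC6F41 (0x11EDC6F41 without its implicit top bit).

```python
# Bit-reversed (reflected) form of the CRC32C polynomial 0x1EDC6F41.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32C of data; pass a previous result as crc to continue a stream."""
    crc ^= 0xFFFFFFFF  # initial inversion (also undoes the final inversion when continuing)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; if it was set, subtract (XOR) the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))
```

Swapping in a different polynomial constant gives a different CRC variant, which is why an instruction hard-wired to this polynomial cannot compute plain CRC32.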
@Skyppid Yep, only CRC32C can be hardware accelerated. |
Ah, okay, I didn't know. Well, I saw that CRC32C is the recommended one anyway, so I switched to that. |
I did some digging, and I was able to find and implement a CRC32 algorithm using carryless multiplication intrinsics. It's not as big a difference as CRC32C, only about a 25% reduction in runtime instead of 85%, but it's still something. I may get it implemented more completely once the other stuff is merged. |
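For readers unfamiliar with the term: carry-less multiplication is multiplication of polynomials with coefficients in GF(2), that is, shift-and-XOR instead of shift-and-add. The Python sketch below only illustrates that single operation. It is not the CRC32 folding algorithm the comment refers to, and `clmul` is a hypothetical helper name.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of a and b, treating each integer as a GF(2) polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR replaces addition, so no carries propagate
        a <<= 1
        b >>= 1
    return result


if __name__ == "__main__":
    # (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. 0b101, whereas 3 * 3 = 9 with carries.
    print(bin(clmul(0b11, 0b11)))
```

A CRC is the remainder of a polynomial division over GF(2), which is why a fast carry-less multiply can be used to process many bits of input at once.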
I'm using the CRC32 algorithm right now and it definitely takes a while for my use case (63,000 files, 25GB). I'll be switching to (software) CRC32C to see how much faster that is. I'm fascinated to see if I'll be able to switch to this hardware accelerated version and how much faster that will be. I'm using .NET 5, so I presume this will apply the same as .NET Core 3. I just got a new PC with a Core i9 chip, so I'm presuming that will have the Intel CRC32C hardware acceleration. If I wanted to run the same code on older PCs, is there any way to find out what chips support this hardware acceleration? Would a 3 year old i5 work? What about something like an Atom or Celeron? |
Yes, it will apply the same to .NET 5 as Core 3.1. As to processor support, the instructions are part of SSE4.2, which was first introduced on i7 chips starting around 2008. I don't have hard data, but it seems like most Intel chips from the last few years support it. This is a list of processors: https://ark.intel.com/content/www/us/en/ark/search/featurefilter.html?productType=873&1_Filter-InstructionSetExtensions=3540 |
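In .NET code the runtime answers this question directly through `Sse42.IsSupported` in `System.Runtime.Intrinsics.X86`. To check a machine from outside .NET, one rough option on Linux is to look for the `sse4_2` flag in `/proc/cpuinfo`. The Python sketch below assumes a Linux host; the function names are hypothetical, and it simply reports `False` when the file cannot be read.

```python
def cpuinfo_has_sse42(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists sse4_2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "sse4_2" in value.split():
            return True
    return False


def host_has_sse42(path: str = "/proc/cpuinfo") -> bool:
    """Best-effort check of the current Linux host; False if the file is unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            return cpuinfo_has_sse42(f.read())
    except OSError:
        return False


if __name__ == "__main__":
    print("SSE4.2 reported:", host_has_sse42())
```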
Looks like AMD chips also support SSE4.2. Does that mean that they support hardware accelerated CRC32C too? Everything I've read just talks about Intel chips. Sorry for the naive question, but what happens when you calculate the CRC32C of a file on a network share? Does the file have to be copied to the local machine to calculate the CRC? I don't imagine this is possible, but any chance the chip on the machine hosting the network share (e.g. a Synology) can calculate the CRC? |
I believe that AMD chips which have SSE4.2 will also automatically get the performance improvement, yes. But I'm not an expert. To my knowledge, to calculate the CRC over a network share you'd need to either run a service on that server and request the CRC over HTTP or something similar, or stream the whole file to your machine and calculate it yourself. The only exception would be if there is some built-in support in the SMB protocol Windows uses for file shares. I have no clue on that front, though. |
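The "stream the whole file" option does not require holding the file in memory: a CRC can be updated chunk by chunk as the bytes arrive. A minimal Python sketch, assuming the share is mounted as an ordinary path; it uses the standard library's `zlib.crc32` (plain CRC32, since the standard library has no CRC32C), and `file_crc32` is a hypothetical helper name.

```python
import zlib


def file_crc32(path: str, chunk_size: int = 1 << 20) -> int:
    """Compute CRC32 of a file by reading it in chunks of chunk_size bytes."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # zlib.crc32 accepts the running value, so the result matches a one-shot CRC.
            crc = zlib.crc32(chunk, crc)
    return crc
```

Every byte still crosses the network once, so for remote files the transfer time, not the CRC, usually dominates.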
I switched from CRC32 to CRC32C and the speed is the same. For me, it takes about 58,000 milliseconds for 63,000 files. I'm really interested to see what I'll get with hardware acceleration when this is ready. A 6-fold increase in speed would be amazing. |
@benwmills It seems that reading the files is slower than the CRC32 calculation itself. |
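One way to check this on a given machine is to time the reads and the checksum updates separately. The Python sketch below is an illustration only: `time_read_vs_hash` is a hypothetical helper, and it uses `zlib.crc32` (plain CRC32) as a stand-in for whichever CRC the application uses.

```python
import time
import zlib


def time_read_vs_hash(path: str, chunk_size: int = 1 << 20):
    """Return (read_seconds, hash_seconds, crc) for one pass over the file."""
    read_seconds = 0.0
    hash_seconds = 0.0
    crc = 0
    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(chunk_size)
            t1 = time.perf_counter()
            read_seconds += t1 - t0
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
            hash_seconds += time.perf_counter() - t1
    return read_seconds, hash_seconds, crc
```

If `read_seconds` is much larger than `hash_seconds`, a faster CRC will barely change the total. Note that a second run over the same files may be served from the operating system's file cache and look far faster than a cold run.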
I'm writing some code to sync a folder to another folder. I can't always rely on file size and modification date, so I wanted to compare via some kind of file hash. I've previously used software (ViceVersa Pro) that uses CRC in these cases, but I'm also aware of other hashes like MD5 or SHA1. This is fairly new to me, so I don't know the pros and cons of CRC vs MD5 vs SHA1. I'm obviously looking for reliability, but speed is huge too. I took the lead from ViceVersa Pro to use CRC and it's working well. To be able to calculate the hash on 63,000 files in less than a minute is pretty impressive to me, but maybe there are better options. Sorry, I know this is getting off topic in this thread. The software CRC is working great for me. I'm just really intrigued by the possibility of much faster hardware CRC. |
Yeah, it's off topic, but it's OK to discuss it here :) As I understand it, you check files for differences by comparing their hashes. In normal situations that is fine, because you also check size and time. But for specially prepared files, collisions can exist, which will lead to 'false file equality'. For example, a hacker changes the file content and then adjusts the last bytes to produce the required hash. Also, for file syncing operations, network speed and latency can be important. You can look at the rsync algorithm, which uses two hashes: one fast and linear (a rolling checksum), and one slow and accurate. This allows it to find insertions and deletions in files and sync only a small part of the data. |
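To illustrate the "fast" half of that idea, here is a small Python sketch of an rsync-style weak rolling checksum: two 16-bit sums that can be updated in constant time when the window slides by one byte. It is a simplified sketch of the technique, not rsync's actual code, and the names `weak_checksum` and `roll` are made up for the example.

```python
MOD = 1 << 16


def weak_checksum(block: bytes):
    """Return (a, b): a is the byte sum, b weights each byte by its distance from the end."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, old_byte: int, new_byte: int, block_len: int):
    """Slide the window one byte: drop old_byte at the front, append new_byte at the back."""
    a = (a - old_byte + new_byte) % MOD
    b = (b - block_len * old_byte + a) % MOD
    return a, b


if __name__ == "__main__":
    data = b"the quick brown fox jumps over the lazy dog"
    n = 8
    a, b = weak_checksum(data[:n])
    for i in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[i - 1], data[i + n - 1], n)
        assert (a, b) == weak_checksum(data[i:i + n])
    print("rolling update matches recomputation")
```

Because the weak checksum collides easily, a match is only a candidate; the slower, stronger hash is then computed for that block to confirm it.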
That (only CRC32C being hardware accelerated) is probably the case for x86/SSE4.2. But it looks like Arm supports both CRC32 and CRC32C in hardware (since ARMv8.1), and so does System.Runtime.Intrinsics.Arm (since .NET 5): CRC32 and CRC32C. |
It looks like .NET Core 3 supports hardware intrinsics and has support for CRC32C:
https://github.com/dotnet/designs/blob/master/accepted/platform-intrinsics.md
_mm_crc32_u64
https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Sse42.cs
Would be great if the library could use this eventually when the instructions are present on the right platform.