Name	Name	Last commit message	Last commit date
Latest commit History 10 Commits
unit_test	unit_test
.gitattributes	.gitattributes
.gitignore	.gitignore
README.md	README.md
cm256.cpp	cm256.cpp
cm256.h	cm256.h
gf256.cpp	gf256.cpp
gf256.h	gf256.h

cm256

Fast GF(256) Cauchy MDS Block Erasure Codec in C

cm256 is a simple library for erasure codes. From given data it generates redundant data that can be used to recover the originals. It is extremely fast, perhaps the fastest software implementation available for erasure codes < 256 blocks long. It is roughly 2x faster than Longhair, and CM256 supports input data that is not a multiple of 8 bytes.

Currently only Visual Studio 2013 is supported, though other versions of MSVC may work.

The original data should be split up into equally-sized chunks. If one of these chunks is erased, the redundant data can fill in the gap through decoding.

The erasure code is parameterized by three values (OriginalCount, RecoveryCount, BlockBytes). These are:

The number of blocks of original data (OriginalCount), which must be less than 256.
The number of blocks of redundant data (RecoveryCount), which must be no more than 256 - OriginalCount.

For example, if a file is split into 3 equal pieces and sent over a network, OriginalCount is 3. And if 2 additional redundant packets are generated, RecoveryCount is 2. In this case up to 256 - 3 = 253 additional redundant packets can be generated.

Building: Quick Setup

Include the cm256.* and gf256.* files in your project and consult the cm256.h header for usage.

Usage

Documentation is provided in the header file cm256.h.

When your application starts up it should call cm256_init() to verify that the library is linked properly:

	#include "cm256.h"

	if (cm256_init()) {
		// Wrong static library
		exit(1);
	}

To generate redundancy, use the cm256_encode function. To solve for the original data use the cm256_decode function.

Example usage:

bool ExampleFileUsage()
{
    if (cm256_init())
    {
        exit(1);
    }

    cm256_encoder_params params;

    // Number of bytes per file block
    params.BlockBytes = 4321;

    // Number of blocks
    params.OriginalCount = 33;

    // Number of additional recovery blocks generated by encoder
    params.RecoveryCount = 12;

    // Size of the original file
    static const int OriginalFileBytes = params.OriginalCount * params.BlockBytes;

    // Allocate and fill the original file data
    uint8_t* originalFileData = new uint8_t[OriginalFileBytes];
    memset(originalFileData, 1, OriginalFileBytes);

    // Pointers to data
    cm256_block blocks[256];
    for (int i = 0; i < params.OriginalCount; ++i)
    {
        blocks[i].Block = originalFileData + i * params.BlockBytes;
    }

    // Recovery data
    uint8_t* recoveryBlocks = new uint8_t[params.RecoveryCount * params.BlockBytes];

    // Generate recovery data
    if (cm256_encode(params, blocks, recoveryBlocks))
    {
        exit(1);
    }

    // Initialize the indices
    for (int i = 0; i < params.OriginalCount; ++i)
    {
        blocks[i].Index = cm256_get_original_block_index(params, i);
    }

    //// Simulate loss of data, subsituting a recovery block in its place ////
    blocks[0].Block = recoveryBlocks; // First recovery block
    blocks[0].Index = cm256_get_recovery_block_index(params, 0); // First recovery block index
    //// Simulate loss of data, subsituting a recovery block in its place ////

    if (cm256_decode(params, blocks))
    {
        exit(1);
    }

    // blocks[0].Index will now be 0.

    delete[] originalFileData;
    delete[] recoveryBlocks;

    return true;
}

The example above is just one way to use the cm256_decode function.

This API was designed to be flexible enough for UDP/IP-based file transfer where the blocks arrive out of order.

Benchmark

CM256 demonstrates similar encoding and (worst case) decoding performance:

Encoder: 1296 bytes k = 100 m = 1 : 6.14401 usec, 21093.7 MBps
Decoder: 1296 bytes k = 100 : 6.43658 usec, 20134.9 MBps
Encoder: 1296 bytes k = 100 m = 2 : 16.9692 usec, 7637.38 MBps
Decoder: 1296 bytes k = 100 : 19.8949 usec, 6514.24 MBps
Encoder: 1296 bytes k = 100 m = 3 : 29.2572 usec, 4429.68 MBps
Decoder: 1296 bytes k = 100 : 33.0606 usec, 3920.07 MBps
Encoder: 1296 bytes k = 100 m = 4 : 39.7898 usec, 3257.12 MBps
Decoder: 1296 bytes k = 100 : 40.9601 usec, 3164.06 MBps
Encoder: 1296 bytes k = 100 m = 5 : 51.7852 usec, 2502.64 MBps
Decoder: 1296 bytes k = 100 : 54.4184 usec, 2381.55 MBps
Encoder: 1296 bytes k = 100 m = 6 : 63.1955 usec, 2050.78 MBps
Decoder: 1296 bytes k = 100 : 64.6584 usec, 2004.38 MBps
Encoder: 1296 bytes k = 100 m = 7 : 73.7281 usec, 1757.81 MBps
Decoder: 1296 bytes k = 100 : 72.5578 usec, 1786.16 MBps
Encoder: 1296 bytes k = 100 m = 8 : 84.2607 usec, 1538.08 MBps
Decoder: 1296 bytes k = 100 : 86.0161 usec, 1506.69 MBps
Encoder: 1296 bytes k = 100 m = 9 : 96.5487 usec, 1342.33 MBps
Decoder: 1296 bytes k = 100 : 95.3784 usec, 1358.8 MBps
Encoder: 1296 bytes k = 100 m = 10 : 106.789 usec, 1213.61 MBps
Decoder: 1296 bytes k = 100 : 105.618 usec, 1227.06 MBps

Encoder: 1296 bytes k = 100 m = 20 : 219.429 usec, 590.624 MBps
Decoder: 1296 bytes k = 100 : 211.237 usec, 613.529 MBps

Encoder: 1296 bytes k = 100 m = 30 : 385.902 usec, 335.836 MBps
Decoder: 1296 bytes k = 100 : 322.999 usec, 401.239 MBps

Encoder: 1296 bytes k = 100 m = 40 : 444.709 usec, 291.426 MBps
Decoder: 1296 bytes k = 100 : 430.666 usec, 300.929 MBps

Encoder: 1296 bytes k = 100 m = 50 : 572.563 usec, 226.351 MBps
Decoder: 1296 bytes k = 100 : 545.646 usec, 237.516 MBps

Longhair library results. Note that I hand-optimized the MemXOR.cpp implementation on this PC to run faster than what is available on github, so this is a fair comparison:

Encoded k=100 data blocks with m=1 recovery blocks in 4.09607 usec : 31640.1 MB/s
+ Decoded 1 erasures in 5.85144 usec : 22148.4 MB/s
Encoded k=100 data blocks with m=2 recovery blocks in 41.5452 usec : 3119.5 MB/s
+ Decoded 2 erasures in 43.5931 usec : 2972.94 MB/s
Encoded k=100 data blocks with m=3 recovery blocks in 80.7498 usec : 1604.96 MB/s
+ Decoded 3 erasures in 86.6013 usec : 1496.51 MB/s
Encoded k=100 data blocks with m=4 recovery blocks in 123.465 usec : 1049.69 MB/s
+ Decoded 4 erasures in 127.854 usec : 1013.66 MB/s
Encoded k=100 data blocks with m=5 recovery blocks in 76.9464 usec : 1684.29 MB/s
+ Decoded 5 erasures in 88.6493 usec : 1461.94 MB/s
Encoded k=100 data blocks with m=6 recovery blocks in 87.7717 usec : 1476.56 MB/s
+ Decoded 6 erasures in 100.352 usec : 1291.45 MB/s
Encoded k=100 data blocks with m=7 recovery blocks in 103.863 usec : 1247.8 MB/s
+ Decoded 7 erasures in 127.269 usec : 1018.32 MB/s
Encoded k=100 data blocks with m=8 recovery blocks in 118.784 usec : 1091.05 MB/s
+ Decoded 8 erasures in 145.701 usec : 889.494 MB/s
Encoded k=100 data blocks with m=9 recovery blocks in 146.871 usec : 882.406 MB/s
+ Decoded 9 erasures in 158.574 usec : 817.284 MB/s
Encoded k=100 data blocks with m=10 recovery blocks in 156.819 usec : 826.433 MB/s
+ Decoded 10 erasures in 181.102 usec : 715.619 MB/s

Encoded k=100 data blocks with m=20 recovery blocks in 282.039 usec : 459.511 MB/s
+ Decoded 20 erasures in 370.103 usec : 350.172 MB/s

Encoded k=100 data blocks with m=30 recovery blocks in 428.618 usec : 302.367 MB/s
+ Decoded 30 erasures in 614.693 usec : 210.837 MB/s

Encoded k=100 data blocks with m=40 recovery blocks in 562.323 usec : 230.472 MB/s
+ Decoded 40 erasures in 855.188 usec : 151.546 MB/s

Encoded k=100 data blocks with m=50 recovery blocks in 727.041 usec : 178.257 MB/s
+ Decoded 50 erasures in 1181.11 usec : 109.727 MB/s

Results Discussion:

For m=1 they are both running the same kind of code, so they're basically the same.

For m=2 and m=3, CM256 is 2.5x faster.

For m=4, CM256 is 3x faster in this case. Longhair could use more tuning. Back when I wrote it, the right time to switch to the Windowed decoder was at m=5, but on my new PC it seems like m=4 is a better time to do it. CM256 only has one mode so it doesn't require any tuning for best performance.

For m=5...30, CM256 performance is not quite 2x faster, maybe 1.7x or so.

For m>30, CM256 is at least 2x faster.

Credits

This software was written entirely by myself ( Christopher A. Taylor mrcatid@gmail.com ). If you find it useful and would like to buy me a coffee, consider tipping.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

cm256

Building: Quick Setup

Usage

Benchmark

Credits

About

Releases

Packages

Contributors 4

Languages

License

catid/cm256

Folders and files

Latest commit

History

Repository files navigation

cm256

Building: Quick Setup

Usage

Benchmark

Credits

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages