GitHub - vipshek/Fountain-Codes: An implementation of a Luby transform (LT) code written in OCaml

vipshek / Fountain-Codes Public

forked from rlucioni/Fountain-Codes

Notifications You must be signed in to change notification settings
Fork 0
Star 0

An implementation of a Luby transform (LT) code written in OCaml

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 180 Commits
Makefile		Makefile
README		README
bear.gif		bear.gif
distribution.ml		distribution.ml
droplet.ml		droplet.ml
fountain.ml		fountain.ml
goblet.ml		goblet.ml
image_test.sh		image_test.sh
lenna.jpg		lenna.jpg
md5test.sh		md5test.sh
operations.ml		operations.ml
prince_ch1.txt		prince_ch1.txt
probability.ml		probability.ml
sentence.txt		sentence.txt

Repository files navigation

README

CS 51 FINAL PROJECT: FOUNTAIN CODES

This is an OCaml implementation of a Luby transform (LT) code, the first class 
of practical fountain codes that are near optimal erasure correcting codes.

The LT code was invented by Michael Luby in 1998 and published in 2002.

The distinguishing characteristic of LT codes is in employing an encoding 
algorithm based on the XOR (exclusive or) operation to encode and decode a 
message. LT codes are called "rateless" because the encoding algorithm can in 
principle produce an infinite number of message packets, which for the purposes 
of this project have been named "droplets."

********************************************************************************

FILE MANIFEST
    droplet.ml     - defines the Droplet, containing a seed, the XOR'd data, 
                     and the total number of pieces
    
    fountain.ml    - defines the Fountain, produces Droplets according to the 
                     fountain code implementation chosen
    
    goblet.ml      - defines a Goblet, used to collect Droplets and reconstruct 
                     the original data. This takes a droplet (seed, the data to 
                     be decoded, and the total number of pieces)

    operations.ml  - allows encoding and decoding of an arbitrary file 
                     (message); the file is passed as an argument along with an 
                     output destination (like test_dump), a piece size, and a 
                     maximum number of pieces; compile to run from the command 
                     line

    probability.ml - takes a filename as a command line argument, then 
                     initializes and uses a fountain and goblet with uniform, 
                     poisson, or normal distribution (see distribution.ml below)
                     then takes additional arguments based on the probability
                     distribution selected; also allows running multiple times
                     to generate multiple samples of transmission; compile to 
                     run from the command line

    distribution.ml- contains implementations of fountain and goblet, with the 
                     rand_droplet_pieces and get_num_chunks methods overridden 
                     with new PRNG, which generate values according to either 
                     the poisson or normal distributions; the parameters to 
                     these distributions are given as class arguments; 
                     when droplets are generated, the number of pieces XOR'd 
                     into each piece varies based on the distribution and its 
                     parameters

    md5test.sh     - used for testing; compiles operations, creates a test file 
                     with 1000 bytes pulled from /dev/random, runs that file 
                     through operations, places the output in a file named 
                     test_dump, generates and compares MD5 hashes of the input 
                     and output files, and informs you of the result; test is 
                     passed if MD5 hashes match, failed if they do not match

                     NOTE: If you are in the appliance, you MUST change calls 
                     to md5 in this script to calls to md5sum. As it is, 
                     md5test.sh will run natively on Mac OS X, but not in the 
                     appliance. After making this change, you can ignore 
                     warnings about the use of the quiet flag (-q) when you run
                     the script.

    image_test.sh  - same as md5test.sh, but uses the included lenna.jpg image
                     instead of a randomly generated file; the output file 
                     (processed image) can be viewed by opening test_dump
                     
                     NOTE: If you are in the appliance, you MUST change calls 
                     to md5 in this script to calls to md5sum. As it is, 
                     image_test.sh will run natively on Mac OS X, but not in 
                     the appliance. After making this change, you can ignore 
                     warnings about the use of the quiet flag (-q) when you run
                     the script.

    test_dump,
    test_file      - files created, used, and/or overwritten by md5test.sh and 
                     image_test.sh during testing; test_dump can (and should) 
                     also be used as a convenient output destination when 
                     running operations
    
    lenna.jpg      - image for testing file I/O

    bear.gif       - a gif used for testing file I/O on larger, animated files

    sentence.txt   - A text file which contains the sentence "This is a 
                     test." Provided for convenience when testing operations.

    prince_ch1.txt - A text file which contains Chapter 1 of The Prince.
                     Provided for convenience when testing operations.


OPERATING INSTRUCTIONS
To encode a file, transmit its droplets to a goblet, and decode the file, 
first compile operations:

$ make operations

Then, from the command line:

$ ./operations filename output_destination piece_size max_pieces

For example, to run the sentence contained in sentence.txt through our LT code
(i.e., encode, transmit droplets to a Goblet, and then decode), you could type:

$ ./operations sentence.txt test_dump 5 4

This will cut up the data contained in sentence.txt into 5 character pieces and
allow at most 4 pieces to be XORed together in a droplet. The output (the 
reconstructed file) will be in test_dump.

We have included 2 scripts used for testing which also serve to 
demonstrate out project in action: md5test.sh and image_test.sh. Both are 
described above in the File Manifest. These scripts are run from the command 
line. To run md5test.sh, type:

$ ./md5test.sh

To run image_test.sh, type:

$ ./image_test.sh



To run probability, type:

$ ./probability filename

The executable will then prompt you for additional arguments, including
a choice of distribution, the parameters relevant for that distribution, and the
number of times to attempt the file decoding. The probability executable does
not output the file anywhere; it simply generates a new fountain and goblet
a certain number of times, and then prints out the number of droplets that
were consumed before the file could be decoded, along with the number of pieces
the original file was made of.


KNOWN BUGS/INEFFICIENCIES

The math currently being used in Goblet is sub-optimal and stands to be improved
by allowing the program the recognize when it can XOR immediately, instead of
waiting for a droplet containing a single piece of data. We tried but were
ultimately unable to implement this feature using matrices and Gaussian
elimination.

RESOLVED - Operations is known to overflow when used on relatively large images.
This is likely the result of failing to use a recursive tail call in Goblet.

RESOLVED - io_operations corrupts input data

RESOLVED - Duplicate pieces may appear in a single droplet. These duplicate do 
not provide any additional information.

RESOLVED - io_operations compiles but overflows on execution

RESOLVED - When you designated a piece size larger than the input message, 
spaces will be added to your message until it divides evenly by the piece size 
you have given. This also happens any time the message does not divide cleanly 
by the piece size given. This is expected behavior, but can be improved.