forked from rlucioni/Fountain-Codes
-
Notifications
You must be signed in to change notification settings - Fork 0
vipshek/Fountain-Codes
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
README
CS 51 FINAL PROJECT: FOUNTAIN CODES
This is an OCaml implementation of a Luby transform (LT) code, the first class
of practical fountain codes that are near optimal erasure correcting codes.
The LT code was invented by Michael Luby in 1998 and published in 2002.
The distinguishing characteristic of LT codes is in employing an encoding
algorithm based on the XOR (exclusive or) operation to encode and decode a
message. LT codes are called "rateless" because the encoding algorithm can in
principle produce an infinite number of message packets, which for the purposes
of this project have been named "droplets."
********************************************************************************
FILE MANIFEST
droplet.ml - defines the Droplet, containing a seed, the XOR'd data,
and the total number of pieces
fountain.ml - defines the Fountain, produces Droplets according to the
fountain code implementation chosen
goblet.ml - defines a Goblet, used to collect Droplets and reconstruct
the original data. This takes a droplet (seed, the data to
be decoded, and the total number of pieces)
operations.ml - allows encoding and decoding of an arbitrary file
(message); the file is passed as an argument along with an
output destination (like test_dump), a piece size, and a
maximum number of pieces; compile to run from the command
line
probability.ml - takes a filename as a command line argument, then
initializes and uses a fountain and goblet with uniform,
poisson, or normal distribution (see distribution.ml below)
then takes additional arguments based on the probability
distribution selected; also allows running multiple times
to generate multiple samples of transmission; compile to
run from the command line
distribution.ml- contains implementations of fountain and goblet, with the
rand_droplet_pieces and get_num_chunks methods overridden
with new PRNG, which generate values according to either
the poisson or normal distributions; the parameters to
these distributions are given as class arguments;
when droplets are generated, the number of pieces XOR'd
into each piece varies based on the distribution and its
parameters
md5test.sh - used for testing; compiles operations, creates a test file
with 1000 bytes pulled from /dev/random, runs that file
through operations, places the output in a file named
test_dump, generates and compares MD5 hashes of the input
and output files, and informs you of the result; test is
passed if MD5 hashes match, failed if they do not match
NOTE: If you are in the appliance, you MUST change calls
to md5 in this script to calls to md5sum. As it is,
md5test.sh will run natively on Mac OS X, but not in the
appliance. After making this change, you can ignore
warnings about the use of the quiet flag (-q) when you run
the script.
image_test.sh - same as md5test.sh, but uses the included lenna.jpg image
instead of a randomly generated file; the output file
(processed image) can be viewed by opening test_dump
NOTE: If you are in the appliance, you MUST change calls
to md5 in this script to calls to md5sum. As it is,
image_test.sh will run natively on Mac OS X, but not in
the appliance. After making this change, you can ignore
warnings about the use of the quiet flag (-q) when you run
the script.
test_dump,
test_file - files created, used, and/or overwritten by md5test.sh and
image_test.sh during testing; test_dump can (and should)
also be used as a convenient output destination when
running operations
lenna.jpg - image for testing file I/O
bear.gif - a gif used for testing file I/O on larger, animated files
sentence.txt - A text file which contains the sentence "This is a
test." Provided for convenience when testing operations.
prince_ch1.txt - A text file which contains Chapter 1 of The Prince.
Provided for convenience when testing operations.
OPERATING INSTRUCTIONS
To encode a file, transmit its droplets to a goblet, and decode the file,
first compile operations:
$ make operations
Then, from the command line:
$ ./operations filename output_destination piece_size max_pieces
For example, to run the sentence contained in sentence.txt through our LT code
(i.e., encode, transmit droplets to a Goblet, and then decode), you could type:
$ ./operations sentence.txt test_dump 5 4
This will cut up the data contained in sentence.txt into 5 character pieces and
allow at most 4 pieces to be XORed together in a droplet. The output (the
reconstructed file) will be in test_dump.
We have included 2 scripts used for testing which also serve to
demonstrate out project in action: md5test.sh and image_test.sh. Both are
described above in the File Manifest. These scripts are run from the command
line. To run md5test.sh, type:
$ ./md5test.sh
To run image_test.sh, type:
$ ./image_test.sh
To run probability, type:
$ ./probability filename
The executable will then prompt you for additional arguments, including
a choice of distribution, the parameters relevant for that distribution, and the
number of times to attempt the file decoding. The probability executable does
not output the file anywhere; it simply generates a new fountain and goblet
a certain number of times, and then prints out the number of droplets that
were consumed before the file could be decoded, along with the number of pieces
the original file was made of.
KNOWN BUGS/INEFFICIENCIES
The math currently being used in Goblet is sub-optimal and stands to be improved
by allowing the program the recognize when it can XOR immediately, instead of
waiting for a droplet containing a single piece of data. We tried but were
ultimately unable to implement this feature using matrices and Gaussian
elimination.
RESOLVED - Operations is known to overflow when used on relatively large images.
This is likely the result of failing to use a recursive tail call in Goblet.
RESOLVED - io_operations corrupts input data
RESOLVED - Duplicate pieces may appear in a single droplet. These duplicate do
not provide any additional information.
RESOLVED - io_operations compiles but overflows on execution
RESOLVED - When you designated a piece size larger than the input message,
spaces will be added to your message until it divides evenly by the piece size
you have given. This also happens any time the message does not divide cleanly
by the piece size given. This is expected behavior, but can be improved.
About
An implementation of a Luby transform (LT) code written in OCaml
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published