forked from SamirPaulb/txt-compressor
-
Notifications
You must be signed in to change notification settings - Fork 0
/
info.html
151 lines (133 loc) · 12 KB
/
info.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
<!-- Author : Samir Paul -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge" >
<meta name="viewport" content="width=device-width, initial-scale=1.0" >
<!-- Primary Meta Tags -->
<title>Huffman Compression Technique</title>
<link rel="shortcut icon" href="assets/favicon.ico">
<meta name="title" content="Huffman Compression Technique">
<meta name="description" content="An online .txt file compressor, de-compressor tool which uses Huffman Coding for Lossless data compression.">
<meta name="keywords" content="Huffman Compression Technique, encoding,encoder,huffman, huffman-coding, lossless, huffman-compression-algorithm, txt, lossless-compression-algorithm, file-compression, huffman-encoder, huffman-decoder, huffman-encoding, txt-encode, txt-decode, lossless-compression, lossless-data-compression, filecompressor, online-file-compressor, txt-compressor, GZIP, Brotli, WebP, png, online file compression, file compressor, txt compression, txt compressor, txt encoding, online txt compressor, txt decompressor, samirpaul, SamirPaul1, SamirPaulb, Samir Paul, txt compressor github, gitpages, website, txt compressor website">
<meta name="robots" content="index, follow">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="language" content="English">
<meta name="author" content="Samir Paul">
<!-- Open Graph / Facebook -->
<meta property="og:type" content="website">
<meta property="og:url" content="https://samirpaulb.github.io/txt-compressor/info.html">
<meta property="og:title" content="Huffman Compression Technique">
<meta property="og:description" content="An online .txt file compressor, de-compressor tool which uses Huffman Coding for Lossless data compression.">
<meta property="og:image" content="https://raw.githubusercontent.com/SamirPaulb/assets/main/huffman-algorithm_python-txt-compressor-webimage-samirpaul.jpg">
<!-- Twitter -->
<meta property="twitter:card" content="summary_large_image">
<meta property="twitter:url" content="https://samirpaulb.github.io/txt-compressor/info.html">
<meta property="twitter:title" content="Huffman Compression Technique">
<meta property="twitter:description" content="An online .txt file compressor, de-compressor tool which uses Huffman Coding for Lossless data compression.">
<meta property="twitter:image" content="https://raw.githubusercontent.com/SamirPaulb/assets/main/huffman-algorithm_python-txt-compressor-webimage-samirpaul.jpg">
<link rel="canonical" href="https://samirpaul.in/txt-compressor/info.html"/>
<link type="text/css" rel="stylesheet" href="styles.css">
<style>
body { background-color: #fffde0; }
</style>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-CP4QE6ZEV0"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-CP4QE6ZEV0');
</script>
</head>
<body>
<a href="index.html" class="links"> <- Go Back </a>
<center><h1><p style="color:red"><a href="index.html" style="color:red">This website</a> uses Huffman Coding to compress data!</p> </h1></center>
<ul>
<li><b>Huffman Coding</b> is a popular technique used for <b>Lossless Data Compression</b>.</li>
<h2 style="color:blue">Lossles Compression:</h2>
<ol><b>Lossless compression</b> is a class of data compression algorithms that allows the original data to be perfectly reconstructed from the compressed data. Lossless compression methods are <b>reversible</b>. Examples of lossless compression include <b>GZIP</b>, <b>Brotli</b>, <b>WebP</b>, and <b>PNG</b>.</ol>
<ol>Lossles Compression is preffered for text file compression, while Lossy Compression is generally preferred
for audio, video and image files.</ol>
<ol>File formats like <b>ZIP</b> use Lossless compression, while formats like <b>MP3</b> and <b>JPEG</b> use
Lossy compression</ol>
<ol><b>Data compression ratio</b> , also known as compression power, is a measurement of the relative reduction
in size of data representation produced by a data compression algorithm. It is typically expressed as the
division of uncompressed size by compressed size. Thus, a representation that compresses a file's storage
size from 10 MB to 2 MB has a compression ratio of 10/2 = 5
</ol>
<center><img src="images/info-images/lossless-lossy-data-compression.png" alt="haffman-tree-and-binary-code-example.jpeg" loading="lazy" height="130" width="60%"></center>
<br>
<h2 style="color:blue">Huffman Code:</h2>
<ol>
A <b>Huffman code</b> is a particular type of optimal prefix code that is commonly used for lossless data
compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorithm
developed by David A. Huffman.
</ol>
<ol>
The output from Huffman's algorithm can be viewed as a <b>variable-length code table</b>
for encoding a source symbol (such as a character in a file). The algorithm derives this table from the
estimated probability or frequency of occurrence (weight) for each possible value of the source symbol.
</ol>
<ol><b>Prefix Codes</b>, means the codes (bit sequences) are assigned in such a way that the code assigned to
one character is not the prefix of code assigned to any other character. This is how Huffman Coding makes
sure that there is <b>no ambiguity</b> when decoding the generated bitstream.
</ol>
<center><img src="images/info-images/huffman-coding.png" alt="haffman-tree-and-binary-code-example.jpeg" loading="lazy" height="180" width="60%"></center>
<br>
<center><p>A Huffman coding tree built from this character frequency table: A=0.6, B=0.2, C=0.07, D=0.06, E=0.05, F=0.02.</p></center>
<center><img src="images/info-images/data-compression-with-huffmans-algorithm.png" loading="lazy" alt="data-compression-with-huffmans-algorithm.png" height="200" width="50%"></center>
<br>
<h2 style="color:blue">Compression:</h2>
<li>Huffman Encoding is an algorithm which uses frequency (or probability) feature of symbols and a binary tree structure. It consists of the following 3 steps:
<ol>
<li> Probability Calculation & Ordering the Symbols</li>
<li>Binary Tree Transformation</li>
<li>Assigning Codes to the Symbols</li>
</ol>
</li>
<li>
We count the number of each symbol in the whole data, then we calculate the “probability” of each symbol by dividing that count by the total number of characters in the data. Since its an algorithm using probability, more common symbols — the symbols having higher probability — are generally represented using fewer bits than less common symbols. This is one of the advantageous sides of Huffman Encoding. </li>
<li>As an example, for the following data having 5 different symbols as A B C D E, we have the probabilities as shown right:</li>
<br>
<center><img src="images/info-images/compression1.png" alt="compression1" loading="lazy" height="150" width="70%"></center>
<li>Then we easily order the symbols according to their probabilities representing each symbol as a node and call that our “collection”. Now, we are ready to pass the next step.</li>
<center><img src="images/info-images/compression2.png" alt="compression2" loading="lazy" height="150" width="70%"></center>
<li>Binary Tree Transformation:</li>
<ol>1. From the collection, we pick out the two nodes with the smallest sum of probabilities and combine them into a new tree whose root has the probability equal to that sum.</ol>
<ol>2. We add the new tree back into the collection.</ol>
<ol>3. We repeat this process until one tree encompassing all the input probabilities has been constructed.<ol>
<center><img src="images/info-images/compression3.png" alt="compression3" loading="lazy" height="250" width="70%"></center>
<center><img src="images/info-images/compression4.png" alt="compression4" loading="lazy" height="250" width="70%"></center>
<center><img src="images/info-images/compression5.png" alt="compression5" loading="lazy" height="200" width="70%"></center>
<center><b>Implementation with Python:</b></center>
<center><img src="images/info-images/compression-code.png" alt="python implementation code" loading="lazy" height="100%" width="90%"></center>
<center><b>output:</b></center>
<center><img src="images/info-images/compression-code-output.png" alt="python implementation code" loading="lazy" height="100%" width="70%"></center>
<br>
<ol>As a conclusion, we see that the compression ratio does not change with the growing amount of data, and this ratio is close to 2:1. We can say that Huffman Encoding is an algorithm that compresses the data to its half size. Although it is old, it is still an effective compression algorithm! </ol>
<h2 style="color:blue">De-compression:</h2>
<li>Having our Binary Huffman Tree obtained during encode phase, decoding is a very simple process to perform.</li>
<li>Let’s consider we have the same example with Huffman Encoding post, therefore we have AAAAAAABCCCCCCDDEEEEE as our initial data and 000000000000001110101010101011101101010101010 as encoded output with the following Huffman Tree:</li>
<br>
<center><img src="images/info-images/huffman-tree-for-decoding.png" alt="huffman-tree-for-decoding.png" loading="lazy" height="250" width="60%"></center>
<br>
<li>Now the only thing we should do is starting from the head of Huffman Tree and from the beginning of the encoded data, each time we encounter 1 we go right and while we encounter 0, we go left through the Huffman Tree. When we reach at a leaf node, we obtain the symbol! Then we just start again from the head of Huffman Tree while moving forward on encoded data.</li>
<li> With a few lines added in huffman.py coming from Huffman Encoding & Python Implementation, we can easily implement Huffman_Decoding and here is the result:</li>
<center><img src="images/info-images/Huffman_Decoding-carbon-python-code.png" alt="Huffman_Decoding-carbon-python-code" loading="lazy" height="380" width="80%"></center>
<center><img src="images/info-images/huffman_decoding-python-code-result.png" alt="huffman_decoding-python-code-result" loading="lazy" height="170" width="80%"></center>
<li>Data compression is a topic of many applications and it has various different types of algorithms beside of “frequency based” Huffman Algorithm. You can check for “dictionary based” methods like LZ77 LZ78 LZW which are useful for image compression especially.</li>
<br>
</ul>
<h2 style="color:blue">References for this article:</h2>
<ul>
<li><a href="https://engineering.purdue.edu/ece264/17au/hw/HW13?alt=huffman" target="_blank">engineering.purdue.edu</a></li>
<li><a href="https://developer.mozilla.org/en-US/docs/Glossary/Lossless_compression" target="_blank">developer.mozilla.org/Lossless_compression</a></li>
<li><a href="https://towardsdatascience.com/huffman-encoding-python-implementation-8448c3654328" target="_blank">towardsdatascience/huffman-encoding-python-implementation</a></li>
<li><a href="https://en.wikipedia.org/wiki/Huffman_coding" target="_blank">wikipedia/Huffman_coding</a></li>
<li><a href="https://towardsdatascience.com/huffman-decoding-cca770065bab" target="_blank">towardsdatascience/huffman-decoding</a></li>
<li><a href="https://web.stanford.edu/class/archive/cs/cs106b/cs106b.1176/assnFiles/assign6/huffman-encoding-supplement.pdf" target="_blank">web.stanford.edu/huffman-encoding-supplement</a></li>
</ul>
</body>
</html>