Smaller encoding of the huffman tree

Hey Prime! Your Huffman encoding video got me inspired.

From what I understand the tree is currently encoded via a table. Each node being 6 bytes:`Value, LeftIndex, RightIndex` you could reduce this quite dramatically by always building in the traditional left -> right depth first manner

**Encoding**

Each node is encoded with a leading bit which indicates if it's a leaf or a branch:

Leaf Bit 0
Branch Bit 1

Then each leaf has whatever data it needs encoded afterwards (in your case 16 bits)
Each branch will then encode it's left and right branch.

**Decoding**

Read the next bit
If it's 0 read the next 16 bits and construct the leaf
If it's 1 (recurse to build left, recurse to build right)

**Some maths**

For a huffman tree of N nodes this representation would need `ceil((N + N*ValueBitLength) / 8)`bytes to represent the tree in this format.

So the example from your unit tests would drop from 42 bytes down to 15.

Tree
```
   ()
  /  \
 A   ()
     / \
    B  ()
       / \
      C   D
```

Is encoded as `10A10B10C0D` (mixing binary and character values)

You could even do variable length encoding with this approach by having a header which dictates how many bytes each node has

**Example of it working**

Example in scala (because I haven't learned go yet) https://scastie.scala-lang.org/B3MontqASvuSNT3wNN4qxQ assuming a single byte for the values.

Hope this is helpful or at least interesting.

Peace out!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Smaller encoding of the huffman tree #27

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development