Skip to content

Commit

Permalink
Enhance Writer
Browse files Browse the repository at this point in the history
- write documentation
- split SizedWriter in its own module
- improve SizedWriter implementation (no allocation)
  • Loading branch information
hsyl20 committed Aug 30, 2024
1 parent 49cd4eb commit a493e6e
Show file tree
Hide file tree
Showing 8 changed files with 503 additions and 217 deletions.
1 change: 1 addition & 0 deletions doc/manual/src/stdlib.typ
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,4 @@ some common data structures.
#include "stdlib/variant.typ"
#include "stdlib/eadt.typ"
#include "stdlib/binary.typ"
#include "stdlib/writer.typ"
155 changes: 155 additions & 0 deletions doc/manual/src/stdlib/writer.typ
Original file line number Diff line number Diff line change
@@ -0,0 +1,155 @@
== Writing data in memory

There are several strategies to write data into memory:

*Strategy 1: assume there is enough space and just write!*

The first strategy is to assume that there is enough memory space to write
the data we want. This happens for example:
- when writing encoded x86 instructions: we know that they have a maximum length of 15 bytes
- when writing an image into a buffer: we know our image dimensions so we
can ensure that our buffers are large enough.

This is the fastest strategy because there isn't any boundary check occurring at
writing time. This strategy is implemented in `Haskus.Memory.Writer` (see
@fig-writer-def) by a function taking an address where to write and returning
the address after the bytes freshly written. The value to write is captured by
the function and not materialized in the types.

#figure(
caption: "Haskus.Memory.Writer",
[
```haskell
newtype Writer s
= Writer (Addr# -> State# s -> (# State# s, Addr# #))

instance Semigroup (Writer s) where
Writer f <> Writer g = Writer \addr0 s0 ->
let !(# s1, addr1 #) = f addr0 s0
in g addr1 s1

instance Monoid (Writer s) where
mempty = Writer \addr s -> (# s, addr #)
```]
) <fig-writer-def>

`Writer` values can be composed easily and efficiently: see `Semigroup` and
`Monoid` instances in @fig-writer-def.

The `Haskus.Memory.Writer` module contains many `write*` functions to write
basic types using this method. Consider the example in @fig-writer-example which
uses some of them to write 3 values: "8" with 1 byte, "16" with 2 bytes, and 32
with 4 bytes. Compiling with optimization leads to the x86-64 machine code in
@fig-writer-example-asm which is efficient (no heap allocations, no jumps,
etc.).

#columns(2, [

#figure(
caption: "Writer example",
```haskell
foo :: Writer s
foo = mconcat
[ writeU8 8
, writeU16 16
, writeU32 32
]
```
) <fig-writer-example>

#colbreak()

#figure(
caption: [x86 assembly generated from @fig-writer-example],
```asm
.globl foo1_info
.type foo1_info, @function
foo1_info:
movb $8,(%r14)
leaq 1(%r14),%rax
movw $16,(%rax)
addq $2,%rax
movl $32,(%rax)
leaq 4(%rax),%rbx
jmp *(%rbp)
```
) <fig-writer-example-asm>
]
)


*Strategy 2: provide a way to check if there is enough space, then just write!*

If we know beforehand how much memory a writer requires, we can act if there
isn't enough space (e.g. allocating more memory, flushing a buffer to disk to
make some space, throwing an exception...).

This is implemented in `Haskus.Memory.Writer.SizedWriter`. A `SizedWriter`
carries a `U#` value (unsigned machine word) indicating the number of bytes that
would be written by the writer. The implementation of `SizedWriter` is isomorphic
to the one in @fig-sizedwriter-def-possible, but the one we use is in
@fig-sizedwriter-def. The latter is better because it help ensuring that no
`SizedWriter` value is ever allocated.

#figure(
caption: "SizedWriter possible implementation",
```haskell
data SizedWriter s = SizedWriter
{ sizedWriterSize :: !U#
-- ^ The number of bytes that will be written by the writer
, sizedWriter :: !(W.Writer s)
-- ^ The Writer associated with this SizedWriter
}
```
) <fig-sizedwriter-def-possible>

#figure(
caption: "SizedWriter real implementation",
```haskell
newtype SizedWriter s
= SizedWriter' ( (##) -> (# U#, W.Writer s #) )
```
) <fig-sizedwriter-def>

If we rewrite our previous example (@fig-writer-example) to use `SizedWriter`
instead of `Writer` (@fig-sizedwriter-example), we obtain exactly the same
x86-64 machine code for the writer part and the whole number of bytes to write
is statically computed (@fig-sizedwriter-example-asm).

#figure(
caption: "SizedWriter example",
```haskell
foo :: SizedWriter s
foo = mconcat
[ writeU8 8
, writeU16 16
, writeU32 32
]
```
) <fig-sizedwriter-example>

#figure(
caption: [x86 assembly generated from @fig-sizedwriter-example],
```asm
.globl SizedWriter.foo1_info
.type SizedWriter.foo1_info, @function
SizedWriter.foo1_info:
leaq SizedWriter.foo_w_closure+2(%rip),%r14 ; address of the writer closure
movl $7,%ebx ; number of written bytes (7)
jmp *(%rbp)
```
) <fig-sizedwriter-example-asm>

*Other strategies*

+ check available space, refuse to write if there isn't enough space
- easy to implement continuation (call the same function again)
+ check available space, write as much as possible, return continuation
- need to allocate a continuation, but do as much work as possible every time

*Cost of determining the required size*

Determining the number of bytes to write may be costly: e.g. to write elements
of a lazy list. In this case, just counting the bytes could force the elements
of the lazy list, blowing up memory. A better strategy in this case could be to
interleave byte counting with actually writing the bytes.
10 changes: 10 additions & 0 deletions examples/SizedWriter.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
module SizedWriter where

import Haskus.Memory.Writer.SizedWriter

foo :: SizedWriter s
foo = mconcat
[ writeU8 8
, writeU16 16
, writeU32 32
]
10 changes: 10 additions & 0 deletions examples/Writer.hs
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
module Writer where

import Haskus.Memory.Writer

foo :: Writer s
foo = mconcat
[ writeU8 8
, writeU16 16
, writeU32 32
]
2 changes: 2 additions & 0 deletions haskus-base/haskus-base.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,7 @@ library
Haskus.Memory.Property
Haskus.Memory.Typed
Haskus.Memory.Writer
Haskus.Memory.Writer.SizedWriter

Haskus.Utils.Types
Haskus.Utils.Types.Bool
Expand Down Expand Up @@ -180,6 +181,7 @@ library
UnboxedSums
BangPatterns
PatternSynonyms
ViewPatterns
hs-source-dirs: lib

test-suite tests
Expand Down
Loading

0 comments on commit a493e6e

Please sign in to comment.