Enhance Writer #69

Merged
merged 1 commit into from
Aug 30, 2024
1 change: 1 addition & 0 deletions doc/manual/src/stdlib.typ
@@ -9,3 +9,4 @@ some common data structures.
#include "stdlib/variant.typ"
#include "stdlib/eadt.typ"
#include "stdlib/binary.typ"
#include "stdlib/writer.typ"
155 changes: 155 additions & 0 deletions doc/manual/src/stdlib/writer.typ
@@ -0,0 +1,155 @@
== Writing data in memory

There are several strategies to write data into memory:

*Strategy 1: assume there is enough space and just write!*

The first strategy is to assume that the destination has enough space for
the data we want to write. This is the case for example:
- when writing encoded x86 instructions: we know that an instruction is at most 15 bytes long
- when writing an image into a buffer: we know the image dimensions, so we
  can ensure that the buffer is large enough.

This is the fastest strategy because no bounds check is performed at write
time. This strategy is implemented in `Haskus.Memory.Writer` (see
@fig-writer-def) by a function that takes the address to write to and returns
the address just past the bytes it wrote. The value to write is captured by
the function and does not appear in the `Writer` type.

#figure(
caption: "Haskus.Memory.Writer",
[
```haskell
newtype Writer s
  = Writer (Addr# -> State# s -> (# State# s, Addr# #))

instance Semigroup (Writer s) where
  Writer f <> Writer g = Writer \addr0 s0 ->
    let !(# s1, addr1 #) = f addr0 s0
    in g addr1 s1

instance Monoid (Writer s) where
  mempty = Writer \addr s -> (# s, addr #)
```]
) <fig-writer-def>

`Writer` values can be composed easily and efficiently: see the `Semigroup` and
`Monoid` instances in @fig-writer-def. `<>` runs the first writer, then runs the
second one at the address returned by the first.
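
To make the composition concrete, here is a minimal boxed sketch of the same
idea. It is not the `Haskus.Memory.Writer` implementation: it assumes a
`Ptr Word8` in `IO` instead of `Addr#` and `State#`, and the names
`BoxedWriter`, `writeStorable`, `runWriter` and `example` are hypothetical.
Like the real `Writer`, it performs no bounds check.

```haskell
import Data.Word (Word8, Word16, Word32)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr, castPtr, minusPtr, plusPtr)
import Foreign.Storable (Storable, poke, sizeOf)

-- | Boxed analogue of Writer (hypothetical): take the address to write at,
-- return the address just past the bytes written. No bounds check.
newtype BoxedWriter = BoxedWriter (Ptr Word8 -> IO (Ptr Word8))

instance Semigroup BoxedWriter where
  -- Run the first writer, then the second at the address the first returned.
  BoxedWriter f <> BoxedWriter g = BoxedWriter (\p -> f p >>= g)

instance Monoid BoxedWriter where
  -- Write nothing and return the address unchanged.
  mempty = BoxedWriter pure

-- | Write any Storable value and advance by its size (hypothetical helper).
-- Writes may be unaligned, which this sketch assumes the target allows.
writeStorable :: Storable a => a -> BoxedWriter
writeStorable x = BoxedWriter $ \p -> do
  poke (castPtr p) x
  pure (p `plusPtr` sizeOf x)

-- | Run a writer on a fresh buffer of n bytes and return the bytes written.
-- The caller must choose n large enough: nothing checks it.
runWriter :: Int -> BoxedWriter -> IO [Word8]
runWriter n (BoxedWriter f) = allocaBytes n $ \buf -> do
  end <- f buf
  peekArray (end `minusPtr` buf) buf

-- | Same shape as the Writer example: 1 + 2 + 4 bytes.
example :: BoxedWriter
example = mconcat
  [ writeStorable (8 :: Word8)
  , writeStorable (16 :: Word16)
  , writeStorable (32 :: Word32)
  ]
```

In GHCi, `runWriter 16 example` should return 7 bytes, the first of which is 8;
the order of the remaining bytes depends on the endianness of the machine.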

The `Haskus.Memory.Writer` module contains many `write*` functions to write
basic types using this method. Consider the example in @fig-writer-example, which
uses some of them to write 3 values: 8 as 1 byte, 16 as 2 bytes, and 32
as 4 bytes. Compiling with optimizations yields the x86-64 machine code in
@fig-writer-example-asm, which is efficient (no heap allocation, no jumps apart
from the final `jmp` to the continuation, etc.).

#columns(2, [

#figure(
caption: "Writer example",
```haskell
foo :: Writer s
foo = mconcat
  [ writeU8 8
  , writeU16 16
  , writeU32 32
  ]
```
) <fig-writer-example>

#colbreak()

#figure(
caption: [x86 assembly generated from @fig-writer-example],
```asm
.globl foo1_info
.type foo1_info, @function
foo1_info:
movb $8,(%r14)
leaq 1(%r14),%rax
movw $16,(%rax)
addq $2,%rax
movl $32,(%rax)
leaq 4(%rax),%rbx
jmp *(%rbp)
```
) <fig-writer-example-asm>
]
)


*Strategy 2: provide a way to check if there is enough space, then just write!*

If we know beforehand how much memory a writer requires, we can react before
writing when there isn't enough space (e.g. by allocating more memory, flushing
a buffer to disk to make room, or throwing an exception).

This is implemented in `Haskus.Memory.Writer.SizedWriter`. A `SizedWriter`
carries a `U#` value (unsigned machine word) indicating the number of bytes that
the writer will write. The implementation of `SizedWriter` is isomorphic
to the one in @fig-sizedwriter-def-possible, but the one we actually use is in
@fig-sizedwriter-def. The latter is better because it helps ensure that no
`SizedWriter` value is ever allocated.

#figure(
caption: "SizedWriter possible implementation",
```haskell
data SizedWriter s = SizedWriter
  { sizedWriterSize :: !U#
    -- ^ The number of bytes that will be written by the writer
  , sizedWriter :: !(W.Writer s)
    -- ^ The Writer associated with this SizedWriter
  }
```
) <fig-sizedwriter-def-possible>

#figure(
caption: "SizedWriter real implementation",
```haskell
newtype SizedWriter s
  = SizedWriter' ( (##) -> (# U#, W.Writer s #) )
```
) <fig-sizedwriter-def>

If we rewrite our previous example (@fig-writer-example) to use `SizedWriter`
instead of `Writer` (@fig-sizedwriter-example), we obtain exactly the same
x86-64 machine code for the writer part, and the total number of bytes to write
(7) is computed statically (@fig-sizedwriter-example-asm).

#figure(
caption: "SizedWriter example",
```haskell
foo :: SizedWriter s
foo = mconcat
  [ writeU8 8
  , writeU16 16
  , writeU32 32
  ]
```
) <fig-sizedwriter-example>

#figure(
caption: [x86 assembly generated from @fig-sizedwriter-example],
```asm
.globl SizedWriter.foo1_info
.type SizedWriter.foo1_info, @function
SizedWriter.foo1_info:
leaq SizedWriter.foo_w_closure+2(%rip),%r14 ; address of the writer closure
movl $7,%ebx ; number of written bytes (7)
jmp *(%rbp)
```
) <fig-sizedwriter-example-asm>
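
The "check once, then write without checks" idea can be sketched with boxed
types. This is only an illustration under stated assumptions, not the
`SizedWriter` implementation: it uses `Int` and `Ptr Word8` in `IO` rather than
`U#` and `Addr#`, and `BoxedSized`, `sizedStorable`, `runChecked`, `tryWrite`
and `sizedExample` are hypothetical names.

```haskell
import Data.Word (Word8, Word16, Word32)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (Storable, poke, sizeOf)

-- | Boxed analogue of SizedWriter (hypothetical): a byte count paired with
-- an unchecked write action.
data BoxedSized = BoxedSized
  { sizedBytes :: !Int
  , sizedRun   :: Ptr Word8 -> IO (Ptr Word8)
  }

instance Semigroup BoxedSized where
  -- Sizes add up; write actions run one after the other.
  BoxedSized n f <> BoxedSized m g = BoxedSized (n + m) (\p -> f p >>= g)

instance Monoid BoxedSized where
  mempty = BoxedSized 0 pure

-- | Write any Storable value; its size is known without writing it.
sizedStorable :: Storable a => a -> BoxedSized
sizedStorable x = BoxedSized (sizeOf x) $ \p -> do
  poke (castPtr p) x
  pure (p `plusPtr` sizeOf x)

-- | Check the capacity once, then write without further checks.
-- Returns Nothing when the buffer is too small (nothing is written),
-- otherwise the number of bytes written.
runChecked :: Int -> Ptr Word8 -> BoxedSized -> IO (Maybe Int)
runChecked capacity buf w
  | sizedBytes w > capacity = pure Nothing
  | otherwise = do
      _ <- sizedRun w buf
      pure (Just (sizedBytes w))

-- | Try to write into a temporary buffer of the given capacity.
tryWrite :: Int -> BoxedSized -> IO (Maybe Int)
tryWrite capacity w =
  allocaBytes (max 1 capacity) $ \buf -> runChecked capacity buf w

-- | Same shape as the SizedWriter example: 1 + 2 + 4 = 7 bytes.
sizedExample :: BoxedSized
sizedExample = mconcat
  [ sizedStorable (8 :: Word8)
  , sizedStorable (16 :: Word16)
  , sizedStorable (32 :: Word32)
  ]
```

Here `tryWrite 16 sizedExample` yields `Just 7`, while `tryWrite 4 sizedExample`
yields `Nothing`. A caller could react to `Nothing` by growing or flushing the
buffer before retrying.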

*Other strategies*

+ Check the available space and refuse to write if there isn't enough.
  - The continuation is easy to implement: call the same function again.
+ Check the available space, write as much as possible, and return a continuation.
  - A continuation must be allocated, but as much work as possible is done on each call.
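
The second of these strategies can be modelled purely, as a rough sketch: the
"buffer" is just a capacity and the output is a list of bytes. All names
(`Step`, `Resumable`, `fromBytes`, `drain`) are hypothetical and nothing here
comes from the Haskus library.

```haskell
import Data.Word (Word8)

-- | Outcome of one write attempt into a buffer of limited capacity.
data Step
  = Done [Word8]            -- ^ everything fit; bytes written by this step
  | More [Word8] Resumable  -- ^ buffer full; bytes written, plus a continuation

-- | A writer that can be resumed: given the available space, write as much
-- as possible.
newtype Resumable = Resumable (Int -> Step)

-- | Writer for a sequence of bytes: fills the available space and returns a
-- continuation for whatever is left.
fromBytes :: [Word8] -> Resumable
fromBytes bs = Resumable $ \capacity ->
  case splitAt capacity bs of
    (now, [])   -> Done now
    (now, rest) -> More now (fromBytes rest)

-- | Drive a writer with buffers of a fixed, strictly positive capacity,
-- collecting the contents of each buffer as if it had been flushed.
drain :: Int -> Resumable -> [[Word8]]
drain capacity (Resumable step) =
  case step capacity of
    Done now      -> [now]
    More now next -> now : drain capacity next
```

For instance, `drain 3 (fromBytes [1 .. 7])` evaluates to
`[[1,2,3],[4,5,6],[7]]`: two full buffers, then the remainder. Each `More`
allocates a continuation, which is the cost mentioned above.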

*Cost of determining the required size*

Determining the number of bytes to write may be costly, e.g. when writing the
elements of a lazy list. In this case, just counting the bytes could force every
element of the list, blowing up memory. A better strategy in this case could be
to interleave byte counting with actually writing the bytes.
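
A minimal sketch of such interleaving, assuming a `Ptr Word8` buffer in `IO`
(the names `writeWhileFits` and `writeFromCycle` are hypothetical): the space
check happens before each element, so the list is only forced as far as it is
written.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)

-- | Write bytes from a possibly lazy list, checking the remaining space
-- before each element. Returns the number of bytes written and the
-- unwritten suffix of the list, which is left unevaluated.
writeWhileFits :: Int -> Ptr Word8 -> [Word8] -> IO (Int, [Word8])
writeWhileFits capacity buf = go 0
  where
    go n (x : xs)
      | n < capacity = do
          pokeByteOff buf n x
          go (n + 1) xs
    -- Either the list is exhausted or the buffer is full.
    go n rest = pure (n, rest)

-- | Write from an infinite list into an 8-byte buffer. Counting the
-- elements first would never terminate; interleaving stops at 8.
writeFromCycle :: IO (Int, [Word8])
writeFromCycle = allocaBytes 8 $ \buf -> do
  (n, _rest) <- writeWhileFits 8 buf (cycle [1, 2, 3])
  written <- peekArray n buf
  pure (n, written)
```

`writeFromCycle` returns `(8, [1,2,3,1,2,3,1,2])`. The returned suffix plays the
role of a continuation: it can be passed to `writeWhileFits` again once the
buffer has been flushed.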
10 changes: 10 additions & 0 deletions examples/SizedWriter.hs
@@ -0,0 +1,10 @@
module SizedWriter where

import Haskus.Memory.Writer.SizedWriter

foo :: SizedWriter s
foo = mconcat
  [ writeU8 8
  , writeU16 16
  , writeU32 32
  ]
10 changes: 10 additions & 0 deletions examples/Writer.hs
@@ -0,0 +1,10 @@
module Writer where

import Haskus.Memory.Writer

foo :: Writer s
foo = mconcat
  [ writeU8 8
  , writeU16 16
  , writeU32 32
  ]
2 changes: 2 additions & 0 deletions haskus-base/haskus-base.cabal
@@ -89,6 +89,7 @@ library
Haskus.Memory.Property
Haskus.Memory.Typed
Haskus.Memory.Writer
Haskus.Memory.Writer.SizedWriter

Haskus.Utils.Types
Haskus.Utils.Types.Bool
@@ -180,6 +181,7 @@ library
UnboxedSums
BangPatterns
PatternSynonyms
ViewPatterns
hs-source-dirs: lib

test-suite tests