Generalise Input Type #52

david-davies · 2024-11-15T20:42:11Z

The Problem

Other parser combinator libraries, such as megaparsec and attoparsec, allow types other than String to be used as input.

These libraries achieve this by parametrising the parser type by its input type, as in type Parsec input a. The disadvantage is noise in the API: almost every signature must now also mention the input type.

The Solution

@j-mie6 proposed that we instead keep the input type as an existential within the internal State record.
The input type is then decided at the top-level parse function and hidden from the rest of the API; after all, most of the library does not actually need to know what the input type was. Thus, we get the benefit of a simpler API (the parser is only parametrised by the value it returns, type Parsec a) that is still capable of handling multiple input types.

This PR implements the changes needed to achieve this. Currently, all of these changes are internal, and there is not yet any change to the user API that would let users take advantage of them. Nonetheless, this is a good first step.

Some Details on Implementation

The first attempt (see c8fe550) had an input type,

data Input = ∀ s . InputOps s => Input s

and then State had the field input :: Input.

This was nice as the state did not need to know anything about the input type. However, this had two main problems:

1. Using Haskell's class constraints for InputOps meant we had little control over specialisation.
2. The primitive and critical combinator satisfy became quite inefficient, as parsing each character involved pattern matching on the Input, getting its head and tail, and then repackaging the tail under another Input.
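The first attempt can be sketched roughly as follows. This is a minimal illustration, not the code from c8fe550: the class method unconsInput and the helper satisfyStep are hypothetical names, and the real InputOps may expose more operations. It shows where the per-character cost comes from: every step unwraps the Input, goes through the class dictionary, and rewraps the tail.

```haskell
{-# LANGUAGE ExistentialQuantification, FlexibleInstances #-}

import qualified Data.List as List

-- Hypothetical class standing in for the constraint-based InputOps.
class InputOps s where
  unconsInput :: s -> Maybe (Char, s)

instance InputOps [Char] where
  unconsInput = List.uncons

-- The existential wrapper: the concrete input type s is hidden.
data Input = forall s . InputOps s => Input s

-- One step of a satisfy-like primitive (hypothetical helper).
-- Each call pattern matches on Input, calls through the dictionary,
-- and allocates a fresh Input around the tail.
satisfyStep :: (Char -> Bool) -> Input -> Maybe (Char, Input)
satisfyStep p (Input s) = case unconsInput s of
  Just (c, rest) | p c -> Just (c, Input rest)
  _                    -> Nothing
```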

To solve (1), we instead use a record InputOps, which can be thought of as a 'manual' constraint. This gives direct access to the input manipulation operations, rather than relying on constraints and their dictionaries.

To solve (2), the existential s was moved to be part of the State data type:

data State = ∀ s . State { input :: !s, inputOps :: !(InputOps s), ...}

This allows the State (and, indirectly, the parser) to discover the input type, which means it can work directly with the input without unwrapping and rewrapping with an Input constructor.

One downside to this approach is that we cannot use input and inputOps as record projections, as this would allow the existential s to escape. Instead, they must be accessed via pattern matching, usually something like \st@State {input, inputOps} -> ... in the CPS style used internally.
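The record-based design can be sketched like this. It is a simplified illustration under stated assumptions, not the PR's actual definitions: the fields unconsInput, isEmptyInput and consumed, and the helpers stringOps, initState, satisfyStep and atEnd, are all hypothetical, and the real State carries more fields and is used in CPS style.

```haskell
{-# LANGUAGE ExistentialQuantification, NamedFieldPuns #-}

import qualified Data.List as List

-- A record of operations acting as a 'manual' constraint (hypothetical fields).
data InputOps s = InputOps
  { unconsInput  :: !(s -> Maybe (Char, s))
  , isEmptyInput :: !(s -> Bool)
  }

-- The operations for String input.
stringOps :: InputOps String
stringOps = InputOps { unconsInput = List.uncons, isEmptyInput = null }

-- The existential s now lives in State itself.
data State = forall s . State
  { input    :: !s
  , inputOps :: !(InputOps s)
  , consumed :: !Int          -- hypothetical extra field
  }

-- The input type is fixed once, where the state is first built.
initState :: String -> State
initState s = State { input = s, inputOps = stringOps, consumed = 0 }

-- input and inputOps are reached by pattern matching, never as selectors.
-- The new State is built with the constructor, since GHC rejects record
-- update syntax on fields that mention the existential variable.
satisfyStep :: (Char -> Bool) -> State -> Maybe (Char, State)
satisfyStep p State{input, inputOps, consumed} =
  case unconsInput inputOps input of
    Just (c, rest) | p c ->
      Just (c, State { input = rest, inputOps, consumed = consumed + 1 })
    _ -> Nothing

-- End-of-input check, again via pattern matching.
atEnd :: State -> Bool
atEnd State{input, inputOps} = isEmptyInput inputOps input
```

Compared with wrapping the input on its own, the tail goes straight back into the state that has to be rebuilt anyway, so there is no separate Input box to allocate and unpack on each character.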

david-davies added 10 commits November 13, 2024 11:02

Internal: still mucking about with inputstreams

c28fd22

add(Internal): first draft of general Input, currently involves a lot of converting to String for the errors

1f65b57

fix(generalise-input): uncons Input stype now returns , instead of , which was wrong.

e3449bd

fix(Internal.Input): simplify defn of InputStream String instance using Data.List.uncons

6f8c6ac

fix(Internal.Input): simplify defn of InputStream String instance using Data.List.null

f5b3d50

fix(Internal.Input, Tests): fix test-suite to account for generalised input; fix incorrect definition of eof

7d7455c

change(Input): simplify defn of unconsInput

c8fe550

add(Internal): move input param into state, allowing for more efficient satisfy_ defn

23acebf

refactor(Internal): InputStream renamed to InputOps

ff8fcde

docs(Internal): add comments for things related to Input

8bbb3e2

david-davies added the enhancement (New feature or request) label Nov 15, 2024

david-davies added 2 commits November 15, 2024 21:00

clean(Char): remove unsused unsafeCoerceUnlifted import

dc0efb5

fix(Internal): re-include CPP_import_PortableUnlifted macro; not sure why this disappeared

08b9318