Generalise Input Type #52

Draft
wants to merge 12 commits into
base: main
david-davies
Collaborator

@david-davies david-davies commented Nov 15, 2024

The Problem

Other parser combinator libraries, such as megaparsec and attoparsec, allow types other than String to be used as input.

These libraries achieve this by parametrising the parser type by its input type, as in type Parsec input a. The disadvantage is noise in the API: almost everything must now be parametrised by the input type as well.

The Solution

@j-mie6 proposed that we instead keep the input type as an existential within the internal State record.
The input type is then decided at the top-level parse function but hidden from the rest of the API; after all, most of the library does not need to know what the input type is. Thus, we get the benefit of a simpler API (the parser is only parametrised by the value it returns, type Parsec a) that is still capable of handling multiple input types.
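As a rough illustration of this API shape (not the code in this PR), the sketch below hides the input type inside State and picks it only in the entry points. Every name other than State and Parsec is made up for the example, including parseString, parseChunks, item and the Chunks input type, and the real State carries more than is shown here.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Hypothetical internal state: some input of hidden type s, paired with the
-- one operation this sketch needs (splitting off the next character).
data State = forall s. State s (s -> Maybe (Char, s))

-- The parser type mentions only its result type, never the input type.
newtype Parsec a = Parsec (State -> Maybe (a, State))

-- A combinator written once; it works for whatever input type is hidden in State.
item :: Parsec Char
item = Parsec $ \(State inp next) ->
  case next inp of
    Just (c, rest) -> Just (c, State rest next)
    Nothing        -> Nothing

-- The input type is chosen only at the top-level entry points.
parseString :: Parsec a -> String -> Maybe a
parseString (Parsec p) str = fst <$> p (State str unconsList)
  where
    unconsList (c:cs) = Just (c, cs)
    unconsList []     = Nothing

-- A second, made-up input type: text delivered in chunks.
newtype Chunks = Chunks [String]

parseChunks :: Parsec a -> Chunks -> Maybe a
parseChunks (Parsec p) chunks = fst <$> p (State chunks unconsChunks)
  where
    unconsChunks (Chunks ((c:cs) : rest)) = Just (c, Chunks (cs : rest))
    unconsChunks (Chunks ([] : rest))     = unconsChunks (Chunks rest)
    unconsChunks (Chunks [])              = Nothing
```

Note that item has the type Parsec Char regardless of which entry point runs it; only parseString and parseChunks mention a concrete input type.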

This PR implements the changes needed to achieve this. Currently, all of these changes are internal; the user API does not yet expose anything that would let users take advantage of them. Nonetheless, this is a good first step.

Some Details on Implementation

The first attempt (see c8fe550) had an input type,

data Input = ∀ s . InputOps s => Input s

and then State had the field input :: Input.

This was nice as the state did not need to know anything about the input type. However, this had two main problems:

  1. Using Haskell's constraints with InputOps meant we had little control over specialisation or over how the dictionary is passed around.
  2. The primitive and critical combinator satisfy became quite inefficient, as parsing each character involved pattern matching on the input, getting its head and tail, and then repackaging the tail under another Input.
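To make problem (2) concrete, here is a minimal sketch of what a single satisfy step looks like under the first design. The uncons method and the satisfyStep function are assumptions made for illustration; the PR does not list the actual methods of InputOps, and the real satisfy is written in CPS.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class: the single method is an assumption for this sketch.
class InputOps s where
  uncons :: s -> Maybe (Char, s)

instance InputOps String where
  uncons (c:cs) = Just (c, cs)
  uncons []     = Nothing

-- The first attempt: the input and its class dictionary boxed together.
data Input = forall s. InputOps s => Input s

-- One step of a satisfy-like primitive (hypothetical helper).
satisfyStep :: (Char -> Bool) -> Input -> Maybe (Char, Input)
satisfyStep p (Input s) =                         -- unwrap the existential
  case uncons s of                                -- head and tail via the dictionary
    Just (c, rest) | p c -> Just (c, Input rest)  -- rewrap the tail in a new Input
    _                    -> Nothing
```

Every consumed character goes through the unwrap, dictionary call and rewrap sequence, on top of whatever the surrounding State update costs.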

To solve (1), we instead use a record InputOps, which can be thought of as a 'manual' constraint. This gives direct access to the input manipulation operations, rather than relying on constraints and compiler-managed dictionaries.

To solve (2), the existential s was moved to be part of the State data type:

data State = ∀ s . State { input :: !s, inputOps :: !(InputOps s), ...}

This allows the State (and, indirectly, the parser) to discover the input type, which means it can work with the input directly, without unwrapping and rewrapping it in an Input constructor.
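The two fixes together might look roughly like the sketch below. The unconsInput field, stringOps and satisfyStep are hypothetical names chosen for the example, and the other State fields that the PR elides are left out entirely.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE NamedFieldPuns #-}

-- A 'manual' dictionary. The single field is an assumption; the PR does not
-- list the operations that InputOps provides.
data InputOps s = InputOps
  { unconsInput :: s -> Maybe (Char, s)
  }

-- An explicit, ordinary value rather than a class instance.
stringOps :: InputOps String
stringOps = InputOps
  { unconsInput = \s -> case s of
      c:cs -> Just (c, cs)
      []   -> Nothing
  }

-- The existential now lives on State itself (remaining fields omitted).
data State = forall s. State
  { input    :: !s
  , inputOps :: !(InputOps s)
  }

-- One step of a satisfy-like primitive (hypothetical helper). Matching on
-- State brings s into scope, so the input is used directly.
satisfyStep :: (Char -> Bool) -> State -> Maybe (Char, State)
satisfyStep p State{input, inputOps} =
  case unconsInput inputOps input of
    Just (c, rest) | p c -> Just (c, State{input = rest, inputOps})
    _                    -> Nothing
```

A new State is still built for the next step, but there is no inner Input box to open and rebuild, and the operations come from a plain record field rather than a class constraint.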

One downside to this approach is that we cannot use input and inputOps as record projections, as this would allow the existential s to escape. Instead, they must be accessed via pattern matching, usually something like \st@State {input, inputOps} -> ... in the CPS style used internally.
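A small sketch of this restriction, under the assumption of a hypothetical isEmpty operation in InputOps (atEnd, eof and stringOps are likewise invented for the example). The commented-out definition is the kind of projection that GHC refuses to accept, since its result type would have to name the hidden s.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE NamedFieldPuns #-}

-- Hypothetical operations record with a single made-up field.
data InputOps s = InputOps { isEmpty :: s -> Bool }

stringOps :: InputOps String
stringOps = InputOps { isEmpty = null }

data State = forall s. State
  { input    :: !s
  , inputOps :: !(InputOps s)
  }

-- Not accepted: 'input' cannot be used as a function, because the
-- existential type variable would escape through the result.
-- currentInput st = input st

-- Accepted: s is only in scope inside the match, and the result type (Bool)
-- does not mention it.
atEnd :: State -> Bool
atEnd State{input, inputOps} = isEmpty inputOps input

-- The same match in a CPS-flavoured lambda, with made-up success and
-- failure continuations.
eof :: State -> (State -> r) -> (State -> r) -> r
eof = \st@State{input, inputOps} good bad ->
  if isEmpty inputOps input then good st else bad st
```

Because the fields are only reachable through a match, any helper that needs the input has to take the whole State (or be written inside the match), which is the cost the paragraph above describes.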

@david-davies david-davies added the enhancement New feature or request label Nov 15, 2024