Generalise Input Type #52
Draft: david-davies wants to merge 12 commits into j-mie6:main from david-davies:generalise-input
… of converting to String for the errors
…ng Data.List.uncons
…ng Data.List.null
… input; fix incorrect definition of eof
The Problem
Other parser combinator libraries, such as megaparsec and attoparsec, allow for types other than `String` to be used as input.

These libraries achieve this by parametrising the parser type by its input type, `type Parsec input a`. This has the disadvantage of generating noise in the API, as pretty much everything now needs to be parametrised by the input type.

The Solution
@j-mie6 proposed instead that we keep the input type as an existential within the internal `State` record.

Then, the input type is decided at the top-level `parse` function, but is hidden away from the rest of the API; after all, most of the library does not actually need to know what the input type was. Thus, we get the benefit of a simpler API (the parser is only parametrised by the value it returns, `type Parsec a`), which is still capable of handling multiple input types.

This PR implements the changes needed to achieve this. Currently, all these changes are internal, and there is not yet any change to the user API which would allow users to take advantage of them. Nonetheless, this is a good first step.
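As a rough illustration of the idea, here is a minimal sketch in which the parser type carries no input parameter, yet one parser runs over two different input types. It is not the library's actual code: `InputOps`'s field, the non-CPS `Parsec` representation, `pair`, `parseWith`, and the `Chunks` input type are all hypothetical names chosen for this example, and `Chunks` stands in for a type such as `Text` so that the sketch only needs `base`.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE NamedFieldPuns #-}

import Data.Char (isDigit)
import Data.List (uncons)

-- Hypothetical record of input operations; the field name is an assumption.
newtype InputOps s = InputOps
  { unconsInput :: s -> Maybe (Char, s)
  }

-- Hypothetical internal state: the input type 's' is existential,
-- so it does not appear in the type of State or of Parsec.
data State = forall s. State
  { input    :: s
  , inputOps :: InputOps s
  }

-- Simplified, non-CPS parser type: parametrised only by its result.
newtype Parsec a = Parsec (State -> Maybe (a, State))

-- Consume one character satisfying the predicate.
satisfy :: (Char -> Bool) -> Parsec Char
satisfy p = Parsec (\State {input, inputOps} ->
  case unconsInput inputOps input of
    Just (c, rest) | p c -> Just (c, State {input = rest, inputOps})
    _                    -> Nothing)

-- Run two parsers in sequence and pair their results.
pair :: Parsec a -> Parsec b -> Parsec (a, b)
pair (Parsec p) (Parsec q) = Parsec (\st -> do
  (x, st')  <- p st
  (y, st'') <- q st'
  pure ((x, y), st''))

-- The input type is chosen here, at the top level, and nowhere else.
parseWith :: InputOps s -> Parsec a -> s -> Maybe a
parseWith ops (Parsec p) inp =
  fst <$> p (State {input = inp, inputOps = ops})

stringOps :: InputOps String
stringOps = InputOps {unconsInput = uncons}

-- A second, made-up input type: a list of chunks, some possibly empty.
newtype Chunks = Chunks [String]

chunkOps :: InputOps Chunks
chunkOps = InputOps
  { unconsInput = \(Chunks cs) -> case dropWhile null cs of
      (c : rest) : more -> Just (c, Chunks (rest : more))
      _                 -> Nothing
  }

-- One parser definition, usable with either input type.
twoDigits :: Parsec (Char, Char)
twoDigits = pair (satisfy isDigit) (satisfy isDigit)
```

Under these assumptions, `parseWith stringOps twoDigits "42x"` and `parseWith chunkOps twoDigits (Chunks ["", "4", "", "2x"])` both run the same `twoDigits`, whose type never mentions the input.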
Some Details on Implementation
The first attempt (see c8fe550) had an `Input` type, and then `State` had the field `input :: Input`.

This was nice as the state did not need to know anything about the input type. However, this had two main problems:
1. `InputOps`, as a constraint, meant we did not have much control over specialisation and the like.
2. `satisfy` became quite inefficient, as parsing each character involved pattern matching on the input, getting its head and tail, and then repackaging the tail under another `Input`.

To solve (1), we instead use a record `InputOps`, which can be thought of as a 'manual' constraint. This gives direct access to the input manipulation operations, rather than relying on constraints and dictionaries.

To solve (2), the existential `s` was moved to be part of the `State` data type.

This allows the `State` (and, indirectly, the parser) to discover the input type, which means it can work with the input directly, without unwrapping and rewrapping with an `Input` constructor.

One downside to this approach is that we cannot use `input` and `inputOps` as record projections, as this would allow the existential `s` to escape. Instead, they must be accessed via pattern matching, usually something like `\st@State {input, inputOps} -> ...` in the CPS style used internally.
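The two designs can be contrasted in a small sketch. The PR's own definitions are not reproduced in this description, so everything below is a guess at their general shape rather than the actual code: the class `InputClass`, the function `satisfyWrapped`, the CPS layout of `Parsec`, and the runner `runString` are hypothetical names introduced only for illustration.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE RankNTypes #-}

import Data.List (uncons)

-- First attempt (assumed shape): the input is wrapped in its own
-- existential, and its operations come from a typeclass dictionary.
class InputClass s where
  unconsC :: s -> Maybe (Char, s)

instance InputClass [Char] where
  unconsC = uncons

data Input = forall s. InputClass s => Input s

-- Every character costs an unwrap of 'Input' and a rewrap of the tail.
satisfyWrapped :: (Char -> Bool) -> Input -> Maybe (Char, Input)
satisfyWrapped p (Input s) = case unconsC s of
  Just (c, rest) | p c -> Just (c, Input rest)
  _                    -> Nothing

-- Later design (assumed shape): a record acts as a 'manual' dictionary,
-- and the existential 's' lives directly in State.
newtype InputOps s = InputOps {unconsInput :: s -> Maybe (Char, s)}

data State = forall s. State
  { input    :: s
  , inputOps :: InputOps s
  }

-- A hypothetical CPS parser: a success and a failure continuation.
newtype Parsec a =
  Parsec (forall r. State -> (a -> State -> r) -> (State -> r) -> r)

-- 'input' and 'inputOps' are bound by pattern matching. Writing
-- 'input st' instead is rejected by GHC, because 's' would escape.
satisfy :: (Char -> Bool) -> Parsec Char
satisfy p = Parsec (\st@State {input, inputOps} good bad ->
  case unconsInput inputOps input of
    Just (c, rest) | p c -> good c (State {input = rest, inputOps})
    _                    -> bad st)

-- Top-level runner for String input; the input type is fixed only here.
runString :: Parsec a -> String -> Maybe a
runString (Parsec k) str =
  k (State {input = str, inputOps = InputOps uncons})
    (\x _ -> Just x)
    (const Nothing)
```

In this sketch `satisfy` still builds a new `State` for the remaining input, but it reads the input and its operations straight from the pattern match, with no separate `Input` wrapper to take apart and rebuild.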