This repository was archived by the owner on Jul 2, 2024. It is now read-only.

An experiment with parser combinators for ArkScript

SuperFola/parser-combinators


Parser combinators experiment

An experiment with parser combinators, to replace the current lexer/parser of ArkScript.

This project has multiple goals:

  • making a more extensible parser than the current one Ark has;
  • removing weird edge cases the current parser has;
  • reducing the number of bugs the parser has;
  • making error contexts easier to generate.

Building

You need CMake >= 3.24 and a C++17-capable compiler (e.g. Clang 14).

cmake -Bbuild -DCMAKE_BUILD_TYPE=Debug
cmake --build build

build/parser <filename>

Current state

Subparsers:

  • let, mut, set
    • handle nodes as values
  • del
  • condition
    • handle nodes as condition
    • handle nodes as values
  • loop
    • handle nodes as conditions
    • handle nodes as body
  • import
  • begin block
  • function
    • handle nodes as body
  • macro
    • handle nodes as body
  • atom
    • number
      • floating point 1.2
      • scientific numbers 12e+14, 4.5e+16
    • string
      • handle \uxxxxx, \Uxxxxx, \xabc in strings
      • handle other escape sequences: \n, \r, \t, \a, \b, \f, \0, \\, \"
    • boolean
    • nil
    • symbol
  • comment
    • comments inside blocks, not only top-level ones
  • function calls
    • anonymous calls: ((fun () (print 1)))
  • identifiers
    • symbol
    • capture
    • dot notation
      • dot notation after call: (@ list 14).field
    • non alnum identifiers (+, !=, >=...)
  • special syntax for (list ...): [...]
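
To give an idea of what a subparser looks like in the combinator style, here is a minimal C++17 sketch that recognizes numbers such as 1.2. It is not the code of this repository: `Cursor`, `Parser`, `accept`, `sequence`, `maybe`, `oneOrMore` and `number` are hypothetical names chosen for illustration, and scientific notation is left out to keep it short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical cursor over the source text (not the repository's type).
struct Cursor {
    std::string src;
    std::size_t pos = 0;
};

// A parser returns the matched text, or std::nullopt on failure.
// On failure the cursor position is left where it was before the attempt.
using Parser = std::function<std::optional<std::string>(Cursor&)>;

// Accept a single character satisfying a predicate.
Parser accept(std::function<bool(char)> pred) {
    assert(static_cast<bool>(pred));
    return [pred](Cursor& c) -> std::optional<std::string> {
        if (c.pos < c.src.size() && pred(c.src[c.pos]))
            return std::string(1, c.src[c.pos++]);
        return std::nullopt;
    };
}

// Run a, then b; backtrack to the starting position if either fails.
Parser sequence(Parser a, Parser b) {
    return [a, b](Cursor& c) -> std::optional<std::string> {
        const std::size_t start = c.pos;
        auto first = a(c);
        auto second = first ? b(c) : std::nullopt;
        if (!first || !second) {
            c.pos = start;
            return std::nullopt;
        }
        return *first + *second;
    };
}

// Make p optional: a failure of p yields an empty match.
Parser maybe(Parser p) {
    return [p](Cursor& c) -> std::optional<std::string> {
        auto r = p(c);
        return r ? *r : std::string{};
    };
}

// Apply p one or more times, concatenating the matches.
Parser oneOrMore(Parser p) {
    return [p](Cursor& c) -> std::optional<std::string> {
        std::string out;
        while (auto r = p(c)) {
            if (r->empty()) break;  // guard against parsers that consume nothing
            out += *r;
        }
        if (out.empty()) return std::nullopt;
        return out;
    };
}

// number := digit+ ('.' digit+)?
Parser number() {
    auto digit = accept([](char ch) {
        return std::isdigit(static_cast<unsigned char>(ch)) != 0;
    });
    auto dot = accept([](char ch) { return ch == '.'; });
    auto digits = oneOrMore(digit);
    return sequence(digits, maybe(sequence(dot, digits)));
}
```

Calling `number()` on a `Cursor` holding `1.2)` matches `1.2` and leaves the cursor on the closing parenthesis. Every subparser in the list above could be assembled from the same kind of small building blocks, which is what makes this approach easy to extend.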

Error context generation:

  • better messages
    • what went wrong at the syntax level
    • what went wrong at the language level
    • possible fix
  • sometimes the wrong token is underlined. Example:
ERROR
Package name expected after '.'
At ' ' @ 1:12
    1 | (import a. )
      |           ^
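
The two-line snippet at the end of that message can be produced with very little code. The sketch below is an illustration only: `underline` is a hypothetical helper, and it assumes a 1-based column counted in characters, which may differ from how this parser numbers columns.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders a source line with its line number,
// followed by a second line holding a caret under column `col` (1-based).
std::string underline(const std::string& line, std::size_t lineNo, std::size_t col) {
    assert(col >= 1);
    std::ostringstream os;
    // Line number right-aligned in a 5-character gutter.
    os << std::setw(5) << lineNo << " | " << line << '\n';
    // Same gutter width, then (col - 1) spaces before the caret.
    os << std::string(5, ' ') << " | " << std::string(col - 1, ' ') << "^\n";
    return os.str();
}
```

With this helper, `underline("(import a. )", 1, 11)` places the caret under the space that follows the dot. Underlining the wrong token then comes down to passing the wrong column, so the position recorded by each subparser is the part worth checking first.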

Misc:

  • handle UTF-8
    • store codepoints in struct { unsigned int cp; std::string repr; };
    • homemade std::is<char category>(codepoint) functions, working on codepoints instead of single bytes
    • decode UTF-8 to compute columns correctly
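
As an illustration of the UTF-8 points above, here is a minimal C++17 sketch of a decoder that fills the struct described in the list and derives a column from a byte offset. The names `Codepoint`, `decodeUtf8` and `columnOf` are assumptions, as is the policy of turning each malformed byte into U+FFFD. The sketch does not reject overlong encodings or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the struct described above; the type name is an assumption.
struct Codepoint {
    unsigned int cp;   // decoded Unicode code point
    std::string repr;  // the original UTF-8 bytes
};

// Decode a UTF-8 string into codepoints. Malformed bytes are emitted
// one at a time as U+FFFD (a choice made for this sketch).
std::vector<Codepoint> decodeUtf8(const std::string& s) {
    std::vector<Codepoint> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned int cp = 0;
        if (b0 < 0x80) { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }

        // A valid sequence fits in the input and has 10xxxxxx continuation bytes.
        bool ok = len != 0 && i + len <= s.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3Fu);
        }
        if (!ok) {
            out.push_back({0xFFFD, s.substr(i, 1)});
            ++i;
            continue;
        }
        out.push_back({cp, s.substr(i, len)});
        i += len;
    }
    return out;
}

// 1-based column of the character starting at `byteOffset` in `line`:
// the number of codepoints before it, plus one.
std::size_t columnOf(const std::string& line, std::size_t byteOffset) {
    assert(byteOffset <= line.size());
    return decodeUtf8(line.substr(0, byteOffset)).size() + 1;
}
```

Counting codepoints rather than bytes is what keeps the caret aligned when a line contains accented letters or emoji. Keeping `repr` next to `cp` lets an error message print the offending character without re-encoding it.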

Breaking changes

This parser targets ArkScript, but some things had to change for the next version of the language, which this parser implements.

  • quote is no longer supported; use functions with no arguments instead
  • import does not work the same way as before: (import "path.ark") won't work, as a package-like syntax is used now:
(import a)
(import a.b)  # everything is prefixed by b
(import foo.bar.egg)
(import foo:*)  # everything is imported in the current scope
(import foo.bar :a :b)  # we import only a and b from foo.bar, in the current scope
  • fields aren't chained in the AST: (Symbol:a GetField:b GetField:c) was the old way of representing a.b.c in the AST; now we have Field(Symbol:a Symbol:b Symbol:c), the node holding the field being a list of symbols
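
The new field representation can be pictured with the following C++17 sketch. It is not the AST used by this repository: `Node`, `NodeType`, `makeField` and `toString` are hypothetical names, and the input is assumed to be an already validated dotted identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class NodeType { Symbol, Field };

// Hypothetical AST node: a Symbol carries a name, a Field carries children.
struct Node {
    NodeType type;
    std::string name;            // used by Symbol nodes
    std::vector<Node> children;  // used by Field nodes
};

// Build Field(Symbol:a Symbol:b Symbol:c) from the text "a.b.c".
// Empty segments (as in "a..b") are not rejected in this sketch.
Node makeField(const std::string& dotted) {
    assert(!dotted.empty());
    Node field{NodeType::Field, "", {}};
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = dotted.find('.', start);
        const std::size_t count =
            dot == std::string::npos ? std::string::npos : dot - start;
        field.children.push_back(Node{NodeType::Symbol, dotted.substr(start, count), {}});
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    return field;
}

// Render a node using the notation from this README.
std::string toString(const Node& n) {
    if (n.type == NodeType::Symbol) return "Symbol:" + n.name;
    std::string out = "Field(";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i != 0) out += ' ';
        out += toString(n.children[i]);
    }
    return out + ")";
}
```

Here `toString(makeField("a.b.c"))` gives `Field(Symbol:a Symbol:b Symbol:c)`. Holding the whole chain in one node means later passes can treat a.b.c as a single unit instead of reassembling it from consecutive GetField nodes.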
