New practice exercise sgf-parsing #795

Merged 8 commits on Jun 30, 2021
29 changes: 27 additions & 2 deletions config.json
@@ -2566,8 +2566,8 @@
"list-comprehensions"
],
"difficulty": 8
},
{
},
{
"slug": "circular-buffer",
"name": "Circular Buffer",
"uuid": "535d64c9-95f7-4f59-b7b6-d459337b82b0",
@@ -2593,6 +2593,31 @@
"errors"
],
"difficulty": 8
},
{
"slug": "sgf-parsing",
"name": "SGF Parsing",
"uuid": "f87d00fa-99c4-4fda-9d90-d9c4d569a82e",
"prerequisites": [
"atoms",
"tuples",
"lists",
"strings",
"structs",
"enum",
"maps",
"case",
"cond",
"if",
"multiple-clause-functions",
"pattern-matching",
"guards",
"regular-expressions"
],
"practices": [
"regular-expressions"
],
"difficulty": 8
}
],
"foregone": [
66 changes: 66 additions & 0 deletions exercises/practice/sgf-parsing/.docs/instructions.md
@@ -0,0 +1,66 @@
# Description

Parsing a Smart Game Format string.

[SGF](https://en.wikipedia.org/wiki/Smart_Game_Format) is a standard format for
storing board game files, in particular Go.

SGF is a fairly simple format. An SGF file usually contains a single
tree of nodes where each node is a property list. The property list
contains key-value pairs; each key can occur only once but may have
multiple values.

An SGF file may look like this:

```text
(;FF[4]C[root]SZ[19];B[aa];W[ab])
```

This is a tree with three nodes:

- The top level node has three properties: FF\[4\] (key = "FF", value
= "4"), C\[root\] (key = "C", value = "root") and SZ\[19\] (key =
"SZ", value = "19"). (FF indicates the version of SGF, C is a
comment and SZ is the size of the board.)
- The top level node has a single child which has a single property:
B\[aa\]. (Black plays on the point encoded as "aa", which is the
1-1 point).
- The B\[aa\] node has a single child which has a single property:
W\[ab\].
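
As a rough illustration of the three-node tree described above, here is a hand-built version of the parsed result. The module name `SgfSketch` and the helper `count_nodes/1` are hypothetical; the struct shape (a `properties` map from key to a list of values, plus a `children` list) is assumed to match the `Sgf` struct used by the example solution in this PR.

```elixir
defmodule SgfSketch do
  # Hypothetical stand-in for the exercise's tree node:
  # a map of properties plus a list of child nodes.
  defstruct properties: %{}, children: []

  # Hand-built tree for "(;FF[4]C[root]SZ[19];B[aa];W[ab])".
  def three_node_tree do
    %__MODULE__{
      properties: %{"FF" => ["4"], "C" => ["root"], "SZ" => ["19"]},
      children: [
        %__MODULE__{
          properties: %{"B" => ["aa"]},
          children: [
            %__MODULE__{properties: %{"W" => ["ab"]}, children: []}
          ]
        }
      ]
    }
  end

  # Counts the nodes in a tree by visiting every child recursively.
  def count_nodes(%__MODULE__{children: children}) do
    1 + Enum.sum(Enum.map(children, &count_nodes/1))
  end
end
```

Each node holds exactly one child here, which is the chain that the `;`-separated shorthand expands to.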

As you can imagine, an SGF file contains a lot of nodes with a single
child, which is why there's a shorthand for it.

SGF can encode variations of play. Go players do a lot of backtracking
in their reviews (let's try this, doesn't work, let's try that) and SGF
supports variations of play sequences. For example:

```text
(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))
```

Here the root node has two variations. The first (which by convention
indicates what's actually played) is where black plays on 1-1. Black was
sent this file by his teacher who pointed out a more sensible play in
the second child of the root node: `B[dd]` (4-4 point, a very standard
opening to take the corner).
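
A minimal sketch of how the two variations above might look once parsed, assuming the same node shape as the example solution's `Sgf` struct. The module `VariationSketch` and the helper `main_line/1` are hypothetical names introduced only for this illustration.

```elixir
defmodule VariationSketch do
  # Hypothetical node shape: a property map plus a list of child nodes.
  defstruct properties: %{}, children: []

  # Hand-built tree for "(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))".
  def two_variations do
    %__MODULE__{
      properties: %{"FF" => ["4"]},
      children: [
        node(%{"B" => ["aa"]}, [node(%{"W" => ["ab"]}, [])]),
        node(%{"B" => ["dd"]}, [node(%{"W" => ["ee"]}, [])])
      ]
    }
  end

  # Follows the first child at every step, i.e. the variation that
  # by convention records what was actually played.
  def main_line(%__MODULE__{properties: props, children: []}), do: [props]

  def main_line(%__MODULE__{properties: props, children: [first | _others]}) do
    [props | main_line(first)]
  end

  defp node(properties, children) do
    %__MODULE__{properties: properties, children: children}
  end
end
```

The root is the only node with more than one child; every other node continues a single line of play.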

A key can have multiple values associated with it. For example:

```text
(;FF[4];AB[aa][ab][ba])
```

Here `AB` (add black) is used to add three black stones to the board.
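
To make the multi-value case concrete, here is a small sketch that splits a single property into its key and list of values with regular expressions. `PropertySketch.split/1` is a hypothetical helper, not part of the exercise, and it assumes that values contain no escaped `]`.

```elixir
defmodule PropertySketch do
  # Splits one property such as "AB[aa][ab][ba]" into {key, values}.
  # Assumption: no value contains an escaped "]".
  def split(property) do
    # The key is the leading run of uppercase letters.
    [key] = Regex.run(~r/^[A-Z]+/, property)

    # Every "[...]" group contributes one value.
    values =
      ~r/\[([^\]]*)\]/
      |> Regex.scan(property, capture: :all_but_first)
      |> List.flatten()

    {key, values}
  end
end
```

For example, `PropertySketch.split("AB[aa][ab][ba]")` returns `{"AB", ["aa", "ab", "ba"]}`, so a single-valued property and a multi-valued one share the same shape.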

There are a few more complexities to SGF (and parsing in general), which
you can mostly ignore. You should assume that the input is encoded in
UTF-8. The tests won't contain a charset property, so don't worry about
that. Furthermore, you may assume that all newlines are Unix style (`\n`;
no `\r` or `\r\n` will be in the tests) and that there will be no optional
whitespace between properties, nodes, etc. in the tests.

The exercise will have you parse an SGF string and return a tree
structure of properties. You do not need to encode knowledge about the
data types of properties, just use the rules for the
[text](http://www.red-bean.com/sgf/sgf4.html#text) type everywhere.
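
As one possible reading of the escaping involved, the sketch below unescapes a property value using the same four escape sequences that the `escape/2` clauses of this PR's example solution handle (`\n`, `\t`, `\]` and `\[`). It is not a full implementation of the SGF text rules, and `EscapeSketch.unescape/1` is a hypothetical name.

```elixir
defmodule EscapeSketch do
  # Replaces the four two-character escape sequences handled by the
  # example solution; every other character is copied unchanged.
  def unescape(<<"\\n", rest::binary>>), do: "\n" <> unescape(rest)
  def unescape(<<"\\t", rest::binary>>), do: "\t" <> unescape(rest)
  def unescape(<<"\\]", rest::binary>>), do: "]" <> unescape(rest)
  def unescape(<<"\\[", rest::binary>>), do: "[" <> unescape(rest)
  def unescape(<<char::utf8, rest::binary>>), do: <<char::utf8>> <> unescape(rest)
  def unescape(""), do: ""
end
```

Matching on binary prefixes keeps the escape table in one place; a real parser would apply the same rule while scanning for the closing `]`.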
4 changes: 4 additions & 0 deletions exercises/practice/sgf-parsing/.formatter.exs
@@ -0,0 +1,4 @@
# Used by "mix format"
[
  inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"]
]
20 changes: 20 additions & 0 deletions exercises/practice/sgf-parsing/.meta/config.json
@@ -0,0 +1,20 @@
{
  "authors": ["jiegillet"],
  "contributors": [
    "neenjaw",
    "angelikatyborska"
  ],
  "files": {
    "example": [
      ".meta/example.ex"
    ],
    "solution": [
      "lib/sgf_parsing.ex"
    ],
    "test": [
      "test/sgf_parsing_test.exs"
    ]
  },
  "blurb": "Parsing a Smart Game Format string.",
  "title": "SGF Parsing"
}
210 changes: 210 additions & 0 deletions exercises/practice/sgf-parsing/.meta/example.ex
@@ -0,0 +1,210 @@
defmodule SgfParsing do
  # Used to make recursive parsers lazy
  defmacro lazy(parser) do
    quote do
      fn string -> unquote(parser).(string) end
    end
  end

  defmodule Sgf do
    defstruct properties: %{}, children: []
  end

  @type sgf :: %Sgf{properties: map, children: [sgf]}
  @doc """
  Parse a string into a Smart Game Format tree
  """
  @spec parse(encoded :: String.t()) :: {:ok, sgf} | {:error, String.t()}
  def parse(encoded) do
    parser = parse_tree_paren() |> eof()

    with {:ok, tree, ""} <- run_parser(parser, encoded) do
      {:ok, tree}
    else
      {:error, err, _rest} -> {:error, err}
    end
  end

  # TREE PARSER

  def parse_tree() do
    parse_properties =
      char(?;)
      |> error("tree with no nodes")
      |> drop_and(many(parse_property()))
      |> map(&Map.new/1)

    parse_children =
      one_of([
        map(parse_tree(), &List.wrap/1),
        many(parse_tree_paren())
      ])
      |> lazy()

    lift2(&%Sgf{properties: &1, children: &2}, parse_properties, parse_children)
  end

  def parse_tree_paren() do
    char(?()
    |> error("tree missing")
    |> drop_and(parse_tree())
    |> drop(char(?)))
  end

  def parse_property() do
    parse_name =
      some(satisfy(&(&1 not in '[();')))
      |> map(&Enum.join(&1, ""))
      |> validate(&(&1 == String.upcase(&1)), "property must be in uppercase")

    parse_attributes =
      some(
        char(?[)
        |> error("properties without delimiter")
        |> drop_and(many(escaped(&(&1 != ?]))))
        |> drop(char(?]))
        |> map(&Enum.join(&1, ""))
      )

    lift2(&{&1, &2}, parse_name, parse_attributes)
  end

  def escaped(p) do
    one_of([
      lift2(&escape/2, char(?\\), satisfy(&(&1 in 'nt]['))),
      satisfy(p)
    ])
  end

  def escape("\\", "n"), do: "\n"
  def escape("\\", "t"), do: "\t"
  def escape("\\", "]"), do: "]"
  def escape("\\", "["), do: "["

  # PARSER COMBINATORS LIBRARY
  # Inspired from Haskell libraries like Parsec
  # and https://serokell.io/blog/parser-combinators-in-elixir

  def run_parser(parser, string), do: parser.(string)

  def eof(parser) do
    fn string ->
      with {:ok, _, ""} = ok <- parser.(string) do
        ok
      else
        {:ok, _a, rest} -> {:error, "Not end of file", rest}
        err -> err
      end
    end
  end

  def satisfy(p) do
    fn
      <<char, rest::bitstring>> = string ->
        if p.(char) do
          {:ok, <<char>>, rest}
        else
          {:error, "unexpected #{char}", string}
        end

      "" ->
        {:error, "unexpected end of string", ""}
    end
  end

  def char(c), do: satisfy(&(&1 == c)) |> error("expected character #{<<c>>}")

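  def string(str) do
    str
    |> to_charlist
    |> Enum.map(&char/1)
    # Enum.reduce/3 calls the function with (element, accumulator), so the
    # accumulated parser (&2) must run before the next character parser (&1);
    # otherwise the characters would be matched in reverse order.
    |> Enum.reduce(inject(""), &lift2(fn a, b -> a <> b end, &2, &1))
  end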

  def some(parser) do
    fn input ->
      with {:ok, result, rest} <- parser.(input),
           {:ok, results, rest} <- many(parser).(rest) do
        {:ok, [result | results], rest}
      end
    end
  end

  def many(parser) do
    fn input ->
      with {:ok, result, rest} <- some(parser).(input) do
        {:ok, result, rest}
      else
        {:error, _err, ^input} -> {:ok, [], input}
        err -> err
      end
    end
  end

  def one_of(parsers) when is_list(parsers) do
    fn string ->
      Enum.reduce_while(parsers, {:error, "no parsers", string}, fn
        _parser, {:ok, _, _} = result -> {:halt, result}
        parser, _err -> {:cont, parser.(string)}
      end)
    end
  end

  def map(parser, f) do
    fn string ->
      with {:ok, a, rest} <- parser.(string) do
        {:ok, f.(a), rest}
      end
    end
  end

  def error(parser, err) do
    fn string ->
      with {:error, _err, rest} <- parser.(string) do
        {:error, err, rest}
      end
    end
  end

  def drop(p1, p2) do
    fn string ->
      with {:ok, a, rest} <- p1.(string),
           {:ok, _, rest} <- p2.(rest) do
        {:ok, a, rest}
      end
    end
  end

  def drop_and(p1, p2) do
    fn string ->
      with {:ok, _, rest} <- p1.(string) do
        p2.(rest)
      end
    end
  end

  def inject(a) do
    fn string -> {:ok, a, string} end
  end

  def lift2(pair, p1, p2) do
    fn string ->
      with {:ok, a, rest} <- p1.(string),
           {:ok, b, rest} <- p2.(rest) do
        {:ok, pair.(a, b), rest}
      end
    end
  end

  def validate(parser, p, err) do
    fn string ->
      with {:ok, result, rest} <- parser.(string) do
        if p.(result) do
          {:ok, result, rest}
        else
          {:error, err, rest}
        end
      end
Comment on lines +201 to +207

@neenjaw (Contributor), Jun 28, 2021:

When there is a single expression in the `with`, is there a benefit to using `with` over just a match?

Suggested change:

-      with {:ok, result, rest} <- parser.(string) do
-        if p.(result) do
-          {:ok, result, rest}
-        else
-          {:error, err, rest}
-        end
-      end
+      {:ok, result, rest} = parser.(string)
+      if p.(result) do
+        {:ok, result, rest}
+      else
+        {:error, err, rest}
+      end

Contributor Author:

The `with` here is important, because I need the error to be passed further if `parser.(string)` fails. If I use `{:ok, result, rest} = parser.(string)` and the parser fails, I will get an exception.

Contributor:

👍

    end
  end
end
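
The point made in the review thread above, that `with` lets a failed parse flow through while a bare match raises, can be sketched in isolation. `WithSketch` and its two functions are hypothetical; they take an already-computed parser result instead of a parser, purely to keep the example short.

```elixir
defmodule WithSketch do
  # Uses `with`: a non-matching result (for example an error tuple)
  # is returned unchanged instead of raising.
  def validate_with(result, pred, err) do
    with {:ok, value, rest} <- result do
      if pred.(value), do: {:ok, value, rest}, else: {:error, err, rest}
    end
  end

  # Uses a bare match: anything other than {:ok, _, _} raises a MatchError.
  def validate_match(result, pred, err) do
    {:ok, value, rest} = result
    if pred.(value), do: {:ok, value, rest}, else: {:error, err, rest}
  end
end
```

Given `{:error, "tree missing", ""}`, `validate_with/3` hands the tuple back to the caller, whereas `validate_match/3` raises a `MatchError`, which is the exception the author refers to.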
48 changes: 48 additions & 0 deletions exercises/practice/sgf-parsing/.meta/tests.toml
@@ -0,0 +1,48 @@
# This is an auto-generated file.
#
# Regenerating this file via `configlet sync` will:
# - Recreate every `description` key/value pair
# - Recreate every `reimplements` key/value pair, where they exist in problem-specifications
# - Remove any `include = true` key/value pair (an omitted `include` key implies inclusion)
# - Preserve any other key/value pair
#
# As user-added comments (using the # character) will be removed when this file
# is regenerated, comments can be added via a `comment` key.
[2668d5dc-109f-4f71-b9d5-8d06b1d6f1cd]
description = "empty input"

[84ded10a-94df-4a30-9457-b50ccbdca813]
description = "tree with no nodes"

[0a6311b2-c615-4fa7-800e-1b1cbb68833d]
description = "node without tree"

[8c419ed8-28c4-49f6-8f2d-433e706110ef]
description = "node without properties"

[8209645f-32da-48fe-8e8f-b9b562c26b49]
description = "single node tree"

[6c995856-b919-4c75-8fd6-c2c3c31b37dc]
description = "multiple properties"

[a771f518-ec96-48ca-83c7-f8d39975645f]
description = "properties without delimiter"

[6c02a24e-6323-4ed5-9962-187d19e36bc8]
description = "all lowercase property"

[8772d2b1-3c57-405a-93ac-0703b671adc1]
description = "upper and lowercase property"

[a759b652-240e-42ec-a6d2-3a08d834b9e2]
description = "two nodes"

[cc7c02bc-6097-42c4-ab88-a07cb1533d00]
description = "two child trees"

[724eeda6-00db-41b1-8aa9-4d5238ca0130]
description = "multiple property values"

[11c36323-93fc-495d-bb23-c88ee5844b8c]
description = "escaped property"