New practice exercise sgf-parsing
#795
Merged
8 commits

- 69a459a New practice exercise (jiegillet)
- f9694b9 Remove errors (jiegillet)
- 38d1bda Update instructions (jiegillet)
- 8cff89d Refactor examples.ex (jiegillet)
- c47fabb Remove .swo (jiegillet)
- 65749c3 Contributors (jiegillet)
- d70d76b Merge branch 'main' into jie-sgf-parsing (jiegillet)
- 6884881 Remove practices string (check passed before food-chain was merged) (jiegillet)
@@ -0,0 +1,66 @@

    # Description

    Parsing a Smart Game Format string.

    [SGF](https://en.wikipedia.org/wiki/Smart_Game_Format) is a standard format for
    storing board game files, in particular go.

    SGF is a fairly simple format. An SGF file usually contains a single
    tree of nodes where each node is a property list. The property list
    contains key value pairs, each key can only occur once but may have
    multiple values.

    An SGF file may look like this:

    ```text
    (;FF[4]C[root]SZ[19];B[aa];W[ab])
    ```

    This is a tree with three nodes:

    - The top level node has three properties: FF\[4\] (key = "FF", value
      = "4"), C\[root\](key = "C", value = "root") and SZ\[19\] (key =
      "SZ", value = "19"). (FF indicates the version of SGF, C is a
      comment and SZ is the size of the board.)
      - The top level node has a single child which has a single property:
        B\[aa\]. (Black plays on the point encoded as "aa", which is the
        1-1 point).
        - The B\[aa\] node has a single child which has a single property:
          W\[ab\].

    As you can imagine an SGF file contains a lot of nodes with a single
    child, which is why there's a shorthand for it.

    SGF can encode variations of play. Go players do a lot of backtracking
    in their reviews (let's try this, doesn't work, let's try that) and SGF
    supports variations of play sequences. For example:

    ```text
    (;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))
    ```

    Here the root node has two variations. The first (which by convention
    indicates what's actually played) is where black plays on 1-1. Black was
    sent this file by his teacher who pointed out a more sensible play in
    the second child of the root node: `B[dd]` (4-4 point, a very standard
    opening to take the corner).

    A key can have multiple values associated with it. For example:

    ```text
    (;FF[4];AB[aa][ab][ba])
    ```

    Here `AB` (add black) is used to add three black stones to the board.

    There are a few more complexities to SGF (and parsing in general), which
    you can mostly ignore. You should assume that the input is encoded in
    UTF-8, the tests won't contain a charset property, so don't worry about
    that. Furthermore you may assume that all newlines are unix style (`\n`,
    no `\r` or `\r\n` will be in the tests) and that no optional whitespace
    between properties, nodes, etc will be in the tests.

    The exercise will have you parse an SGF string and return a tree
    structure of properties. You do not need to encode knowledge about the
    data types of properties, just use the rules for the
    [text](http://www.red-bean.com/sgf/sgf4.html#text) type everywhere.
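
To make the tree described in the instructions above concrete, here is a minimal sketch of the two example strings written out by hand as nested structs. `SgfSketch` and `count_nodes/1` are hypothetical names used only for illustration; the shape (a `properties` map whose values are lists of strings, plus a `children` list) is assumed from the example solution in this pull request.

```elixir
defmodule SgfSketch do
  # Hypothetical stand-in for the nested `Sgf` struct of the example solution.
  defstruct properties: %{}, children: []

  # Count the nodes of a tree by walking the children recursively.
  def count_nodes(%__MODULE__{children: children}) do
    1 + Enum.sum(Enum.map(children, &count_nodes/1))
  end
end

# Hand-written tree for "(;FF[4]C[root]SZ[19];B[aa];W[ab])":
# each node with a single child nests one level deeper.
tree = %SgfSketch{
  properties: %{"FF" => ["4"], "C" => ["root"], "SZ" => ["19"]},
  children: [
    %SgfSketch{
      properties: %{"B" => ["aa"]},
      children: [%SgfSketch{properties: %{"W" => ["ab"]}}]
    }
  ]
}

# Hand-written tree for "(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))":
# the root has two children, one per variation.
variations = %SgfSketch{
  properties: %{"FF" => ["4"]},
  children: [
    %SgfSketch{
      properties: %{"B" => ["aa"]},
      children: [%SgfSketch{properties: %{"W" => ["ab"]}}]
    },
    %SgfSketch{
      properties: %{"B" => ["dd"]},
      children: [%SgfSketch{properties: %{"W" => ["ee"]}}]
    }
  ]
}

IO.inspect(SgfSketch.count_nodes(tree))
# 3
IO.inspect(SgfSketch.count_nodes(variations))
# 5
```

Every property value is a list, even when the key has a single value, so a multi-valued key such as `AB[aa][ab][ba]` needs no special case.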
@@ -0,0 +1,4 @@

    # Used by "mix format"
    [
      inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"]
    ]
@@ -0,0 +1,20 @@

    {
      "authors": ["jiegillet"],
      "contributors": [
        "neenjaw",
        "angelikatyborska"
      ],
      "files": {
        "example": [
          ".meta/example.ex"
        ],
        "solution": [
          "lib/sgf_parsing.ex"
        ],
        "test": [
          "test/sgf_parsing_test.exs"
        ]
      },
      "blurb": "Parsing a Smart Game Format string.",
      "title": "SGF Parsing"
    }
@@ -0,0 +1,210 @@

    defmodule SgfParsing do
      # Used to make recursive parsers lazy
      defmacro lazy(parser) do
        quote do
          fn string -> unquote(parser).(string) end
        end
      end

      defmodule Sgf do
        defstruct properties: %{}, children: []
      end

      @type sgf :: %Sgf{properties: map, children: [sgf]}
      @doc """
      Parse a string into a Smart Game Format tree
      """
      @spec parse(encoded :: String.t()) :: {:ok, sgf} | {:error, String.t()}
      def parse(encoded) do
        parser = parse_tree_paren() |> eof()

        with {:ok, tree, ""} <- run_parser(parser, encoded) do
          {:ok, tree}
        else
          {:error, err, _rest} -> {:error, err}
        end
      end

      # TREE PARSER

      def parse_tree() do
        parse_properties =
          char(?;)
          |> error("tree with no nodes")
          |> drop_and(many(parse_property()))
          |> map(&Map.new/1)

        parse_children =
          one_of([
            map(parse_tree(), &List.wrap/1),
            many(parse_tree_paren())
          ])
          |> lazy()

        lift2(&%Sgf{properties: &1, children: &2}, parse_properties, parse_children)
      end

      def parse_tree_paren() do
        char(?()
        |> error("tree missing")
        |> drop_and(parse_tree())
        |> drop(char(?)))
      end

      def parse_property() do
        parse_name =
          some(satisfy(&(&1 not in '[();')))
          |> map(&Enum.join(&1, ""))
          |> validate(&(&1 == String.upcase(&1)), "property must be in uppercase")

        parse_attributes =
          some(
            char(?[)
            |> error("properties without delimiter")
            |> drop_and(many(escaped(&(&1 != ?]))))
            |> drop(char(?]))
            |> map(&Enum.join(&1, ""))
          )

        lift2(&{&1, &2}, parse_name, parse_attributes)
      end

      def escaped(p) do
        one_of([
          lift2(&escape/2, char(?\\), satisfy(&(&1 in 'nt]['))),
          satisfy(p)
        ])
      end

      def escape("\\", "n"), do: "\n"
      def escape("\\", "t"), do: "\t"
      def escape("\\", "]"), do: "]"
      def escape("\\", "["), do: "["

      # PARSER COMBINATORS LIBRARY
      # Inspired from Haskell libraries like Parsec
      # and https://serokell.io/blog/parser-combinators-in-elixir

      def run_parser(parser, string), do: parser.(string)

      def eof(parser) do
        fn string ->
          with {:ok, _, ""} = ok <- parser.(string) do
            ok
          else
            {:ok, _a, rest} -> {:error, "Not end of file", rest}
            err -> err
          end
        end
      end

      def satisfy(p) do
        fn
          <<char, rest::bitstring>> = string ->
            if p.(char) do
              {:ok, <<char>>, rest}
            else
              {:error, "unexpected #{char}", string}
            end

          "" ->
            {:error, "unexpected end of string", ""}
        end
      end

      def char(c), do: satisfy(&(&1 == c)) |> error("expected character #{<<c>>}")

      def string(str) do
        str
        |> to_charlist
        |> Enum.map(&char/1)
        |> Enum.reduce(inject(""), &lift2(fn a, b -> a <> b end, &1, &2))
      end

      def some(parser) do
        fn input ->
          with {:ok, result, rest} <- parser.(input),
               {:ok, results, rest} <- many(parser).(rest) do
            {:ok, [result | results], rest}
          end
        end
      end

      def many(parser) do
        fn input ->
          with {:ok, result, rest} <- some(parser).(input) do
            {:ok, result, rest}
          else
            {:error, _err, ^input} -> {:ok, [], input}
            err -> err
          end
        end
      end

      def one_of(parsers) when is_list(parsers) do
        fn string ->
          Enum.reduce_while(parsers, {:error, "no parsers", string}, fn
            _parser, {:ok, _, _} = result -> {:halt, result}
            parser, _err -> {:cont, parser.(string)}
          end)
        end
      end

      def map(parser, f) do
        fn string ->
          with {:ok, a, rest} <- parser.(string) do
            {:ok, f.(a), rest}
          end
        end
      end

      def error(parser, err) do
        fn string ->
          with {:error, _err, rest} <- parser.(string) do
            {:error, err, rest}
          end
        end
      end

      def drop(p1, p2) do
        fn string ->
          with {:ok, a, rest} <- p1.(string),
               {:ok, _, rest} <- p2.(rest) do
            {:ok, a, rest}
          end
        end
      end

      def drop_and(p1, p2) do
        fn string ->
          with {:ok, _, rest} <- p1.(string) do
            p2.(rest)
          end
        end
      end

      def inject(a) do
        fn string -> {:ok, a, string} end
      end

      def lift2(pair, p1, p2) do
        fn string ->
          with {:ok, a, rest} <- p1.(string),
               {:ok, b, rest} <- p2.(rest) do
            {:ok, pair.(a, b), rest}
          end
        end
      end

      def validate(parser, p, err) do
        fn string ->
          with {:ok, result, rest} <- parser.(string) do
            if p.(result) do
              {:ok, result, rest}
            else
              {:error, err, rest}
            end
          end
        end
      end
    end
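
The combinator library above treats a parser as a function from an input string to `{:ok, value, rest}` or `{:error, reason, rest}`. The following is a stripped-down sketch of that idea, not the code from this pull request: `MiniParser` is a hypothetical module, and its `many/1` is simplified so that it treats any failure as "zero matches", whereas the version in the diff only does so when the failing parser consumed no input.

```elixir
defmodule MiniParser do
  # A parser is a function: input -> {:ok, value, rest} | {:error, reason, rest}.

  # Succeeds on a single byte for which the predicate `pred` holds.
  def satisfy(pred) do
    fn
      <<char, rest::binary>> = input ->
        if pred.(char) do
          {:ok, <<char>>, rest}
        else
          {:error, "unexpected #{<<char>>}", input}
        end

      "" ->
        {:error, "unexpected end of string", ""}
    end
  end

  # Applies `parser` zero or more times and collects the values in a list.
  # Simplification: any failure ends the repetition and counts as success.
  def many(parser) do
    fn input ->
      case parser.(input) do
        {:ok, value, rest} ->
          {:ok, values, rest} = many(parser).(rest)
          {:ok, [value | values], rest}

        {:error, _reason, _rest} ->
          {:ok, [], input}
      end
    end
  end

  # Transforms the value of a successful parse; errors pass through unchanged.
  def map(parser, f) do
    fn input ->
      with {:ok, value, rest} <- parser.(input) do
        {:ok, f.(value), rest}
      end
    end
  end
end

# Compose small parsers into one that reads an uppercase property key.
upper = MiniParser.satisfy(&(&1 in ?A..?Z))
key = upper |> MiniParser.many() |> MiniParser.map(&Enum.join/1)

IO.inspect(key.("FF[4]"))
# {:ok, "FF", "[4]"}
```

Because every combinator returns another function of the same shape, larger parsers are built purely by composition, which is the design the example solution follows.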
@@ -0,0 +1,48 @@

    # This is an auto-generated file.
    #
    # Regenerating this file via `configlet sync` will:
    # - Recreate every `description` key/value pair
    # - Recreate every `reimplements` key/value pair, where they exist in problem-specifications
    # - Remove any `include = true` key/value pair (an omitted `include` key implies inclusion)
    # - Preserve any other key/value pair
    #
    # As user-added comments (using the # character) will be removed when this file
    # is regenerated, comments can be added via a `comment` key.
    [2668d5dc-109f-4f71-b9d5-8d06b1d6f1cd]
    description = "empty input"

    [84ded10a-94df-4a30-9457-b50ccbdca813]
    description = "tree with no nodes"

    [0a6311b2-c615-4fa7-800e-1b1cbb68833d]
    description = "node without tree"

    [8c419ed8-28c4-49f6-8f2d-433e706110ef]
    description = "node without properties"

    [8209645f-32da-48fe-8e8f-b9b562c26b49]
    description = "single node tree"

    [6c995856-b919-4c75-8fd6-c2c3c31b37dc]
    description = "multiple properties"

    [a771f518-ec96-48ca-83c7-f8d39975645f]
    description = "properties without delimiter"

    [6c02a24e-6323-4ed5-9962-187d19e36bc8]
    description = "all lowercase property"

    [8772d2b1-3c57-405a-93ac-0703b671adc1]
    description = "upper and lowercase property"

    [a759b652-240e-42ec-a6d2-3a08d834b9e2]
    description = "two nodes"

    [cc7c02bc-6097-42c4-ab88-a07cb1533d00]
    description = "two child trees"

    [724eeda6-00db-41b1-8aa9-4d5238ca0130]
    description = "multiple property values"

    [11c36323-93fc-495d-bb23-c88ee5844b8c]
    description = "escaped property"
When there is a single expression in the `with`, is there a benefit to using `with` over just a match?

The `with` here is important, because I need the error to be passed further if `parser.(string)` fails. If I use `{:ok, result, rest} = parser.(string)` and the parser fails, I will get an exception.

👍
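
The difference discussed in this thread can be shown with a small sketch. `WithVsMatch`, `failing_parser/1`, `map_with/3` and `map_match/3` are hypothetical names made up for the illustration; only the `{:ok, value, rest}` / `{:error, reason, rest}` result shape is taken from the pull request.

```elixir
defmodule WithVsMatch do
  # Hypothetical parser that always fails, standing in for `parser.(string)`.
  def failing_parser(input), do: {:error, "unexpected input", input}

  # With `with`, a result that does not match the pattern is returned as-is,
  # so the error tuple is passed on to the caller.
  def map_with(parser, f, input) do
    with {:ok, value, rest} <- parser.(input) do
      {:ok, f.(value), rest}
    end
  end

  # With a bare match, a result that does not match raises a MatchError.
  def map_match(parser, f, input) do
    {:ok, value, rest} = parser.(input)
    {:ok, f.(value), rest}
  end
end

# The error tuple comes back unchanged.
IO.inspect(WithVsMatch.map_with(&WithVsMatch.failing_parser/1, &String.upcase/1, "abc"))
# {:error, "unexpected input", "abc"}

# The same failure raises instead of returning.
try do
  WithVsMatch.map_match(&WithVsMatch.failing_parser/1, &String.upcase/1, "abc")
rescue
  e in MatchError -> IO.puts("raised: " <> Exception.message(e))
end
```

A single-clause `with` without an `else` therefore works as "transform on success, otherwise return whatever came back", which is what combinators like `map/2` rely on.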