Proposed formal syntax for edn #56
Comments
See also https://gist.github.com/bpsm/5951638 which is just the plain WSN with comments.
@hubrys Thanks for taking the time to review the syntax. I'll try to respond to your objections one at a time:
Let's take the production for vectors as an example:
If we substitute in the definition of elements we get:
The production for s allows any amount of white space (including no white space, since
Here are a few examples that would match this production:
But there is a bug: the following don't match:
I believe this fixes it by allowing
With this change
I don't follow,
Am I missing something?
@bpsm Ahhhh, makes sense now. For some reason I was under the impression that the { } brackets implied that the item inside of them would have to be there at least once, not 0 or more times. That actually clears up and answers all the questions I had. Thanks.
A few things I've discovered
How about an ANTLR (https://github.com/antlr/antlr4) grammar for EDN, to make this even more precise? There already seems to be one for Clojure: https://github.com/antlr/grammars-v4/tree/master/clojure Thank you.
Hello, I just made a grammar for the EDN format. If you are still interested, please check it at antlr/grammars-v4#1831
@Marti2203 Thank you for the grammar file!
:) With pleasure, please check it and give a review as I had never heard of EDN before seeing the issue in the grammar repo |
My purpose in writing this is to get a discussion started which will hopefully lead to hashing out a formal syntax for edn.
(This grammar is written in a slightly extended version of Wirth Syntax Notation. A description is appended to the end of this document. I'm not married to using WSN. I chose it because it's familiar to me, is easily explained and is compact, but translating these productions to some other notation would be no great hardship once errors and ambiguities have been ironed out.)
Points to Consider
This syntax considers code point NUL (`U+0`) unacceptable in valid edn input.
This syntax forbids superfluous leading zeros in numeric literals. (Unlike current implementations.)
This syntax does not allow Unicode characters outside of the US-ASCII range to appear in symbol or keyword names.
This syntax does not allow `<` and `>` to appear in symbol or keyword names.
This syntax supports `\backspace` and `\formfeed`, `"\b"` and `"\f"`, as do current implementations.
Elements
White space is allowed between elements, but not always required. For example:
`{}{}` parses as two empty maps.
`"MeaningOfLife"42` parses as a String followed by an integer.
However, `a:b` parses as a single symbol, not as the symbol `a` followed by the keyword `:b`.
White Space
For parsing, we treat comments and discarded elements the same way we treat white space. This production describes some (possibly zero-length) run of text that is ignored by the parser:
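As a rough Python illustration of such an ignored run (a sketch, not the WSN production), the function below skips white space, commas and comments. The name `skip_ignored` is hypothetical, and the exact white space set (space, tab, CR, LF, comma) is an assumption. Discarded elements are left out because skipping them requires a full element parser.

```python
def skip_ignored(text, pos=0):
    """Return the index of the first character at or after pos that is not ignored."""
    white = " \t\r\n,"  # assumption: a narrow white space set plus the comma
    n = len(text)
    while pos < n:
        ch = text[pos]
        if ch in white:
            pos += 1
        elif ch == ";":
            # A comment runs up to and including the next LF (or to end of input).
            end = text.find("\n", pos)
            pos = n if end == -1 else end + 1
        else:
            break
    return pos
```

Because only LF ends a comment in this sketch, a lone CR inside a comment does not terminate it.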
Edn, like Clojure, considers the comma (`,`) to be white space.
(The syntax is written to be quite narrow here. Perhaps we should just accept "," and anything in the range `U+1` … " " as white space.)
Issue#31 states that only `\newline` (i.e. `U+A`) terminates comments.
Comments can contain any code points, except LF (which terminates the comment). This is compatible with both Unix (`U+A`) and Windows (`U+D U+A`) style line breaks. It's incompatible with the line break style of classic Mac OS (`U+D`). Presumably that's no great loss.
Symbols
Issue#30 states that "tag symbols must begin with an alphabetic character", so we introduce a production for this:
Issue#32 says that `:/` is not a valid keyword. Otherwise keywords adhere to the naming rules of symbols following the initial `:`.
Names beginning with a `nameStart2` must continue with a letter if they continue at all. This avoids possible ambiguity with numeric literals.
README.md does not allow `<` and `>` to appear in symbol (or keyword) names. clojure.edn/read and clojure.core/read both accept `<` and `>` as members of nameStart1. Is this deliberate?
The words `true`, `false` and `nil` look syntactically like symbols, but are not parsed as such. I've given them their own productions to hint at this:
(But that makes this grammar ambiguous, since there are now two ways of parsing "true", "false" and "nil".)
Numbers
This syntax does not allow superfluous leading zeros in integers.
README.md (the current informal standard) does allow leading zeros.
clojure.core/read and clojure.edn/read both allow leading zeros here but interpret the remaining digits in base 8! See also Issue#33.
edn-java allows leading zeros, but gives them no special meaning.
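The three behaviours listed above can be contrasted with a small Python sketch. The function names are hypothetical, the regular expressions are an assumption about what an unsuffixed integer looks like, and the code only imitates the described behaviours; it is not taken from any of the named implementations.

```python
import re

# Assumption: an optional sign followed by decimal digits, no suffix handling.
STRICT_INT = re.compile(r"[+-]?(?:0|[1-9][0-9]*)")
LENIENT_INT = re.compile(r"[+-]?[0-9]+")


def read_strict(text):
    """Proposed syntax: superfluous leading zeros are rejected."""
    if not STRICT_INT.fullmatch(text):
        raise ValueError(f"not a valid integer: {text!r}")
    return int(text)


def read_octal_style(text):
    """Imitates the described clojure reader behaviour: a leading zero means base 8."""
    if not LENIENT_INT.fullmatch(text):
        raise ValueError(f"not a valid integer: {text!r}")
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-")
    if len(digits) > 1 and digits.startswith("0"):
        return sign * int(digits, 8)  # raises ValueError for digits 8 and 9
    return sign * int(digits)


def read_decimal_style(text):
    """Imitates the described edn-java behaviour: leading zeros are ignored."""
    if not LENIENT_INT.fullmatch(text):
        raise ValueError(f"not a valid integer: {text!r}")
    return int(text)
```

With these definitions the same input `010` is rejected by `read_strict`, read as 8 by `read_octal_style` and read as 10 by `read_decimal_style`, which is the kind of disagreement a formal syntax would settle.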
The `float` syntax disagrees with the formal syntax from README.md, but does so in order to comply with "In addition, a floating-point number may have the suffix M to indicate that exact precision is desired."
The grammar in README.md does not allow leading zeros in the integer and exponent portions of a float. clojure.core/read, clojure.edn/read and edn-java all accept leading zeros in these cases, but give them no special interpretation.
The fractional portion can consist of only a "." (not followed by any digits). This is consistent with the current spec and with the behavior of clojure.core/read and clojure.edn/read.
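A minimal Python sketch of a float recognizer along these lines follows. It is an approximation rather than the grammar's actual production: the names are hypothetical, rejecting leading zeros in the integer portion and leaving exponent digits unrestricted are both assumptions, and the `M` suffix is allowed after any form.

```python
import re

INT_PART = r"[+-]?(?:0|[1-9][0-9]*)"  # assumption: no superfluous leading zeros
FRAC = r"\.[0-9]*"                     # a bare "." with no digits is accepted
EXP = r"[eE][+-]?[0-9]+"               # assumption: exponent digits unrestricted

# A float needs at least one of: fraction, exponent, or the M suffix.
FLOAT = re.compile(rf"{INT_PART}(?:{FRAC}(?:{EXP})?M?|{EXP}M?|M)")


def is_float(text):
    """True if text matches this sketch of the float syntax."""
    return FLOAT.fullmatch(text) is not None
```

Under these assumptions `1.`, `1.5e-3M` and `1M` are accepted, while `42` (an integer) and `01.5` are not.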
Characters
Certain characters can be referred to by names. In their literal form these characters would display as whitespace, making the resulting edn quite confusing for a person to read or edit.
The specification only mentions the first four of these explicitly; backspace and formfeed are included for symmetry with strings and because clojure.edn/read and clojure.core/read both support them.
Character literals for other code points are written by simply including the literal character immediately following the `\`.
This definition of printableCharacter is sloppy. There are probably other Unicode code points that we don't want to use in character literals because they have no printed representation or perform some control function.
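One possible way to tighten this, sketched in Python, is to filter on Unicode general categories. The excluded set below (control, format, surrogate, private use, unassigned, and the separator categories) is an assumption about what "no printed representation" should mean, and `is_printable_candidate` is a hypothetical name.

```python
import unicodedata

# Assumption: code points in these general categories are unsuitable
# for use as a bare character literal.
EXCLUDED_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn", "Zs", "Zl", "Zp"}


def is_printable_candidate(ch):
    """True if the single code point ch has a category outside the excluded set."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return unicodedata.category(ch) not in EXCLUDED_CATEGORIES
```

The result depends on the Unicode database shipped with the Python version in use, so a grammar that adopted this approach would need to pin a Unicode version.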
Strings
README.md only mentions "Standard C/Java escape characters `\t` `\r` `\n` are supported", but clearly `\\` and `\"` must be included.
`\b` `\f` are included because they are supported by clojure.core/read and clojure.edn/read.
`\'` is excluded despite being supported by Java because clojure.core/read and clojure.edn/read reject `\'`.
Wirth Syntax Notation
Wikipedia provides a brief description of WSN. The formal syntax of WSN in WSN follows:
The only oddity for modern readers is that string literals do not provide anything like the `\` escaping convention we have grown used to in C-like languages. One embeds a double quote in a string by doubling it. All other characters in the string literal are taken literally, as written.
An Extension to WSN to Represent Unicode Codepoints
Edn's syntax is specified in terms of unicode code points, the first 128 of which are identical to US-ASCII. (Edn is always serialized as UTF-8, which can represent the full set of unicode code points.)
For the purposes of this grammar, we'll use the following extension to represent a unicode code point:
For example: `NUL` is written as `U+0`, " " can be written as `U+20` and "~" can be written as `U+7E`.
To represent large contiguous subsets of the Unicode code points, we use an ellipsis as follows:
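To make the notation concrete, here is a minimal Python sketch of how `U+…` literals and an ellipsis range might be interpreted. The function names are hypothetical, and treating the range as inclusive at both ends is an assumption.

```python
import re

CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]+)")


def parse_code_point(token):
    """Turn a literal such as 'U+20' into the integer code point 0x20."""
    match = CODE_POINT.fullmatch(token)
    if match is None:
        raise ValueError(f"not a code point literal: {token!r}")
    value = int(match.group(1), 16)
    if value > 0x10FFFF:
        raise ValueError(f"outside the Unicode range: {token!r}")
    return value


def in_range(ch, low, high):
    """True if ch lies in the range written as `low … high` (assumed inclusive)."""
    return parse_code_point(low) <= ord(ch) <= parse_code_point(high)
```

For instance, `in_range("A", "U+20", "U+7A")` is true, while a line feed falls outside that range.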