Skip to content

Latest commit

 

History

History
122 lines (94 loc) · 4.24 KB

grammar_respec.md

File metadata and controls

122 lines (94 loc) · 4.24 KB

Grammar re-specification:

Programs are parsed line-by-line. Lines are tokenized by splitting them at space characters ( (0x20) or \t). Consecutive space characters do not have empty tokens between them; empty space is elided. All text on a line after the first instance of a comment sigil is ignored, including the comment sigil. The comment sigils are # and //.

The following token types exist, defined as regexes, and are tested from first (symbol) to last (text), with the first taking priority, stopping at the first space character ( (0x20) or \t) or the end of the line.

symbol:  =|<-|{|}
numeric: [\.\-0-9].*
text:    .*

The tokenizer is not aware of what symbols look like. Spaces between all tokens are mandatory.

For the sake of simpler grammar specification following meta-token type also exists:

value: <numeric>|<text>

Types occasionally occur within a given line. Types match any of the following:

i8
i16
i32
i64
f32
f64
<symbol> <text>+ <symbol>

Above and below, names within <> refer to a previously-defined type of token.

The last type syntax rule is an aggregate type. Aggregates have their own internal rules, but each internal token matches as a "text" token.

The tokens making up an aggregate type must follow the format:

{ (packed)? align\.[0-9]+ ((i|f)\.[0-9]+)+ }

For example:

{ packed align.8 f.4 i.4 }

Any symbol pair other than { and } around the aggregate type is a syntax error. packed present anywhere other than the first text token is a syntax error. align.N present anywhere other than the first non-packed text token is a syntax error. Subsequent text tokens of any format other than (i|f).N are a syntax error. Values of 0 for the N after align., i., or f. are a semantic error.

For statements, the following operations are supported:

load <type> <value>
ternary <value> <value> <value>
(add|sub|mul|imul|div|idiv|rem|irem|div_unsafe|idiv_unsafe|rem_unsafe|irem_unsafe) <value> <value>
(shl|shr|shr_unsafe|sar|sar_unsafe|and|or|xor) <value> <value>
cmp_(eq|ne|ge|le|g|l) <value> <value>
icmp_(ge|le|g|l) <value> <value>
fcmp_(eq|ne|ge|le|g|l) <value> <value>
(bnot|not|bool) <value>
(addf|subf|mulf|divf|remf) <value> <value>
(trim|zext|sext) <type> <value>
(f32_to_f64|f64_to_f32) <value>
(float_to_uint|float_to_uint_unsafe|uint_to_float) <type> <value>
(float_to_sint|float_to_sint_unsafe|sint_to_float) <type> <value>
bitcast <type> <value>
extract <type> <value> <value>
inject <value> <value> <value>
build <type> <value>+ // type must be an aggregate
call_eval <type> <value> <value>*
freeze <value>
(ptralias|ptralias_merge|ptralias_disjoint) <value> <value>
ptralias_bless <value>
symbol_lookup_unsized <text> // arbitrary text, does not need to be a valid label, but cannot contain spaces or newlines or // or #
symbol_lookup <text> <numeric>

Operations are used in the form:

<text> = <operation>

Instructions:

goto <text> <value>*
if <value> goto <text> <value>*
return <value>
call <type> <value> <value>*
(memcpy|memmove) <value> <value> <value>
store <value> <value>
exit <value>
interrupt

The exceptional instructions bytes, bytes_clobber, asm, and asm_clobber were defined earlier in this document. A grammar-like description of them follows.

bytes <untyped integer literal>+
bytes_clobber (<text> <untyped integer literal>)* <- <untyped integer literal>* <- (<text> <untyped integer literal>)*
asm <arbitrary text>
bytes_clobber (<text> <untyped integer literal>)* <- <arbitrary text> <- (<text> <untyped integer literal>)*

Outside of statements, the following directives are possible:

func <text> (returns <type>)?
arg <text> <type>
stack_slot <text> <type>
block <text>
endfunc
global <type> <text>
static <type> <text> = <untyped integer literal>+

Directives are only allowed in specific places. func, global, and static directives can only happen in the 'root' of the file. arg can only happen in a list of args after a func or block.

Statements and directives may be followed by decorators, which begin with !. Once a decorator begins, only decorators and end-of-line (including comments) are allowed. Decorators must be separated by any nonzero number of spaces, and also separated from any tokens.