
Literal syntax for strings and characters

Attila Magyar edited this page May 27, 2017 · 30 revisions

Most Forth implementations don't have string or character literals built into the language. The reason lies in how Forth source is parsed. Simplicity is one of the key virtues of Forth: the syntax is so simple that building a parser from scratch (even in assembly) is trivial. Forth source code consists of whitespace-separated tokens, and each token represents either a word or a number. The outer interpreter grabs the next token and looks it up in the dictionary. If it finds it, it treats it as a word (and that word is either compiled or executed); otherwise it tries to convert it to a number (and that number is either pushed onto the data stack or compiled as a literal).
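A small illustration of those two cases, assuming any reasonably standard Forth system. The word name answer is made up for this example.

```forth
\ Interpreting: 20 and 22 are not in the dictionary, so they are
\ converted to numbers and pushed; + and . are found and executed.
20 22 + .

\ Compiling: the same tokens inside a definition are compiled instead.
\ The numbers become literals and the words become calls.
: answer ( -- n ) 20 22 + ;
answer .
```

Both lines print 42. The only difference is whether each token is acted on immediately or stored for later.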

Here is what a typical Forth outer interpreter looks like.

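\ fig-Forth's outer interpreter
: interpret ( -- )
  begin -find                        \ look the next token up in the dictionary
    if state @ <                     \ found: compile it unless it must run now
      if cfa , else cfa execute then
    else here number dpl @ 1+        \ not found: convert the token to a number
      if        [compile] dliteral   \ decimal point seen: double-cell literal
      else drop [compile] literal    \ dpl was -1: single-cell literal
      then
    then ?stack
  again ;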

Most Forth systems choose to use parsing words to add support for strings or characters, instead of extending the outer interpreter with new cases and making it complicated (who would be so cruel as to make this nice code ugly and complex?).

Nothing prevents you from defining a new word and naming it ".

: " <parse the string from the input until you find a ">

Here is how you can use it to define a Hello World! string. Note that the leading space is not part of the string: Forth parses whitespace-separated tokens, and the " is one of them, so the space only terminates that token.

" Hello world!"
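As a rough sketch, such a word could look like this on a standard Forth 2012 system. It relies on parse and [char] from the standard, which is an assumption about the host system; punyforth's own implementation may differ.

```forth
\ Collect everything up to the next double quote.
\ The result points into the input buffer, so use it before the
\ next line is read (this sketch is for interpretation only).
: " ( "ccc<quote>" -- c-addr u )  [char] " parse ;

" Hello world!" type
```

The space after the opening " is consumed by the outer interpreter as the token delimiter, so parse starts at the H.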

In punyforth I chose to use a different parsing word for this because I didn't like the leading space at the beginning of each string.

: str: <find the first non white space character, treat it as separator, parse the string until the end separator>

str: "Hello World!"

There is no leading space, but unfortunately you have to use this relatively long parsing word before each string.

I was not satisfied with either of these solutions, so I decided to add real string literals. I extended the outer interpreter with two hooks, and now it works like this:
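The idea behind str: can be sketched in standard Forth as follows. This is not punyforth's actual definition. It assumes source, >in, parse and /string are available, and skip-blanks is a helper invented for this example.

```forth
\ Advance >in past white space in the unparsed part of the line.
: skip-blanks ( -- )
    begin
        source >in @ /string           \ c-addr u of the unparsed rest
        dup 0> if drop c@ bl 1+ < else 2drop false then
    while
        1 >in +!
    repeat ;

\ Treat the first non-blank character as the separator and collect
\ the text up to its next occurrence.
: str: ( "<spaces><sep>ccc<sep>" -- c-addr u )
    skip-blanks
    source >in @ /string               \ c-addr u
    0= if 0 exit then                  \ nothing left: empty string
    c@                                 \ the separator character
    1 >in +!                           \ step over the opening separator
    parse ;

str: "Hello World!" type
str: |He said "hi"| type
```

Because the separator is chosen per string, a string that contains a double quote can simply use another separator.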

  1. Find the next token
  2. If it's in the dictionary compile or execute
  3. Otherwise try to convert it to a number
  4. If it's not a number call hook1 when we're interpreting or hook2 if we're compiling

Now the code that recognizes a string can be hooked into the outer interpreter. In fact, the number conversion can be extracted and implemented as a hook too.

The following is a proof-of-concept implementation of these hooks. Readability is not the number one concern in the core; if you want to see nice Forth code, look at a user library rather than internal Forth code.
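To make the four steps concrete, here is a minimal model of such a hooked interpreter written in standard Forth 2012. It is a sketch under stated assumptions, not punyforth's core: hook1 and hook2 are modelled as deferred words, the names tok, >tok, reject, try-word, try-number, token, run and char-hook are invented here, and number conversion handles unsigned numbers only.

```forth
defer hook1 ( c-addr u -- )   \ unknown token while interpreting
defer hook2 ( c-addr u -- )   \ unknown token while compiling

: reject ( c-addr u -- )  type ."  ?" -13 throw ;
' reject is hook1
' reject is hook2

create tok 256 allot          \ scratch counted string for find
: >tok ( c-addr u -- tok )  255 min  dup tok c!  tok 1+ swap move  tok ;

\ Step 2: if the token is in the dictionary, compile or execute it.
: try-word ( c-addr u -- c-addr u false | true )
    2dup >tok find
    dup 0= if nip exit then            \ not found: c-addr u false
    2swap 2drop                        \ xt n, n is 1 for immediate words
    0> state @ 0= or
    if execute else compile, then
    true ;

\ Step 3: otherwise try to convert it to a number (unsigned only).
: try-number ( c-addr u -- c-addr u false | n true )
    2dup 0 0 2swap >number             \ c-addr u ud c-addr2 u2
    nip 0<> if 2drop false exit then   \ characters left over: not a number
    drop nip nip                       \ keep the low cell as n
    state @ if postpone literal then
    true ;

\ Step 4: nothing matched, so hand the token to the right hook.
: token ( c-addr u -- )
    try-word   if exit then
    try-number if exit then
    state @ if hook2 else hook1 then ;

\ Step 1: feed the rest of the current line through the steps above.
: run ( -- )  begin parse-name dup while token repeat 2drop ;

\ A character recognizer in the spirit of the $A syntax.
: char-hook ( c-addr u -- char )
    2dup 2 = swap c@ [char] $ = and
    if drop 1+ c@ else reject then ;
' char-hook is hook1

run 20 22 + .
run $A emit
```

The first run line goes through steps 2 and 3 only; the second one falls through to hook1, which pushes the character code that emit then prints. A compile-time hook would do the same work and additionally compile the value as a literal.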

: str, ( len -- ) >in @ swap - >in ! char: " c,-until ;

: _ ( addr len -- ? )
    \ recognize char
    2dup 2 = swap c@ char: $ = and if drop ['], 1+ c@ , exit then
    \ recognize str
    over c@ char: " = if nip [str >r str, r> str] exit then
    eundef ;

' _ eundefc ! \ hook it into the compiler

: _ ( addr len -- ? )
    \ recognize char
    2dup 2 = swap c@ char: $ = and if drop 1+ c@ exit then
    \ recognize str
    over c@ char: " = if nip dp >r str, 0 c, r> exit then
    eundef ;

' _ eundefi ! \ hook it into the text interpreter

This allows us to define strings naturally like this:

"Hello World!" constant: message
message type

Or characters like this:

$A emit \ prints the character A

Using the two hooks keeps the outer interpreter from becoming more complicated. In fact, it can be simpler than before if we extract the number conversion and implement it as a hook too.