ulp-forth is a Forth interpreter and optimizing cross compiler for the
ESP32 ULP coprocessor. It implements most of the Forth 2020 standard.
Code is interpreted on the host machine; when interpretation is
finished, the MAIN word is compiled for execution by the
ULP.
This is a 16 bit Forth implementation. It is case insensitive.
Several design choices were made to save ULP memory:
- The ULP cannot modify the dictionary or allocate memory.
- Only words used by MAIN are cross compiled (pruning).
- There are no error checks in the cross compiled output.
The Forth 2020 standard was followed for this implementation. There are some missing words, but all of the implemented words follow the standard and pass the test suite.
Copyright 2024-2025 Blake Felt blake.w.felt@gmail.com
- Installation
- Building ulp-forth
- Using ulp-forth
- Sharing memory
- Threading models
- Assembly words
- System words
- Clock words
- GPIO words
- Serial words
- I2C words
- Standard Core words
- Standard Core Extension words
- Standard Double words
- Optimizations
Releases can be found on the release page.
You can also build the latest tagged version from source with:
go install github.com/Molorius/ulp-forth@latest
To build from a local clone of the repository, run
go build
Unit tests run on both the host and a ULP emulator. They can be run with
go test ./...
Help running the program can be found with ulp-forth --help.
The interpreter can be run with
ulp-forth run
This is useful for testing logic, but because it runs only on the host
it cannot be used to test hardware. Type bye to exit.
You can load files before the interpreter starts by passing them to the run command:
ulp-forth run your_code.f
The cross compiler can be run with
ulp-forth build your_code.f
Pass the list of files to build; they are interpreted in order before the MAIN word is cross compiled. For example,
ulp-forth build first.f second.f
will first interpret first.f, then second.f, then cross compile the MAIN word.
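As a minimal sketch of that two-file build, first.f can hold helper words and second.f the MAIN word that gets cross compiled. The file contents and the names wakeups and count-wakeup are hypothetical; only MAIN is required by ulp-forth.

```forth
\ first.f (hypothetical): helper words, interpreted first
variable wakeups
: count-wakeup ( -- )  1 wakeups +! ;

\ second.f (hypothetical): interpreted second, so it can use first.f's words
: main ( -- )
    begin
        count-wakeup  \ bump the counter each time the ULP runs
        halt          \ stop; execution continues here on the next wakeup
    again ;
```

Because of pruning, only MAIN and the words it uses (count-wakeup, wakeups) end up in the ULP output.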
- --assembly: Output assembly that can be compiled by the main assemblers. Set the --reserved flag when using this.
- --custom_assembly: Output assembly only for use by ulp-asm, another project by this author.
- --output: Name of the output file.
- --reserved: Number of reserved bytes for the ULP, for use with the --assembly flag (default 8176). Note that a bug in the Espressif linker leaves 12 fewer bytes available in total. Any space not used by code or data is used for the stacks.
- --subroutine: Use the subroutine threading model (see the Threading models section). Faster but larger.
There are words that can be used to share memory with the esp32. When compiled with the --custom_assembly or --assembly flags, the output assembly will include the .global directive for the associated memory. This memory will not be optimized away.
| Shared word | Equivalent word | 
|---|---|
| GLOBAL-VARIABLE | VARIABLE | 
| GLOBAL-2VARIABLE | 2VARIABLE | 
| GLOBAL-ALLOCATE | ALLOCATE | 
Access this memory only while holding the mutex; see the System words section.
Example:
global-variable example \ create a global variable named "example"
\ read an address while holding the mutex
: global@ ( address -- n )
    mutex.take \ take ownership of the mutex
    @ \ read the value at the address
    mutex.give \ release the mutex
;
\ write to an address while holding the mutex
: global! ( n address -- )
    mutex.take \ take ownership of the mutex
    ! \ write the value to the address
    mutex.give \ release the mutex
;
\ get the value at "example"
: get-example ( -- n )
    example \ put the address of the memory onto the stack
    global@ \ read it
;
\ set the value at "example"
: set-example ( n -- )
    example \ put the address of the memory onto the stack
    global! \ write to it
;
There are two threading models for the output ULP code. This is the Forth meaning of "threading", not multithreading as in other languages; it can be thought of as the execution environment.
Token threading is usually smaller and subroutine threading is usually faster, but this can vary based on the program and optimizations.
Token threading is the default. It uses a lightweight virtual machine to execute all Forth words. This allows very compact code, but the virtual machine imposes a speed penalty.
Code using this is roughly 20% smaller than subroutine threaded code.
Subroutine threading is enabled with the --subroutine flag. It compiles all Forth words into assembly subroutines. This is very fast to execute, but there is a size penalty.
Code using this is roughly 20% faster than token threaded code.
A few words are provided to make ULP assembly without extending the compiler.
ASSEMBLY ( objn objn-1 ... obj0 n "<spaces>name" -- )
Skip leading spaces. Parse name delimited by a space. Create a
definition for name that compiles to token threaded ULP assembly.
The assembly is the contents of the objects on the stack, with object count n.
Objects can be strings or integers.
Note that the assembly is built with ulp-asm, a project by the same author as ulp-forth. It is slightly different than the Espressif or micropython compilers.
Words built with ASSEMBLY should not access the return stack.
Example:
c" move r0, "
0x10 \ include this number
c" \njump next"
3 \ we want to compile the 3 items on the stack
ASSEMBLY MY-EXAMPLE
That will create a word MY-EXAMPLE which will output the
token threaded assembly:
move r0, 16
jump next
ASSEMBLY-SRT ( objn objn-1 ... obj0 n "<spaces>name" -- )
Skip leading spaces. Parse name delimited by a space. Create a
definition for name that compiles to subroutine threaded ULP assembly.
The assembly is the contents of the objects on the stack, with object count n.
Objects can be strings or integers.
Note that the assembly is built with ulp-asm, a project by the same author as ulp-forth. It is slightly different than the Espressif or micropython compilers.
Words built with ASSEMBLY-SRT should not access the return stack. A return is automatically appended.
Example:
c" move r0, "
0x10 \ include this number
\ 
2 \ we want to compile the 2 items on the stack
ASSEMBLY-SRT MY-EXAMPLE
That will create a word MY-EXAMPLE which will output the
subroutine threaded assembly:
move r0, 16
add r2, r2, 1
jump r2
ASSEMBLY-BOTH
  (
    objAn objAn-1 ... objA0 n
    objBn objBn-1 ... objB0 m
    "<spaces>name" --
  )
Skip leading spaces. Parse name delimited by a space. Create a
definition for name that compiles to token threaded and subroutine
threaded ULP assembly.
The token threaded assembly is the contents of the objA objects
on the stack, with object count n.
The subroutine threaded assembly is the contents of the objB objects
on the stack, with object count m.
Objects can be strings or integers.
Note that the assembly is built with ulp-asm, a project by the same author as ulp-forth. It is slightly different than the Espressif or micropython compilers.
Words built with ASSEMBLY-BOTH should not access the return stack. A return is automatically appended to the subroutine threaded assembly.
Example:
\ token threaded
c" move r0, "
0x10 \ include this number
2 \ we want to compile the 2 items on the stack
\ subroutine threaded
c" move r1, "
0x11 \ include this number
2 \ we want to compile the 2 items on the stack
ASSEMBLY-BOTH MY-EXAMPLE
That will create a word MY-EXAMPLE which will output the
token threaded assembly:
move r0, 16
jump next
and the subroutine threaded assembly:
move r1, 17
add r2, r2, 1
jump r2
READ_RTC_REG ( addr low width "<spaces>name" -- )
Skip leading spaces. Parse name delimited by a space. Create an
assembly definition for name that reads from RTC address addr,
low bit low, width width; the result is placed on the stack.
WRITE_RTC_REG ( addr low width data "<spaces>name" -- )
Skip leading spaces. Parse name delimited by a space. Create an
assembly definition for name that writes data to RTC address addr,
low bit low, width width.
2WRITE_RTC_REG
( addr0 high0 low0 data0
  addr1 high1 low1 data1 "<spaces>name" -- )
Skip leading spaces. Parse name delimited by a space. Create an
assembly definition for name that writes data0 to RTC address addr0,
followed by data1 to address addr1.
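A sketch of how READ_RTC_REG and WRITE_RTC_REG might be used. MY-REG, my-field@ and my-field-set are hypothetical names, and the address 0 is a placeholder, not a real ESP32 register: look up the real address and bit positions in the ESP32 Technical Reference Manual before running anything like this. It also assumes low is the lowest bit of the field and width counts bits upward from it.

```forth
\ PLACEHOLDER address: replace with a real RTC register before use.
0 constant MY-REG

\ Both words are created on the host; the field layout is fixed at build time.
MY-REG 4 2 READ_RTC_REG my-field@         \ ( -- n )  reads the 2-bit field at bit 4
MY-REG 4 2 3 WRITE_RTC_REG my-field-set   \ ( -- )    writes 3 into that field

: main ( -- )
    my-field-set      \ write the field
    my-field@ drop    \ read it back (discarded in this sketch)
    begin halt again ;
```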
System words only run on the ULP.
HALT ( -- )
Halt execution of the ULP. When the ULP is next woken, execution resumes at the instruction immediately following the HALT, in both the token and subroutine threaded models.
MUTEX.TAKE ( -- )
Takes the software mutex. The example project includes ESP32 code that uses this, but a better interface for it still needs to be written.
MUTEX.GIVE ( -- )
Gives the software mutex. The example project includes ESP32 code that uses this, but a better interface for it still needs to be written.
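A minimal sketch combining the system words: a MAIN loop that updates shared memory under the mutex and then halts until the next wakeup. It assumes the ESP32 side starts the ULP, wakes it periodically, and follows the same mutex protocol as the example project; wake-count is a hypothetical name.

```forth
global-variable wake-count   \ shared with the ESP32 (see Sharing memory)

: main ( -- )
    begin
        mutex.take          \ hold the mutex while touching shared memory
        1 wake-count +!     \ count this wakeup
        mutex.give          \ release it so the ESP32 can read the value
        halt                \ stop; the next wakeup resumes right here
    again ;
```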
Clock words only run on the ULP.
The ULP has access to the RTC_SLOW clock. ulp-forth also has some busy-wait words for delays. Note that the ULP runs on the RTC_FAST clock, which we cannot read.
RTC_CLOCK ( -- d )
Read the lower 32 bits of the rtc_slow clock.
RTC_CLOCK_DELAY ( d -- )
Delay for d rtc_slow ticks.
BUSY_DELAY ( n -- )
Busy-wait for n iterations of a tight assembly loop.
DELAY_MS ( n -- )
Delay n milliseconds. The accuracy of this is affected by temperature and is device dependent.
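Since cells are 16 bits, RTC_CLOCK and RTC_CLOCK_DELAY work with doubles (low cell first, high cell on top). The sketch below, with the hypothetical names ticks-per-10ms and last-ticks, measures how many rtc_slow ticks a DELAY_MS call takes and then waits a fixed number of ticks.

```forth
\ Hypothetical helper: rtc_slow ticks elapsed during a 10 ms busy-wait.
: ticks-per-10ms ( -- d )
    rtc_clock 2>r        \ save the start time (a double) on the return stack
    10 delay_ms          \ busy-wait roughly 10 ms
    rtc_clock 2r> d- ;   \ end - start; D- also handles 32-bit wraparound

variable last-ticks      \ low cell of the most recent measurement

: main ( -- )
    begin
        ticks-per-10ms drop last-ticks !   \ drop the high cell, keep the low one
        1000 0 rtc_clock_delay             \ the double 1000: wait 1000 rtc_slow ticks
        halt
    again ;
```

Comparing the measured tick count with the nominal rtc_slow frequency is one way to see how much DELAY_MS drifts on a given device.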
The ULP can access certain pins called RTC_GPIO. These are mapped to the GPIO as well. Words are defined to help interface with them.
Each of these words is written with the prefix GPIOn, where n is the pin number. There are words for both the RTC_GPIOn and GPIOn numbers. Words are only written for GPIO that the ULP can access, and if a pin doesn't support output, the output words aren't defined.
Below is a table of all pins accessible to the ULP. RTC_GPIO is the naming used by the RTC subsystem, GPIO is the naming used by the rest of the ESP32 documentation.
| GPIO | RTC_GPIO | Notes | 
|---|---|---|
| 36 | 0 | Input only, no pullups or pulldowns. | 
| 37 | 1 | Input only, no pullups or pulldowns. | 
| 38 | 2 | Input only, no pullups or pulldowns. | 
| 39 | 3 | Input only, no pullups or pulldowns. | 
| 34 | 4 | Input only, no pullups or pulldowns. | 
| 35 | 5 | Input only, no pullups or pulldowns. | 
| 25 | 6 | |
| 26 | 7 | |
| 33 | 8 | |
| 32 | 9 | |
| 4 | 10 | |
| 0 | 11 | |
| 2 | 12 | |
| 15 | 13 | |
| 13 | 14 | |
| 12 | 15 | |
| 14 | 16 | |
| 27 | 17 | |
Not all pins are tested.
GPIOn.ENABLE ( -- )
Enable the usage of this pin by the ULP.
GPIOn.OUTPUT_ENABLE ( -- )
Enable output on this pin.
GPIOn.OUTPUT_DISABLE ( -- )
Disable output on this pin.
GPIOn.INPUT_ENABLE ( -- )
Allow reading from this pin.
GPIOn.SET_HIGH ( -- )
Set this pin to high.
GPIOn.SET_LOW ( -- )
Set this pin to low.
GPIOn.SET ( n -- )
If n is 0, set this pin to low. Otherwise set to high.
GPIOn.GET ( -- n )
Get the value of this pin. 0 or 1.
GPIOn.PULLUP_ENABLE ( -- )
Enable the pullup resistors on this pin.
GPIOn.PULLUP_DISABLE ( -- )
Disable the pullup resistors on this pin.
GPIOn.PULLDOWN_ENABLE ( -- )
Enable the pulldown resistors on this pin.
GPIOn.PULLDOWN_DISABLE ( -- )
Disable the pulldown resistors on this pin.
GPIO_NUMBER_TO_RTC ( gpio_num -- rtc_gpio_num )
Convert the GPIO number to the corresponding RTC_GPIO number.
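A sketch using the GPIO words: an LED follows a push button. The wiring is hypothetical (LED on GPIO2, which the table maps to RTC_GPIO12; button from GPIO0, RTC_GPIO11, to ground), as are the names pins.init and main's behavior.

```forth
: pins.init ( -- )
    gpio2.enable gpio2.output_enable                    \ LED pin: ULP-driven output
    gpio0.enable gpio0.input_enable gpio0.pullup_enable ; \ button pin: idles high

: main ( -- )
    pins.init
    begin
        gpio0.get 0= gpio2.set   \ button is active low; any nonzero value sets the LED high
        10 delay_ms              \ polling interval
    again ;
```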
The ULP does not have a hardware serial port, so serial output is implemented in assembly.
SERIAL.WRITE_CREATE ( pin wait-time "<spaces>name" -- )
Skip leading spaces. Parse name delimited by a space. Create an
assembly definition for name that uses RTC_GPIO pin number pin and
delay wait-time to write serial data. See the "hello_world.f" example for setup.
For example, using the name SERIAL_TX creates the definition:
SERIAL_TX ( c -- )
which outputs the character c.
SERIAL.WRITE_9600_BAUD ( -- wait-time )
Returns the wait-time to achieve 9600 baud.
SERIAL.WRITE_115200_BAUD ( -- wait-time )
Returns the wait-time to achieve 115200 baud. This is the fastest standard baud rate available with the assembly algorithm used.
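A sketch of a 9600 baud transmitter. It assumes TX on GPIO4 (RTC_GPIO10 in the table above). The pin setup in serial.init is a guess, not the setup from "hello_world.f", which remains the reference; it only relies on the general fact that an idle UART line is held high.

```forth
\ Create SERIAL_TX ( c -- ) on RTC_GPIO10 (GPIO4) at 9600 baud.
10 serial.write_9600_baud serial.write_create serial_tx

: serial.init ( -- )   \ hypothetical setup; compare with hello_world.f
    gpio4.enable
    gpio4.output_enable
    gpio4.set_high ;   \ idle line level

: main ( -- )
    serial.init
    begin
        [char] H serial_tx
        [char] I serial_tx
        10 serial_tx       \ line feed
        1000 delay_ms
    again ;
```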
The ULP has hardware I2C but it is difficult to use and limited in its features. This is a forth software implementation. It has clock stretching and allows for an arbitrary number of devices, but is slower than hardware.
To use this, you need to implement the deferred words:
- I2C.SDA_HIGH
- I2C.SDA_LOW
- I2C.SDA_GET
- I2C.SCL_HIGH
- I2C.SCL_LOW
- I2C.SCL_GET
See the "util/i2c_scan.f" file for pin setup.
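One way to implement the deferred words, sketched under stated assumptions: SDA on GPIO32 and SCL on GPIO33 (hypothetical wiring), open-drain style where "high" releases the line to the pullup and "low" drives it, and I2C.SDA_GET / I2C.SCL_GET expected to return 0 or 1 like GPIOn.GET. The helper names are hypothetical; "util/i2c_scan.f" shows the project's own setup.

```forth
: sda.high ( -- )  gpio32.output_disable ;                \ release SDA
: sda.low  ( -- )  gpio32.set_low gpio32.output_enable ;  \ drive SDA low
: scl.high ( -- )  gpio33.output_disable ;                \ release SCL
: scl.low  ( -- )  gpio33.set_low gpio33.output_enable ;  \ drive SCL low

' sda.high   is i2c.sda_high
' sda.low    is i2c.sda_low
' gpio32.get is i2c.sda_get
' scl.high   is i2c.scl_high
' scl.low    is i2c.scl_low
' gpio33.get is i2c.scl_get

: bus.init ( -- )   \ call once from MAIN before using the bus
    gpio32.enable gpio32.input_enable gpio32.pullup_enable
    gpio33.enable gpio33.input_enable gpio33.pullup_enable ;
```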
I2C.START ( -- )
Send a start condition on the bus.
I2C.START_READ ( address -- ack )
Send the start condition and send a read command to the address.
Returns TRUE if acknowledged.
I2C.START_WRITE ( address -- ack )
Send the start condition and send a write command to the address.
Returns TRUE if acknowledged.
I2C.WRITE ( n -- ack )
Send byte n on the bus. Returns TRUE if acknowledged.
I2C.READ ( -- n )
Read a byte n from the bus. This does not respond with an ack/nack;
that should be done with I2C.ACK or I2C.NACK.
I2C.ACK ( -- )
Send an ack bit on the bus.
I2C.NACK ( -- )
Send a nack bit on the bus.
I2C.STOP ( -- )
Send a stop condition on the bus.
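A sketch of a common transaction built from these words: reading one register from a device. i2c-read-reg is a hypothetical helper, and it assumes addr is the device address in the form I2C.START_WRITE and I2C.START_READ expect (likely the 7-bit address, but check the source).

```forth
: i2c-read-reg ( reg addr -- n )
    dup >r                  \ keep addr for the repeated start
    i2c.start_write drop    \ start + write command (ack ignored here)
    i2c.write drop          \ send the register number
    r> i2c.start_read drop  \ repeated start + read command
    i2c.read                \ read the register value
    i2c.nack                \ nack: this is the last byte we want
    i2c.stop ;
```

Checking each ack flag instead of dropping it would let the word report a missing device.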
These are the core words that are implemented. Missing words
may be implemented in the future; only a few, such as HERE, cannot
be implemented because of the ulp-forth architecture.
Some of the core words are created with DEFER so they can be
easily overwritten. These are noted below.
Words that can only run on the host are noted as well. Some words,
such as DO, cannot be directly executed on the ULP, but words
created with them can run on the ULP. These are not noted because
it depends on how they are used; ulp-forth will throw an
error if a word cannot be cross compiled.
- '
- (
- *
- +
- +!
- +LOOP
- ,
  - Can only run on host.
- -
- .
- ."
- /
- /MOD
  - Deferred to S/REM, can be deferred to F/MOD.
- 0<
- 0=
- 1+
- 1-
- 2!
- 2*
- 2/
- 2@
- 2DROP
- 2DUP
- 2OVER
- 2SWAP
- :
  - Can only run on host.
- ;
  - Can only run on host.
- <
- =
- >
- >BODY
- >R
- ?DUP
- @
- ABS
- ALIGN
  - Can only run on host.
- ALIGNED
- ALLOT
  - Can only run on host.
- AND
- BASE
- BEGIN
- BL
- C!
- C,
  - Can only run on host.
- C@
- CELL+
- CELLS
- CHAR
- CHAR+
- CHARS
- CONSTANT
  - Can only run on host.
- COUNT
- CR
- CREATE
  - Can only run on host.
- DECIMAL
- DEPTH
- DO
- DOES>
  - Can only run on host.
- DROP
- DUP
- ELSE
- EMIT
  - Deferred so a program can output to any interface. EMIT is used for all printing, such as by . (dot).
- EVALUATE
  - Can only run on host.
- EXECUTE
- EXIT
- FILL
- FIND
  - Can only run on host.
- HERE
- I
- IF
- IMMEDIATE
- INVERT
- J
- LEAVE
- LITERAL
- LOOP
- LSHIFT
- MAX
- MIN
- MOD
- MOVE
- NEGATE
- OR
- OVER
- POSTPONE
- R>
- R@
- RECURSE
- REPEAT
- ROT
- RSHIFT
- S"
- S>D
- SPACE
- SPACES
- STATE
- SWAP
- THEN
- TYPE
- U.
- U<
- UNLOOP
- UNTIL
- VARIABLE
- WHILE
- WORD
  - Can only run on host.
- XOR
- [
- [']
- [CHAR]
- ]
Missing words may be implemented in the future.
- .(
- 0<>
- 0>
- 2>R
- 2R>
- 2R@
- :NONAME
  - Can only run on host.
- <>
- ?DO
- AGAIN
- BUFFER:
- C"
  - (nonstandard) can be interpreted.
- CASE
- COMPILE,
- DEFER
- DEFER!
- DEFER@
- ENDCASE
- ENDOF
- ERASE
- FALSE
- HEX
- IS
- NIP
- OF
- PICK
- ROLL
- TO
- TRUE
- TUCK
- U>
- VALUE
- WITHIN
- [COMPILE]
- \
Not all of the double words are currently implemented, but they all can be in the future.
- 2CONSTANT
- 2LITERAL
- 2VARIABLE
- D+
- D-
- D0<
- D<
- D0=
- D>S
- DABS
- DMAX
- DMIN
- DNEGATE
- M+
- 2ROT
- 2VALUE
- DU<
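Because this is a 16 bit Forth, values of 65536 and above need doubles, pushed low cell first with the high cell on top. A small sketch with a hypothetical word, sum-demo, showing the carry from the low cell into the high cell:

```forth
: sum-demo ( -- d )
    0xFFFF 0      \ 65535 as a double (low cell, high cell)
    1 0 d+ ;      \ + 1 = 65536: low cell 0, high cell 1
```

Running sum-demo in the host interpreter should leave 0 1 on the stack.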
The cross compiler includes some optimizations. More may be added later.
- Inline deferred words. If a word is defined with DEFER but the cross compiled output cannot change it, the deferred word is inlined. Normally a deferred word looks like address-containing-word @ EXECUTE EXIT; this optimizes it to word EXIT.
- Tail calls. Words may be defined in assembly or Forth. If the final word before an EXIT (or the end of the definition) is a Forth word, the compiled code jumps to it instead of calling it. For example, a word compiled as + forth-word EXIT is optimized to + jump(forth-word). This is smaller in the token threaded model, faster, and saves a stack slot.
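The sketch below shows code shaped for each optimization; whether it is applied is up to the cross compiler. All names are hypothetical. inc2 is the last Forth word in add-inc2, so that call is a tail-call candidate. action is only repointed on the host, never by cross compiled code, so it is a candidate for deferred inlining.

```forth
: inc2 ( n -- n+2 )  1+ 1+ ;
: add-inc2 ( a b -- n )  + inc2 ;   \ "+ inc2 EXIT" can become "+ jump(inc2)"

defer action
: flash ( -- )  gpio2.enable gpio2.output_enable gpio2.set_high ;
' flash is action                    \ set once on the host

: main ( -- )
    3 4 add-inc2 drop
    action                           \ can compile as a direct use of flash
    begin halt again ;
```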
To be added later:
- Fallthrough forth words instead of tail call
- Assembly inlining (subroutine threaded only)
- Forth inlining
- Forth common sequence compression
- Flow control analysis
- Constant folding
- Peephole optimization