[Text Format] Syntax for referencing data offsets and table elements in code #1368

Open
gwenya opened this issue Aug 28, 2020 · 10 comments

@gwenya

gwenya commented Aug 28, 2020

To make hand-written text format WebAssembly more readable, it would be useful to give names to data offsets and table elements and to reference those names in code, as other assembly languages do. Currently, the only way to refer to data is to write the offset manually, like this:

(module
  (data (offset (i32.const 123)) "foo" "bar")
  (func $getBarAddr (result i32)
    (i32.const 126)))

If the data changes, the offset needs to be updated manually:

(module
  (data (offset (i32.const 123)) "fooooo" "bar")
  (func $getBarAddr (result i32)
    (i32.const 129)))

To make this easier to both write and read, and to avoid having to update the offsets in the code if the data changes, a syntax like the following could be used (see WebAssembly/wabt#1199 (comment)):

(module
  (data (offset (i32.const 123)) "foo" $barStr "bar")
  (func $getBarAddr (result i32)
    (i32.const data-addr=$barStr)))

As proposed, this only works with const offsets. One option is to allow such names only in data segments that have a const offset, but I believe it is better to also allow them for global offsets, with slightly different semantics:

(module
  (data (offset (global.get $dataStart)) "foo" $barStr "bar")
  (func $getBarAddr (result i32)
    (i32.add
      (global.get $dataStart)
      (i32.const data-addr=$barStr))))

In this case, the $barStr name would refer to the offset relative to the data segment, i.e. 3 in this case, since absolute addressing is not possible.
In order to make this more obvious, we could have different syntaxes for referencing global-offset and const-offset data names, e.g. data-addr=$name for const offsets and data-offset=$name for global offsets.
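
The two resolution rules described above can be sketched in a few lines. This is only an illustration of the idea, written in Python because the proposed syntax is not valid WAT today. The names `Label` and `resolve_data_names`, and the list-based representation of a data segment, are assumptions made up for this sketch and do not come from wabt or any other tool.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Label:
    """Hypothetical marker for a name such as $barStr inside a data segment."""
    name: str


def resolve_data_names(items: list[Union[bytes, Label]],
                       const_offset: Optional[int]) -> dict[str, int]:
    """Map each label to an address.

    With a const offset the result is an absolute address. With a global
    offset (const_offset=None) the result is relative to the segment start,
    because the absolute address is not known at assembly time.
    """
    base = 0 if const_offset is None else const_offset
    position = 0  # bytes emitted so far within this segment
    names: dict[str, int] = {}
    for item in items:
        if isinstance(item, Label):
            names[item.name] = base + position
        else:
            position += len(item)
    return names


segment = [b"foo", Label("$barStr"), b"bar"]
absolute = resolve_data_names(segment, const_offset=123)   # const offset case
relative = resolve_data_names(segment, const_offset=None)  # global offset case
```

Here `absolute["$barStr"]` corresponds to what `data-addr=$barStr` would stand for, and `relative["$barStr"]` to the value that has to be added to `$dataStart` at runtime.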

Additionally, we could allow data offset names in load instructions, either in the existing offset= immediate or through a dedicated token such as data-offset=, like this:

(loop
  (i32.load8_u offset=$fooStr ;; alternatively: i32.load8_u data-offset=$fooStr
    (local.get $i))
  ;; do something with the byte
  ;; loop counter logic
)

In conjunction with #1348, we could also allow data offsets to be used within the data section itself, like this:

(data (offset (i32.const 100))
  "foobar"
  $addr1
  ...
  (i32 data-addr=$addr1)
  ...
)

In this example, (i32 data-addr=$addr1) would translate to the i32 representation of the $addr1 offset, which is 106 (const offset 100 plus the 6 bytes of "foobar").
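
A minimal sketch of how an assembler might handle such a self-reference, assuming a two-pass approach so that a name can also be used before the point where it is defined. The function `assemble_segment` and the tuple encoding of segment items are hypothetical and exist only for this example. The one fact relied on is that WebAssembly stores integers in memory in little-endian byte order.

```python
import struct


def assemble_segment(base, items):
    """Assemble a data segment with a const offset `base`.

    Each item is one of (hypothetical encoding):
      bytes               raw data
      ("label", name)     defines `name` at the current address
      ("i32-addr", name)  emits the address of `name` as a 4-byte i32
    """
    # Pass 1: walk the items and record the address of every label.
    names = {}
    position = 0
    for item in items:
        if isinstance(item, bytes):
            position += len(item)
        elif item[0] == "label":
            names[item[1]] = base + position
        else:  # an i32 address reference occupies four bytes
            position += 4

    # Pass 2: emit bytes, now that every address is known.
    out = bytearray()
    for item in items:
        if isinstance(item, bytes):
            out += item
        elif item[0] == "i32-addr":
            out += struct.pack("<I", names[item[1]])  # little-endian i32
    return bytes(out), names


data, names = assemble_segment(
    100, [b"foobar", ("label", "$addr1"), ("i32-addr", "$addr1")]
)
```

With the segment from the example, `names["$addr1"]` is 106 and the last four bytes of `data` hold that value.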

A similar syntax might be useful to have for table elements, but adding names to them is not quite as simple as with data since element sections are already just a list of names. One way would be to add parentheses like this:

(elem (i32.const 0) ($name $func) ($otherName $otherFunc) $yetAnotherFunc)

This would create an element section that contains the functions $func, $otherFunc and $yetAnotherFunc, and would let us reference the first two table elements by the names $name and $otherName, with a syntax like this:

i32.const table-elem=$name
call_indirect (type ...)

This syntax for naming table elements is rather confusing and not very readable, but I haven't yet come up with something better.

Both of these additions would only affect the text format. When converting WAT that contains such syntax to wasm, references like data-addr=$name and table-elem=$name would be replaced by the offsets they refer to.
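
To illustrate that this is purely a text-level rewrite, here is a rough sketch of the substitution step. It assumes the name tables have already been computed elsewhere. The function `lower_references` and the regular expression are assumptions for this example, and the identifier pattern is deliberately simpler than the real WAT identifier grammar.

```python
import re

# Matches e.g. data-addr=$barStr or table-elem=$name.
# Simplified: real WAT identifiers allow more characters than \w and '.'.
REFERENCE = re.compile(r"(data-addr|table-elem)=(\$[\w.]+)")


def lower_references(source, tables):
    """Replace each named reference with the number it stands for."""
    def substitute(match):
        kind, name = match.group(1), match.group(2)
        return str(tables[kind][name])
    return REFERENCE.sub(substitute, source)


tables = {
    "data-addr": {"$barStr": 126},   # "foo" at 123, so "bar" starts at 126
    "table-elem": {"$name": 0},      # first element of the elem segment
}
source = "(i32.const data-addr=$barStr)\n(i32.const table-elem=$name)"
lowered = lower_references(source, tables)
```

After this step the text contains only plain constants, so the binary format and the rest of the toolchain stay unchanged.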

EDIT: updated with suggestions from #1368 (comment)

@carlsmith

If writing WAT by hand is a concern now, it seems like the syntax should allow (even if only potentially in the future) data elements to use typed integers (as well as strings) for populating memory.

Much more generally, the Web badly needs an assembly language designed for writing code with, and WAT will never fill that role satisfactorily. It's not especially difficult to create an assembly language that assembles to Wasm (or transpiles to WAT), and you need to assemble your source files anyway. It seems best to let WAT just do its original job, and we can all develop new languages outside of the standards process, and see what catches on over time.

@gwenya
Author

gwenya commented Sep 4, 2020

#1348 proposes a syntax for typed data. Why do you think WAT will never be satisfactory for writing code?

@binji
Member

binji commented Sep 4, 2020

@Sammax Ah, good point that this integrates nicely with #1348. You may want to incorporate it into the design above, so that you can reference an address directly in the data section:

(data (offset (i32.const 0))
  $addr1
  ...
  (i32 data-addr=$addr1)
  ...
)

Edit:

The data-addr=$name and table-elem=$name syntax would only be allowed in i32.const and i64.const instructions.

I also just realized that we may want to allow this for memory instruction offsets too. So, for example, we can replace:

(data (offset (i32.const 100)) "stuff")
...
i32.load offset=100

with

(data (offset (i32.const 100)) $addr "stuff")
...
i32.load data-offset=$addr

Typically the offset is used for a fixed offset from a pointer (i.e. a struct's field offset), but I can imagine this being useful for constant loads too.

@carlsmith

> Why do you think WAT will never be satisfactory for writing code?

It's verbose, requires a lot of redundant information, and relies on S-expressions. People have already published WAT supersets. I've been toying around with a completely new Wasm assembly language, and it's a fun project, because there are so many things you could potentially improve. WAT is also part of a standard, so alt-wat languages will always have a lot more freedom to take risks (as languages like CoffeeScript and Sass have). WAT will not change radically now, and it is definitely possible to create something many people will prefer.

@carlsmith

The only point I really wanted to make is already covered by #1348. When I wrote my original comment, I still thought handwritten WAT was beyond scope.

@gwenya
Author

gwenya commented Sep 4, 2020

@binji those are both good ideas, I'll edit them in.

@carlsmith I don't find WAT particularly verbose in comparison with other assembly languages. Sure, you could shorten the opcode mnemonics, but I feel that would make it less satisfactory, because they would be harder to remember both when writing and when reading. S-expressions are not at all necessary for instructions: you can write them sequentially if you don't like the folded form. There also aren't that many things missing compared to other assembly languages, and this issue is an effort to get one of them added.
Regarding WAT being a standard and therefore hard to change, the annotations proposal will make it possible to introduce all kinds of features that an assembler can implement independently of the standard.

@carlsmith

carlsmith commented Sep 4, 2020

@Sammax - Sorry, I haven't seen the annotations proposal either. I've always found it unusually difficult to find information about Wasm and WAT development. Could you link to the proposal, please? Thank you.

The instructions are fine in WAT, generally, though they can be improved in little ways. It's the rest of the module that is unpleasant to read and write, and does require S-expressions all over the place. The whole thing is an S-expression. It's all subjective, and I don't want to derail this issue debating language features, but I just personally want to write Wasm modules by hand, because it's fun, clean and fast, but not in WAT, because I hate the syntax.

@gwenya
Author

gwenya commented Sep 4, 2020

@carlsmith The annotations proposal is here: https://github.com/WebAssembly/annotations/blob/master/proposals/annotations/Overview.md. It is intended as a way to encode custom sections in the text format, but it only defines the syntax of annotations, not their semantics, which are left to tools like wat2wasm. This opens the way to use annotations for pretty much anything, e.g. pseudo instructions or macros.

There is a list of all proposals here: https://github.com/WebAssembly/proposals

@carlsmith

Thank you, @Sammax. Much appreciated.

@7ombie

7ombie commented Apr 15, 2021

What happened to this proposal? It would definitely be useful to (at least) have a way of referencing the absolute implied memory indices as offsets to load and store instructions.

It should be mentioned that this feature, as proposed, overloads identifiers, which are normally used to reference indices in index spaces. This extends that to include addresses, and possibly table elements. For that reason, it may make sense to introduce a different kind of identifier that begins with something other than a dollar character.

It's also worth considering that the text format currently provides no way to assign an explicit number (or string) to an identifier that is accessible at compile time. We can access registers at runtime, but cannot say something like #samples: 0x100, then do stuff like i32.load offset=#samples, and have the compiler swap the #samples token for a 0x100 token.

Just to offer a starting point for an alternative approach, we could allow identifiers that begin with a hash and end with a colon (like #samples:) to be assignments, and the same token without the colon (#samples) to be a reference. The text format would then provide a simple grammar for assigning an explicit number or string literal to a name:

(#PI: 3.141)

Thereafter, the compiler would just blindly swap every instance of the first token (minus the colon) for the second one.
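
As a rough sketch of this "blind swap" idea, the following collects explicit assignments of the form `(#name: value)` and then replaces every later `#name` token with the assigned value. The function `expand_constants` and both regular expressions are assumptions for this illustration. The sketch only handles values without whitespace or parentheses, so string literals containing spaces are out of scope here.

```python
import re

# A definition such as (#samples: 0x100). Hypothetical syntax from this thread.
DEFINITION = re.compile(r"\(\s*(#[A-Za-z_]\w*):\s+([^\s()]+)\s*\)")
# A reference such as #samples (the same token without the colon).
REFERENCE = re.compile(r"#[A-Za-z_]\w*")


def expand_constants(source):
    """Remove the definitions, then swap each reference for its value."""
    constants = dict(DEFINITION.findall(source))
    body = DEFINITION.sub("", source)

    def swap(match):
        token = match.group(0)
        return constants.get(token, token)  # unknown names are left alone

    return REFERENCE.sub(swap, body)


source = "(#samples: 0x100)\n(i32.load offset=#samples (local.get $p))"
expanded = expand_constants(source).strip()
```

Because the swap is purely textual, the result is ordinary text format input that an existing parser could consume.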

The text format could then also provide any number of ways to assign implicit values to identifiers (as suggested originally):

(module
  (data (offset (i32.const 123)) "foo" #barStr: "bar")
  (func $getBarAddr (result i32)
    (i32.const #barStr)))

(data (offset (i32.const 100))
  "foobar"
  #addr1: (i8 0x00 0x10 0x20 0x30)
  ...
  (i32 #addr1)
  ...
)

(elem (i32.const 0) #name: $func #otherName: $otherFunc $yetAnotherFunc)

I'm not super keen on the hash character. A bang may look nicer (like !name: $funcref and offset=!name). I really just wanted to demonstrate that introducing a different kind of identifier simplifies the syntax (i32.const #barStr instead of i32.const data-addr=$barStr), disambiguates implicit assignments that use a regular identifier (#name: $funcref instead of ($name $funcref)), and permits the assignment of explicit values.

Having the colon be significant, while requiring that it's part of the token, is a bit REBOL, but the text format already does this with stuff like offset=1 (which is a single token). The longest-match rule makes it hard to avoid sometimes.
