Skip to content

Conversation

@zherczeg
Copy link
Contributor

@zherczeg zherczeg commented Mar 11, 2025

This large patch adds function references support for those parts of the wabt library, which already present in the code. The current code is designed for an older (outdated) variant of the function references proposal.

Key changes:

  • Var variables can optionally store type data. This way there is no need for a Var/Type field pair in many cases.
  • CallRef now needs to explicitly provide its reference type (no auto detection)
  • Tracking local reference initialization (non nullable refs must be set before using them)
  • Supporting type equality comparisons in the validator and in the interpreter
  • Adds a code for reading/writing 33 bit integers (leb format)
  • Remove EndTypeSection in shared validator, types can be validated earlier
  • Improve named reference resolving in the text parser

@zherczeg
Copy link
Contributor Author

@sbc100 @keithw since I don't know enough about the project, these are just some random changes in the parser.

The aim is parsing https://github.com/WebAssembly/function-references/blob/main/test/core/call_ref.wast

It is doing something (parsing correctly is surely an exaggeration) with call_ref $ii and ref null $ii. The next item is ref.null $ii. It looks like it will need API changes, since a token type will not be enough. Would adding an index to it be a good idea?

Could you check that these changes makes sense? There is a Type::Reference enum in the code, but function reference introduces two other enums, and does not use Reference. This is strange to me.

Thank you for the help.

@zherczeg
Copy link
Contributor Author

This patch goes to a new direction. The 32 bit type filed of Var is split into two 16 bit fields. The second can be used to store WebAssembly types (negative 7 bit numbers). This way Var can be used in a more generic way.

@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from c804a98 to c40c098 Compare March 13, 2025 12:53
@zherczeg
Copy link
Contributor Author

There is something I don't understand. There is a Reference = -0x15 in type.h. However, in
https://github.com/WebAssembly/function-references/blob/main/proposals/function-references/Overview.md
(ref ht) is defined as -0x1c and (ref null ht) is -0x1d. Is the old code obsolete?

The key change of the patch is that Var can also behave like Type, and transformation between the two is possible after names are resolved.

Btw, may I ask how actively maintained the wabt project?

@sbc100
Copy link
Member

sbc100 commented Mar 13, 2025

I believe ht refers to Heap Type which would not be needed for function references. I believe heap types refer to things like structs and arrays in the GC proposal.

@zherczeg
Copy link
Contributor Author

That is possible. Unfortunately not everything is described clearly in the text. For example, ref.null ht: [] -> [(ref null ht)] Which encoding is used for ht here?

https://github.com/WebAssembly/function-references/blob/main/proposals/function-references/Overview.md

@zherczeg
Copy link
Contributor Author

I have tried to track down the v8 implementation for ref.null.

https://github.com/v8/v8/blob/main/src/wasm/function-body-decoder-impl.h#L386

It calls read_heap_type. It is in the same file.

https://github.com/v8/v8/blob/main/src/wasm/function-body-decoder-impl.h#L217

Starts with:
decoder->read_i33v(pc, "heap type");

So it looks like it uses the s33 decoding which the specification mentioned.

@zherczeg zherczeg force-pushed the function_ref branch 5 times, most recently from c94e5a1 to a959acb Compare March 14, 2025 16:37
@zherczeg
Copy link
Contributor Author

I have some good news! This test is fully working now (including the interpreter):
https://github.com/WebAssembly/function-references/blob/main/test/core/call_ref.wast

Only 9 fails remained, they all look like CallRef syntax change related, so some tests need to be updated.

The patch is quite large. There are some parts which are questionable. I will comment about these parts later to help review.

@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from 78596d6 to b79aead Compare March 15, 2025 06:49
};

struct Var {
// Var can represent variables or types.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a major internal change. It does not increase the size of Var. I think it is better than constructing a Var/Type pair.

size_t ReadU32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value);
size_t ReadU64Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value);
size_t ReadS32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value);
size_t ReadS33Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec adds this new type, the output is uint64_t. It could be merged with ReadS32Leb128 by adding a bool* argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is S33 used somewhere in the binary format now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. And they use it even more places in gc :(

https://webassembly.github.io/function-references/core/binary/types.html#heap-types
a 33 bit integer can hold a 32 bit unsigned index or a type value. Its maximum (leb) size is the same as 32 bit integers.

ref.null uses this for example.
https://webassembly.github.io/function-references/core/syntax/instructions.html#syntax-instr-ref

Result BinaryReaderLogging::name(Type type) { \
LOGF(#name "(%s)\n", type.GetName().c_str()); \
return reader_->name(type); \
#define DEFINE_TYPE(name) \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this might be reverted. I will check.

src/leb128.cc Outdated
}
}

size_t ReadS33Leb128(const uint8_t* p,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These readers don't check that the minimum amount of bytes is used for encoding. This is important for utf, not sure for leb.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain a little more what you mean here?

For sure we allow encoding that are over-long, i.e. zero-padded. This is used a lot in the llvm wasm writer, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know that. Some encodings, such as utf8 expects a minimum number of byte encodings, so a 7 bit character cannot be encoded more than a single byte, even if the type bits are correct for longer encodings. If padding for leb is allowed then it is perfectly ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can avoid duplicating code here and instead just use ReadS64Leb128 here (with a bounds check after reading it)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is possible. We could add a default false boolean flag to the ReadS64Leb128 as well.

Anyway, what is the reason for the lot of code duplications (and manual loop unrollings) in this file? Performance? For consistency, I also added another duplication, but this part could be improved (reducing code size) in many ways. It can be done in another patch as well, before or after this patch.

What is your suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, to keep this patch focused how about just removing the LEB changes from this PR?

I think its reasonable for now to just read S33 values using ReadS64Leb128. Apparently this is what binaryen does. We can come back and try to cleanup this stuff and remove the duplication later perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed the S33 parsing. We should do something with it later. Please, let me know your opinion, and I will do it.

}

if (func) {
if (!func->local_type_list.empty()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a bugfix. Named references has been never resolved for local types. Maybe we could do it on the compressed list, but that can be done in another patch.

@zherczeg zherczeg force-pushed the function_ref branch 4 times, most recently from 6587cf7 to 0f7353f Compare March 16, 2025 04:16
@@ -0,0 +1,19 @@
;;; TOOL: run-interp-spec
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out that call_ref.wast is already in the repository, and it is passing now. There are other tests in the directory (e.g. binary.wast) which is also passing now, they could be updated later as well.

@zherczeg zherczeg marked this pull request as ready for review March 16, 2025 04:37
@zherczeg
Copy link
Contributor Author

I think the patch is ready. There are lots of follow-up work ahead.

@zherczeg zherczeg force-pushed the function_ref branch 3 times, most recently from 58411a5 to bf0400f Compare March 18, 2025 05:06
@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from 9cb09d6 to 293236b Compare June 3, 2025 09:58
@zherczeg
Copy link
Contributor Author

zherczeg commented Jun 3, 2025

I have found a bug in RecursiveMatch and fixed it. It seems there will be no other comments to this patch, @sbc100 you are my only hope.

@sbc100
Copy link
Member

sbc100 commented Jun 3, 2025

I have found a bug in RecursiveMatch and fixed it. It seems there will be no other comments to this patch, @sbc100 you are my only hope.

I think this change is mostly looking good. I'm still trying to find some time to really understand what is going on with the Var / opt_type thing as it seems a little odd to me still.

@sbc100
Copy link
Member

sbc100 commented Jun 19, 2025

I'm on vacation for the next couple of week, but I will make a concerted effort to review this change when I get back in July.

@zherczeg
Copy link
Contributor Author

Thank you. The GC support will also likely completed by that time. Only 5 instructions remained, all type cast related.

@zherczeg
Copy link
Contributor Author

It looks like WebAssembly 3.0 is released, and GC is part of it: https://webassembly.org/news/2025-09-17-wasm-3.0/
May I ask what is the plans for these patches?

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I finally go around looking at this PR again. Sorry for the long delay.

I think we are pretty much ready to land this. Just a couple more nits.

Should we update the README to mention that we now support the function references proposal?


struct LocalReferenceMap {
Type type;
Index bit_index;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this is an index into local_ref_is_set_? Perhaps it should use be size_t rather than Index?

Perhaps it should be called is_set_index? Or something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I have added a comment which should clear its use.

void SharedValidator::IgnoreLocalRefs() {
if (!local_ref_is_set_.empty()) {
std::fill(local_ref_is_set_.begin(), local_ref_is_set_.end(), true);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just remove the if check here?

return AppendExpr(std::make_unique<CallRefExpr>());
Result BinaryReaderIR::OnCallRefExpr(Type sig_type) {
auto expr = std::make_unique<CallRefExpr>();
expr->sig_type = Var(sig_type, GetLocation());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the contructor for CallRefExpr tage this sig_type as an argument? (Like the RefNullExpr below? Or is there some reason they should be different?

Copy link
Contributor Author

@zherczeg zherczeg Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized (I made the code months ago) I just followed the other Call structs.

https://github.com/WebAssembly/wabt/blob/main/include/wabt/ir.h#L725
https://github.com/WebAssembly/wabt/blob/main/include/wabt/ir.h#L694

The wast parser first creates the object, then parses its arguments. Shall I change this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the best pattern is but we should try to reach consistency eventually. Happy to land this now and cleanup later.

)
(;; STDERR ;;;
out/test/parse/bad-references.txt:6:31: error: reference type out of range: 2 (max: 2)
out/test/parse/bad-references.txt:6:4: error: reference 2 is out of range in params
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok that we will the mac: xxx in these error messages? They seem maybe useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added it back. However, I am not sure it is that helpful in this case. The original messages were produced by the parser, now the shared validator produces them. The difference is, that implicit function declarations are now part of the max, since the shared validator has no idea that they are implicit declarations. Furthermore, just because you pass a number less than max, it is not necessarily valid, which can also be confusing for some people. An interesting side effect is that max also grows with every new type declaration, so it is a bit more precise.

@@ -0,0 +1,78 @@
;;; TOOL: run-interp-spec
;;; STDIN_FILE: third_party/testsuite/proposals/function-references/br_table.wast
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the update-spec-tests.py script be updated to remove the exclusion of function-references?

@zherczeg zherczeg changed the title Add support for function references proposal Add support for (ref ...) / (ref null ...) constructs Nov 7, 2025
@zherczeg
Copy link
Contributor Author

zherczeg commented Nov 7, 2025

The title was misleading, I am sorry. I thought I can do this in one patch, but the full support is divided into three patches. I have updated the title.

Needed by function references proposal
Support named references for globals, locals, tables, elems
Support named references for call_ref, ref_null
Extend Var variables with an optional type field
@zherczeg
Copy link
Contributor Author

zherczeg commented Nov 7, 2025

I have rebased and updated the patch. Please let me know if you need more changes.

@sbc100
Copy link
Member

sbc100 commented Nov 7, 2025

lgtm!

(What about update-spec-tests.py?)

@zherczeg
Copy link
Contributor Author

zherczeg commented Nov 7, 2025

(What about update-spec-tests.py?)

I think we should update it in the third patch when the feature is complete.

@sbc100 sbc100 enabled auto-merge (squash) November 7, 2025 18:25
@sbc100 sbc100 disabled auto-merge November 7, 2025 19:57
@sbc100 sbc100 merged commit ee87962 into WebAssembly:main Nov 7, 2025
17 checks passed
@sbc100
Copy link
Member

sbc100 commented Nov 7, 2025

Thank you Zoltan, and again sorry for the absurd review time.

@zherczeg zherczeg deleted the function_ref branch November 8, 2025 03:38
@zherczeg
Copy link
Contributor Author

zherczeg commented Nov 8, 2025

No problem. These changes are rather complex which means higher risk from project perspective, and needs more time to review. There are two more patches for function references: #2567 implements new opcodes (I have rebased it) and #2593 implements table initializers. I will add README and update-spec-tests.py changes to the last one.
Could you review them as well? Thank you!

sbc100 pushed a commit that referenced this pull request Nov 9, 2025
…ions (#2567)

Followup to #2562

Another 1000 lines addition. Depends on the previous work. Currently
implements 3 opcodes, but I might add the remaining opcode
(return_call_ref).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants