
make module optionally accept host interfaces as generic parameters #2

Open
NyaaaWhatsUpDoc wants to merge 6 commits into ncruces:main from NyaaaWhatsUpDoc:use-generics-for-host-interfaces

Conversation

@NyaaaWhatsUpDoc

@NyaaaWhatsUpDoc NyaaaWhatsUpDoc commented Mar 11, 2026

mostly just parking this PR here, may pick it up again if there's want for generic host interfaces.

it generates mostly working code, just the host interface definitions need to be moved to the top of the file, and the module generic parameter definition needs to use a parameter name different to the type of the interface (e.g. X instead of Xenv).

@ncruces ncruces force-pushed the main branch 12 times, most recently from bba7e34 to f866c02 March 16, 2026 12:49
@NyaaaWhatsUpDoc NyaaaWhatsUpDoc force-pushed the use-generics-for-host-interfaces branch from 243a587 to 4d3b02b March 18, 2026 22:15
@NyaaaWhatsUpDoc NyaaaWhatsUpDoc changed the title "WIP: make module accepted host interfaces generic parameters" to "make module optionally accept host interfaces as generic parameters" Mar 18, 2026
@NyaaaWhatsUpDoc
Author

this should be good to go now @ncruces, though it's hidden behind a (default=false) CLI flag, so until (if at all) there is interest in updating go-sqlite3-wasm to build with generics, it can just be an option. when i get around to attempting to build ffmpeg i'll probably be making use of them myself

@ncruces
Owner

ncruces commented Mar 18, 2026

I haven't really looked at this yet, sorry. And, probably due to that, I'm not sure I understand the point.

In fact I feel like I've moved in the opposite direction (more interfaces, and anonymous types), which I should explain.

Relying on structural typing allows modules to be compiled/translated "separately" (with no knowledge of each other) and linked together at "runtime".

This not only faithfully implements a part of the spec (and more spec tests run because of it), it may also allow me to take a monster like SQLite and break it down into pieces (multiple Wasm modules) that you can then link together. So FTS5 and R*tree could be split off into separate packages and loaded as needed (like the extensions implemented in Go).
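A minimal Go sketch of what relying on structural typing buys: the importing module declares its import only as an anonymous interface, so any value with a matching method set can be passed in when the module is constructed, without either package knowing the other's concrete type. All names here (`mathModule`, `appModule`, `NewApp`, `Xadd`, `Xdouble`) are hypothetical and not wasm2go's actual output.

```go
package main

import "fmt"

// mathModule stands in for one separately translated module exporting "add".
// The X prefix only mimics the naming of generated code; it is illustrative.
type mathModule struct{}

func (mathModule) Xadd(a, b int32) int32 { return a + b }

// appModule stands in for a second module translated with no knowledge of
// mathModule. Its import is described only by shape: an anonymous interface.
type appModule struct {
	imports interface {
		Xadd(a, b int32) int32
	}
}

// NewApp "links" the module at runtime: any value whose method set matches
// satisfies the import, thanks to Go's structural interface satisfaction.
func NewApp(imports interface{ Xadd(a, b int32) int32 }) *appModule {
	return &appModule{imports: imports}
}

// Xdouble is an export of appModule implemented through the linked import.
func (m *appModule) Xdouble(x int32) int32 { return m.imports.Xadd(x, x) }

func main() {
	app := NewApp(mathModule{})
	fmt.Println(app.Xdouble(21)) // 42
}
```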

I also tried to implement exceptions, just to find out that, after laying out all the ground work around panic/defer/recover, the final version of exceptions depends on typed references (which I thought was a GC thing).
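For readers unfamiliar with the panic/defer/recover approach, here is one way Wasm-style `throw` and `try`/`catch` can be modeled in Go. This is a hypothetical sketch, not wasm2go's implementation: it assumes integer exception tags and an `int32` payload, and the names `wasmException`, `throw`, `tryCatch` and `safeDiv` are made up. An immediately invoked function literal limits `recover` to the try body, and exceptions with a non-matching tag (or ordinary panics) are re-raised.

```go
package main

import "fmt"

// wasmException is a hypothetical panic payload: an exception tag plus values.
type wasmException struct {
	tag     int
	payload []int32
}

// throw models a Wasm throw instruction.
func throw(tag int, payload ...int32) {
	panic(&wasmException{tag: tag, payload: payload})
}

// tryCatch runs body; if body throws an exception with a matching tag,
// handler receives its payload. Any other panic keeps propagating.
func tryCatch(tag int, body func(), handler func(payload []int32)) {
	// The immediately invoked function literal bounds the scope of recover.
	caught, payload := func() (caught bool, payload []int32) {
		defer func() {
			if r := recover(); r != nil {
				if e, ok := r.(*wasmException); ok && e.tag == tag {
					caught, payload = true, e.payload
					return
				}
				panic(r) // not ours: re-panic so outer handlers can see it
			}
		}()
		body()
		return false, nil
	}()
	if caught {
		handler(payload)
	}
}

// safeDiv throws tag 1 on division by zero, catches it and returns -1.
func safeDiv(a, b int32) (q int32) {
	tryCatch(1, func() {
		if b == 0 {
			throw(1, a)
		}
		q = a / b
	}, func(payload []int32) {
		q = -1
	})
	return q
}

func main() {
	fmt.Println(safeDiv(10, 2), safeDiv(10, 0)) // 5 -1
}
```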

And that type system stuff (complicated enough that I'm afraid to touch it), I suspect, might interfere with what you're doing here too.

To be frank, I raced here to implement the bits of Wasm I knew from wazero/SQLite, and I did memory64 on request. And I got excited with how easy it all was. I explored the dynamic side of things to break down SQLite because it's too big to compile.

And because I love compilers, I drafted ideas in my head about exceptions with IIFEs, and tail calls with trampolines. But looking ahead, I'm starting to feel Wasm 3.0 is a huge complexity jump that might enable nice wrappers around C++ and maybe Rust, but I'm not sure at what cost.
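The trampoline idea for tail calls, sketched in Go for an assumed pair of mutually recursive boolean functions: instead of calling the next function directly (which grows the Go stack with every call), each function returns the pending call, and a driver loop keeps running pending calls until one produces a result. `step`, `isEven`, `isOdd` and `trampoline` are illustrative names and do not reflect any actual wasm2go code for Wasm's `return_call`.

```go
package main

import "fmt"

// step is either a finished result or the next call to make.
type step struct {
	done   bool
	result bool
	next   func() step
}

// isEven and isOdd are mutually tail-recursive. Instead of calling each
// other, they return the pending call for the trampoline to run.
func isEven(n uint64) step {
	if n == 0 {
		return step{done: true, result: true}
	}
	return step{next: func() step { return isOdd(n - 1) }}
}

func isOdd(n uint64) step {
	if n == 0 {
		return step{done: true, result: false}
	}
	return step{next: func() step { return isEven(n - 1) }}
}

// trampoline runs pending calls in a loop, so stack depth stays constant
// no matter how many tail calls are chained.
func trampoline(s step) bool {
	for !s.done {
		s = s.next()
	}
	return s.result
}

func main() {
	fmt.Println(trampoline(isEven(1_000_000))) // true
	fmt.Println(trampoline(isOdd(7)))          // true
}
```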

I had a clear vision of what I wanted to achieve here. But I have zero intuition for where I want to go from here, if anywhere. So this may take me a while.

@ncruces
Owner

ncruces commented Mar 19, 2026

> when i get around to attempting to build ffmpeg i'll probably be making use of them myself

SQLite was a 1.34MB Wasm compiled for speed. I compile it for size now, which makes it 852KB of Wasm that blows up into 12MB of Go, which takes 10sec to compile on a decent machine.

I shudder to think what 20MB of ffmpeg will turn into, as a monolith.

@NyaaaWhatsUpDoc
Author

> I shudder to think what 20MB of ffmpeg will turn into, as a monolith.

180MB of Go source 😎 yet to see what the compiled size will be though.

@ncruces
Owner

ncruces commented Mar 19, 2026

I wonder if dynamic linking is feasible? And whether, for your usage, it makes sense?

I know most people don't use FTS so if they could only include it in some connections by saying:

driver.Open("file:foo.db", fts5.Register)

That would be an improvement. And even fewer use R*tree, there are now competing "vector" extensions, etc. Me being able to provide such things as separate packages, and teach people how to build their own, would be a net win.
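The opt-in registration idea can be sketched as an `Open` that accepts per-connection init callbacks, so an extension costs nothing for connections that do not ask for it. Everything below (`Conn`, `Open`, `RegisterFTS5`) is a hypothetical, self-contained stand-in; it does not reproduce go-sqlite3's real `driver.Open` signature or any existing `fts5` package.

```go
package main

import (
	"errors"
	"fmt"
)

// Conn is a hypothetical stand-in for a database connection.
type Conn struct {
	dsn     string
	modules map[string]bool
}

// Open creates a connection and runs each init callback on it, in order.
// Extensions that are not passed in are never registered.
func Open(dsn string, init ...func(*Conn) error) (*Conn, error) {
	c := &Conn{dsn: dsn, modules: map[string]bool{}}
	for _, fn := range init {
		if err := fn(c); err != nil {
			return nil, fmt.Errorf("init %s: %w", dsn, err)
		}
	}
	return c, nil
}

// RegisterFTS5 is a hypothetical extension hook. Here it only records a flag.
func RegisterFTS5(c *Conn) error {
	if c.modules["fts5"] {
		return errors.New("fts5 already registered")
	}
	c.modules["fts5"] = true
	return nil
}

func main() {
	plain, _ := Open("file:foo.db")
	withFTS, _ := Open("file:foo.db", RegisterFTS5)
	fmt.Println(plain.modules["fts5"], withFTS.modules["fts5"]) // false true
}
```

In the dynamic-linking scenario, a hook like `RegisterFTS5` would be the natural place to link a separately translated module into the connection.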

But I'm not sure you can do away with codecs, so that may be unhelpful.

@NyaaaWhatsUpDoc
Author

> But I'm not sure you can do away with codecs, so that may be unhelpful.

it would probably be possible if it's possible with sqlite3, though I'm not sure it would benefit us, as we already compile only what we need. if other users come to depend on the library too, then that might be a different story.

we already do a bit of a hack to combine ffmpeg and ffprobe binaries into a single binary that switches on argv0, which gets around our need for dynamic linking as-is. it probably could be faster if i wrapped the ffmpeg lib C API instead of calling into the CLI, but that's a monumental task on top of other work i need to do.
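Switching on argv0 is the same trick busybox uses: one binary installed under several names (via symlinks or hard links), with `main` choosing a behavior from the name it was invoked as. A minimal Go sketch, where `runFFmpeg` and `runFFprobe` are hypothetical stand-ins for the two bundled tools:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Hypothetical entry points standing in for the two bundled tools.
func runFFmpeg(args []string) int  { fmt.Println("ffmpeg", args); return 0 }
func runFFprobe(args []string) int { fmt.Println("ffprobe", args); return 0 }

// dispatch selects a tool from the program name, ignoring the directory
// and a Windows-style ".exe" suffix.
func dispatch(argv0 string) (func([]string) int, error) {
	name := strings.TrimSuffix(filepath.Base(argv0), ".exe")
	switch name {
	case "ffmpeg":
		return runFFmpeg, nil
	case "ffprobe":
		return runFFprobe, nil
	}
	return nil, fmt.Errorf("unknown program name %q", name)
}

func main() {
	// Demo only: show which tool each sample name selects.
	for _, argv0 := range []string{"/usr/bin/ffmpeg", "ffprobe.exe", "./other"} {
		run, err := dispatch(argv0)
		if err != nil {
			fmt.Println(err)
			continue
		}
		run([]string{"-version"})
	}
}
```

In a real binary, `main` would call `dispatch(os.Args[0])` and exit with the code returned by `run(os.Args[1:])`.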

as it stands, i can get a stripped-back (minus our AV1 codec) version of ffmpeg / ffprobe building with the current version of wasm2go; to get it working, i need to continue work on my wasip1 host module. unfortunately AV1 requires setjmp / longjmp support, which i think is what the EH (exception handling) you were looking into would cover.

ultimately putting ffmpeg through wasm2go is just me experimenting with things at the moment, the output source size or other unforeseen problems may end up blocking us putting it to use in gotosocial.

@NyaaaWhatsUpDoc
Author

i compiled the single module containing the generated ffmpeg source, and it alone comes out to 380MB :')

i think i'll probably continue my endeavours here to see how it compares to wazero, but definitely not something i can provide as an option for GoToSocial...

@pgaskin

pgaskin commented Mar 19, 2026

> it alone comes out to 380MB

On the other hand, the git deltas between wasm2go-generated sources seem somewhat more efficient than between the source wasm blobs based on a few samples of my binaries.

@ncruces
Owner

ncruces commented Mar 19, 2026

I think they meant the output binary is 380MB? Which I find odd, if the source is 180MB.


That said, an SQLite test binary is 17MB (12MB with -ldflags="-s -w") from 12MB of Go source, so maybe that's the ballpark?

@NyaaaWhatsUpDoc
Author

NyaaaWhatsUpDoc commented Mar 19, 2026

> I think they meant the output binary is 380MB? Which I find odd, if the source is 180MB.

that was just running go build ./{libraryPackage}

building it into a binary with a main() with -ldflags='-s -w' took about an hour on a Ryzen 7840u (no slouch!), but the resulting binary is 123MB in size

@ncruces
Owner

ncruces commented Mar 19, 2026

I'll take ideas on improving the generated source in ways that help the Go compiler (but maintain correctness and don't overcomplicate the translator).

TBH, I'm not incredibly concerned with the source size, although 180MB seems excessive.

Even the 12MB from the 9.4MB sqlite3.c amalgamation is excessive, since I'm dropping a bunch of features: FTS3/4, RBU, the session extension, the Unix, Windows and Wasm VFSes, etc.

@NyaaaWhatsUpDoc
Copy link
Copy Markdown
Author

> I'll take ideas on improving the generated source in ways that help the Go compiler (but maintain correctness and don't overcomplicate the translator).

tbh i'll need to do some profiling during compilation to be sure, not something i've done before! if i'm lucky there will be an improvement possible that tackles both compilation complexity and source size

@ncruces
Owner

ncruces commented Mar 19, 2026

There's at least one.

The problem is making sure it doesn't break anything (i.e. coming up with a compelling story for how it can be applied).

Look at pushCond and popCond. These allow me to remember that the last thing pushed to the stack was a condition (a boolean converted to an integer). Then, if it is used in an if statement, I scrap the code that converts it to an integer (which would then require comparing it to zero) and use the condition directly. The problem: this keeps the boolean unevaluated, so I need to ensure the condition is used immediately, once and only once.
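A toy version of the trick, to make it concrete. This is a hypothetical, heavily simplified translator, not wasm2go's actual `pushCond`/`popCond`: the operand stack holds Go source strings, a comparison pushes a still-boolean expression flagged as a condition, a branch consumer uses that expression directly, and any integer consumer converts it with `b2i`, an assumed helper in the generated code.

```go
package main

import "fmt"

// operand is a Go expression on the translator's simulated Wasm stack.
type operand struct {
	expr string
	cond bool // expr is a pending Go bool, not yet converted to int32
}

type translator struct {
	stack []operand
}

// push adds an int32-valued expression.
func (t *translator) push(expr string) { t.stack = append(t.stack, operand{expr: expr}) }

// pushCond adds a bool-valued expression and remembers it is a condition.
func (t *translator) pushCond(expr string) {
	t.stack = append(t.stack, operand{expr: expr, cond: true})
}

func (t *translator) pop() operand {
	top := t.stack[len(t.stack)-1]
	t.stack = t.stack[:len(t.stack)-1]
	return top
}

// popInt pops for an int32 consumer, converting a pending condition.
func (t *translator) popInt() string {
	top := t.pop()
	if top.cond {
		return "b2i(" + top.expr + ")"
	}
	return top.expr
}

// popCond pops for a branch: a pending condition is used as-is,
// anything else must be compared with zero.
func (t *translator) popCond() string {
	top := t.pop()
	if top.cond {
		return top.expr
	}
	return "(" + top.expr + ") != 0"
}

// ltS translates i32.lt_s (operand parenthesization omitted for brevity).
func (t *translator) ltS() {
	b, a := t.popInt(), t.popInt()
	t.pushCond(a + " < " + b)
}

func main() {
	var tr translator
	tr.push("v0")
	tr.push("v1")
	tr.ltS()
	fmt.Println("if " + tr.popCond() + " {") // if v0 < v1 {

	tr.push("v0")
	tr.push("v1")
	tr.ltS()
	fmt.Println("v2 := " + tr.popInt()) // v2 := b2i(v0 < v1)
}
```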

I could do the same for expressions: if an expression is used immediately, once and only once, I can avoid pushing it to the stack just to pop it right away. Doing so, I can speed up building SQLite by around 25%.

But if I use this recklessly, things break. This obviously can't be used for "tee," for example. So where to draw the line?

@ncruces
Owner

ncruces commented Mar 20, 2026

@NyaaaWhatsUpDoc @pgaskin see #5.

@NyaaaWhatsUpDoc
Author

NyaaaWhatsUpDoc commented Mar 20, 2026

> @NyaaaWhatsUpDoc @pgaskin see #5.

oh interesting! i started working on something too, though taking a different tack. i've started writing a pass that runs over all the function bodies after they've been defined, simplifying variable usage by replacing variables that are defined but never written to afterwards (while being aware of shadowing in different scopes within the same function).

running a CPU profile during the ffmpeg compile revealed it spent most of its time in SSA and GC functions, so my hunch was that simplifying variable usage (which it seems your PR will manage too) would reduce the cost of SSA construction and, more generally, the load on the garbage collector.

@ncruces
Owner

ncruces commented Mar 20, 2026

Yes. My initial version assumed "SSA will fix the too many temporaries" but I'm clearly putting a huge toll on SSA.

My version doesn't reduce sqlite.go size by a lot (in MB) but does in LoC and makes compiling it about 30% faster.

The problem is really convincing myself that it's correct. That everything visible still executes in the same order. Because a lot is visible.

@NyaaaWhatsUpDoc
Author

> The problem is really convincing myself that it's correct. That everything visible still executes in the same order. Because a lot is visible.

my hunch (though i may be wrong, and ultimately the choice is yours here) is that applying code minimization passes after generating the full AST might be the easier way to ensure no changes to logic. later passes could also easily be bypassed with a CLI flag for applying tests both with / without.

@pgaskin

pgaskin commented Mar 20, 2026

> see #5

Will take a look at it later today.

> my hunch (though i may be wrong, and ultimately the choice is yours here)

That was also my hunch (or at least generating some form of IR, then optimizing on that with passes of provably correct optimizations, before finally generating the Go code).

@ncruces
Owner

ncruces commented Mar 20, 2026

I'd rather not build an entire optimizing compiler. I'd be creating an IR, then doing SSA… 🙃

I know if I generate better code it's worth the effort as we translate once, and many people would compile it, but I'm also not as smart as the Go team, and the wazero folks had a year and two guys to build the optimizing compiler.

So, that said, I'm not entirely sure this optimization would be possible (feasible/reasonable?) to do on the AST.

Figuring out if a temporary is used before anything that allows you to observe order seems harder to do on a tree than on a stack machine.

This version is more general than what I had before. I kept conditions unevaluated for branching (this was easy because there are very few branch instructions). I also had a half-broken version that kept the top-of-stack expression around. This is more general than both and seems correct.

@ncruces
Owner

ncruces commented Mar 21, 2026

I merged #5.

Coming back to this PR, I'm not sure what this is buying us. Shouldn't at least _env be E, and the argument too? But even then, what's the goal here? (And I'm sorry for losing track.)

```go
type Module[E Xenv] struct {
	t0       []any
	elements [][]any
	_env     Xenv
}

func New[E Xenv](v0 Xenv) *Module[E] {
	m := &Module[E]{}
	m._env = v0
	m.t0 = make([]any, 32)
	m.elements = [][]any{{m.f1}, {m.f0}}
	copy(m.t0[16:], m.elements[0])
	copy(m.t0[17:], m.elements[1])
	if i, ok := any(v0).(interface {
		Init(any)
	}); ok {
		i.Init(m)
	}
	return m
}

type Xenv = interface {
	Xjstimes3(v0 int32) int32
}

func (m *Module[E]) f0(v0 int32) int32 {
	return m._env.Xjstimes3(v0)
}
```
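For comparison, here is a sketch where the type parameter is actually used, with the field and the constructor argument typed as `E` rather than `Xenv`, as asked above. It is trimmed down to the host-call path (no tables, no `Init` hook), and `jsHost` is a hypothetical host implementation. Whether this is any faster than a plain interface field is not guaranteed: the Go toolchain compiles generics using GC-shape stenciling with dictionaries, so a method call through a type parameter is not necessarily devirtualized.

```go
package main

import "fmt"

// Xenv is the host interface the module imports, used here as a constraint.
type Xenv = interface {
	Xjstimes3(v0 int32) int32
}

// Module stores the host as the type parameter E instead of as Xenv.
type Module[E Xenv] struct {
	_env E
}

// New takes an E, so the caller's concrete host type is preserved.
func New[E Xenv](v0 E) *Module[E] {
	return &Module[E]{_env: v0}
}

func (m *Module[E]) f0(v0 int32) int32 {
	return m._env.Xjstimes3(v0)
}

// jsHost is a hypothetical host implementation.
type jsHost struct{}

func (jsHost) Xjstimes3(v0 int32) int32 { return v0 * 3 }

func main() {
	m := New(jsHost{}) // E is inferred as jsHost
	fmt.Println(m.f0(14)) // 42
}
```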

@NyaaaWhatsUpDoc NyaaaWhatsUpDoc force-pushed the use-generics-for-host-interfaces branch from f574b3a to f69bc58 March 21, 2026 23:21
@ncruces
Owner

ncruces commented Mar 23, 2026

c63a558 adds a flag to disable all optimization passes, including #5.

If you find a difference in behavior, it's a likely wasm2go bug.

@ncruces ncruces force-pushed the main branch 7 times, most recently from 67abda0 to 50b8a95 March 29, 2026 09:27

Labels

None yet

Projects

None yet


3 participants