This repository was archived by the owner on May 7, 2025. It is now read-only.

Inference without ONNX / usage of WONNX as backend for LLMs #169

Open
@philpax

Description

Is your feature request related to a problem? Please describe.
I'm one of the maintainers of the llm project, and we're looking for a robust, cross-platform GPU inference solution for our LLM models. We currently have computation graphs for GGML, but are planning to introduce some kind of abstraction layer so that other backends can be used.

I'm investigating the use of wonnx as a potential backend, but it is (understandably!) coupled to ONNX. I was wondering if it would be possible to specify a computation graph directly for compilation/inference without going through ONNX.

Describe the solution you'd like
A builder API for computation graphs, or something similar, so that a wonnx::Session could be created without the use of ONNX.
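For illustration, here is a minimal sketch of what such a builder might look like. This is entirely hypothetical — none of these types (`GraphBuilder`, `Graph`, `Node`) exist in wonnx today; it only shows the shape of the API being requested: register weight tensors loaded from arbitrary locations, append operator nodes, and hand the finished graph to a session without an intermediate ONNX model.

```rust
use std::collections::HashMap;

// Hypothetical operator node: an op name plus named input/output tensors,
// loosely mirroring how ONNX nodes reference tensors by string name.
#[derive(Debug, Clone)]
struct Node {
    op: String,
    inputs: Vec<String>,
    outputs: Vec<String>,
}

// Hypothetical finished graph that a session could be created from.
struct Graph {
    nodes: Vec<Node>,
    initializers: HashMap<String, Vec<f32>>,
}

// Hypothetical builder: weights come from anywhere, not from a model file.
#[derive(Default)]
struct GraphBuilder {
    nodes: Vec<Node>,
    initializers: HashMap<String, Vec<f32>>,
}

impl GraphBuilder {
    fn new() -> Self {
        Self::default()
    }

    // Register a weight tensor loaded from an arbitrary source.
    fn initializer(mut self, name: &str, data: Vec<f32>) -> Self {
        self.initializers.insert(name.to_string(), data);
        self
    }

    // Append an operator node to the graph.
    fn node(mut self, op: &str, inputs: &[&str], outputs: &[&str]) -> Self {
        self.nodes.push(Node {
            op: op.to_string(),
            inputs: inputs.iter().map(|s| s.to_string()).collect(),
            outputs: outputs.iter().map(|s| s.to_string()).collect(),
        });
        self
    }

    fn build(self) -> Graph {
        Graph {
            nodes: self.nodes,
            initializers: self.initializers,
        }
    }
}

fn main() {
    // A two-node graph: y = Relu(MatMul(x, w)), with w supplied directly.
    let graph = GraphBuilder::new()
        .initializer("w", vec![1.0; 16])
        .node("MatMul", &["x", "w"], &["h"])
        .node("Relu", &["h"], &["y"])
        .build();

    assert_eq!(graph.nodes.len(), 2);
    assert_eq!(graph.nodes[1].op, "Relu");
    assert!(graph.initializers.contains_key("w"));
    println!("built graph with {} nodes", graph.nodes.len());
}
```

The point of the builder shape is that it maps naturally onto what we already have (a computation graph plus externally loaded weights), rather than requiring us to synthesize a self-contained model file first.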

Describe alternatives you've considered
I've considered constructing a wonnx::onnx::ModelProto at runtime, but the ONNX format contains a lot of things we don't need or don't have.

It's designed for self-contained models; however, we are loading weights from arbitrary locations and supplying our own computation graph, making it difficult for us to synthesize a complete ONNX model.

Additional context
There's no particular hurry on this. We'd love to have GPU inference as soon as possible - especially truly cross-platform, non-CUDA (!) inference - but I assume this would be a large body of work.

I'm also not sure what operations would need to be implemented for our use case, but we would file PRs as required to implement any missing operations.
