This repository was archived by the owner on May 7, 2025. It is now read-only.

Inference without ONNX / usage of WONNX as backend for LLMs #169

Open
@philpax

Description

Is your feature request related to a problem? Please describe.
I'm one of the maintainers of the llm project, and we're looking for a robust, cross-platform GPU inference solution for our LLM models. We currently have computation graphs for GGML, but are planning to introduce some kind of abstraction layer so that other backends can be used.

I'm investigating the use of wonnx as a potential backend, but it is (understandably!) coupled to ONNX. I was wondering if it would be possible to specify a computation graph directly for compilation/inference without going through ONNX.

Describe the solution you'd like
A builder API for computation graphs, or something similar, so that a wonnx::Session could be created without the use of ONNX.
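For illustration, here is a minimal sketch of what such a builder might look like. This is entirely hypothetical — none of these types (`GraphBuilder`, `Graph`, `Node`) exist in wonnx today; it only shows the shape of the API being requested: register weight tensors loaded from arbitrary locations, append operator nodes, and hand the finished graph to a session without an intermediate ONNX model.

```rust
use std::collections::HashMap;

// Hypothetical operator node: an op name plus named input/output tensors,
// loosely mirroring how ONNX nodes reference tensors by string name.
#[derive(Debug, Clone)]
struct Node {
    op: String,
    inputs: Vec<String>,
    outputs: Vec<String>,
}

// Hypothetical finished graph that a session could be created from.
struct Graph {
    nodes: Vec<Node>,
    initializers: HashMap<String, Vec<f32>>,
}

// Hypothetical builder: weights come from anywhere, not from a model file.
#[derive(Default)]
struct GraphBuilder {
    nodes: Vec<Node>,
    initializers: HashMap<String, Vec<f32>>,
}

impl GraphBuilder {
    fn new() -> Self {
        Self::default()
    }

    // Register a weight tensor loaded from an arbitrary source.
    fn initializer(mut self, name: &str, data: Vec<f32>) -> Self {
        self.initializers.insert(name.to_string(), data);
        self
    }

    // Append an operator node to the graph.
    fn node(mut self, op: &str, inputs: &[&str], outputs: &[&str]) -> Self {
        self.nodes.push(Node {
            op: op.to_string(),
            inputs: inputs.iter().map(|s| s.to_string()).collect(),
            outputs: outputs.iter().map(|s| s.to_string()).collect(),
        });
        self
    }

    fn build(self) -> Graph {
        Graph {
            nodes: self.nodes,
            initializers: self.initializers,
        }
    }
}

fn main() {
    // A two-node graph: y = Relu(MatMul(x, w)), with w supplied directly.
    let graph = GraphBuilder::new()
        .initializer("w", vec![1.0; 16])
        .node("MatMul", &["x", "w"], &["h"])
        .node("Relu", &["h"], &["y"])
        .build();

    assert_eq!(graph.nodes.len(), 2);
    assert_eq!(graph.nodes[1].op, "Relu");
    assert!(graph.initializers.contains_key("w"));
    println!("built graph with {} nodes", graph.nodes.len());
}
```

The point of the builder shape is that it maps naturally onto what we already have (a computation graph plus externally loaded weights), rather than requiring us to synthesize a self-contained model file first.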

Describe alternatives you've considered
I've considered constructing a wonnx::onnx::ModelProto at runtime, but the ONNX format contains a lot of things we don't need or don't have.

It's designed for self-contained models; however, we are loading weights from arbitrary locations and supplying our own computation graph, making it difficult for us to synthesize a complete ONNX model.

Additional context
There's no particular hurry on this. We'd love to have GPU inference as soon as possible - especially truly cross-platform, non-CUDA (!) inference - but I assume this would be a large body of work.

I'm also not sure what operations would need to be implemented for our use case, but we would file PRs as required to implement any missing operations.
