[FEATURE] Static Inference Support for RawC and GGML #228

Open
@emrecakmakyurdu

Description

Feature Request

Describe the Feature

Static inference is not supported in the RawC and GGML backends: they currently fall back to dynamic execution even when all inputs are constant. Supporting static inference would allow such operations to be pre-computed at compile time, improving performance.

Motivation

This feature would eliminate the need for dynamic execution of constant computations, improving efficiency and reducing runtime overhead whenever constant inputs are supplied.

Proposed Solution

1. RawC Backend

  • Develop Python wrapper functions that execute supported operations directly on the RawC backend when static inputs are supplied (a rough sketch follows below).
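
As a rough sketch, such a wrapper could load the compiled RawC operations through `ctypes` and run them eagerly on the constant data. The library name `librawc_ops.so` and the C signature of `add` below are assumptions for illustration, not the actual build artifacts:

```python
import ctypes

import numpy as np

# Hypothetical shared library built from the RawC sources; the real name
# and build path depend on how the RawC backend is compiled.
_lib = ctypes.CDLL("./librawc_ops.so")

# Assumed C signature: void add(const float *a, const float *b, float *out, int n);
_lib.add.argtypes = [
    ctypes.POINTER(ctypes.c_float),
    ctypes.POINTER(ctypes.c_float),
    ctypes.POINTER(ctypes.c_float),
    ctypes.c_int,
]
_lib.add.restype = None


def static_add(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Run the RawC `add` kernel eagerly on constant inputs."""
    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)
    out = np.empty_like(a)
    _lib.add(
        a.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
        b.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
        out.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
        a.size,
    )
    return out
```

One such wrapper per supported operation (or a single dispatch table keyed by operation name) should be enough to cover the static-inference path.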

2. GGML Backend

  • Use the RawC backend operations as the basis for the computations.
  • Convert GGML arrays to plain C arrays before passing them to these functions, and convert the results back to GGML arrays.
  • In GGML code generation, bypass tensor creation and graph marking for statically inferred keys by assigning these keys directly to the output (see the sketch after this list).
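
A minimal sketch of how this could look on the Python side, assuming the GGML backend can expose tensor data as NumPy-compatible buffers and reusing the RawC wrappers from the previous sketch. The names `RAWC_STATIC_OPS`, `infer_static`, and the `make_tensor` / `mark_in_graph` callables are placeholders, not existing APIs:

```python
from typing import Callable, Dict

import numpy as np

# Hypothetical registry of the ctypes wrappers sketched for the RawC
# backend, keyed by operation name, e.g. {"add": static_add}.
RAWC_STATIC_OPS: Dict[str, Callable[..., np.ndarray]] = {}


def infer_static(op: str, *ggml_inputs) -> np.ndarray:
    """Evaluate `op` eagerly through the RawC wrappers.

    The inputs are assumed to be convertible to contiguous float32 buffers;
    the real conversion depends on the GGML backend's array type, and the
    result would be wrapped back into a GGML array by the backend.
    """
    c_inputs = [np.ascontiguousarray(x, dtype=np.float32) for x in ggml_inputs]
    return RAWC_STATIC_OPS[op](*c_inputs)


def assign_output(key, cache, static_values, make_tensor, mark_in_graph):
    """Code-generation step: skip tensor creation for statically inferred keys."""
    if key in static_values:
        # The value is already known, so no ggml tensor is created and the
        # node is never marked in the compute graph.
        cache[key] = static_values[key]
    else:
        tensor = make_tensor(key)
        mark_in_graph(tensor)
        cache[key] = tensor
```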

Alternatives Considered

An alternative approach for the GGML backend would be to build a separate dynamic library that manages the GGML flow for tensor operations. However, this would require allocating a GGML context and memory buffer for every static inference call, potentially offsetting the performance benefits.
