[native_doc_dartifier] Code context for the LLM #2309

Open
@marshelino-maged

The code generated by JNIgen can be thousands of lines long, which can amount to millions of tokens. That is far more than an LLM can handle effectively within its context limit.

The LLM only needs the public API surface: enums, classes, methods, and fields.

Even then, it doesn't need the whole public API for a given snippet; it only requires the subset relevant to that snippet.

We can analyze the code snippet and exclude all public APIs that aren’t referenced in it. However, we need to be careful about how aggressively we filter.

If we make the filter too strict, we might drop important information. For example, if the snippet contains a function call like foo(x, y), and y is of type Bar, but Bar was excluded from the API, the LLM doesn't know what Bar is, so it can't understand or initialize y to fix the error.

So we need a balance: include only what's relevant, but also keep enough context (like related types) to preserve meaning.

Alternatively, we could just give the LLM the entire public API.
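To make the filtering option concrete, here is a minimal sketch of selecting the referenced classes plus their related types. It is written in Python purely for illustration and is not how native_doc_dartifier does it. `API_INDEX`, `referenced_names`, `select_context`, and `max_depth` are all hypothetical names, and the index (class name mapped to the type names used in its public signatures) is assumed to have been extracted from the generated bindings beforehand.

```python
import re
from typing import Dict, Set

# Hypothetical index: class name -> type names used in its public signatures.
# In practice this would be extracted from the generated bindings.
API_INDEX: Dict[str, Set[str]] = {
    "Foo": {"Bar"},        # e.g. Foo.foo(int x, Bar y)
    "Bar": {"Baz"},
    "Baz": set(),
    "Unrelated": {"Baz"},
}

def referenced_names(snippet: str) -> Set[str]:
    """Collect every identifier-like token in the snippet (a crude heuristic)."""
    return set(re.findall(r"[A-Za-z_]\w*", snippet))

def select_context(snippet: str, api_index: Dict[str, Set[str]],
                   max_depth: int = 1) -> Set[str]:
    """Keep classes named in the snippet, then add related types
    up to max_depth hops away so signatures stay understandable."""
    selected = {n for n in referenced_names(snippet) if n in api_index}
    frontier = set(selected)
    for _ in range(max_depth):
        # Types mentioned by the current frontier that are not selected yet.
        frontier = {
            dep
            for name in frontier
            for dep in api_index[name]
            if dep in api_index and dep not in selected
        }
        if not frontier:
            break
        selected |= frontier
    return selected
```

With `max_depth=0` the filter is strict and `Bar` is dropped for a snippet that only mentions `Foo`; with `max_depth=1` `Bar` is kept, which is the balance described above. Raising the depth trades context size for completeness.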

The context will be given to the LLM as:

- Full class declaration, including the extended class and implemented interfaces
    - Constructor signatures
    - Method signatures
    - Fields
    - Getters
    - Setters
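
As a rough illustration of that format, the sketch below renders one class summary as compact text. It is in Python for illustration only; `ClassSummary`, `render`, and the exact output layout are assumptions, not the format the tool actually emits.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ClassSummary:
    """Hypothetical container for the public surface of one class."""
    name: str
    extends: Optional[str] = None
    implements: List[str] = field(default_factory=list)
    constructors: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    fields: List[str] = field(default_factory=list)
    getters: List[str] = field(default_factory=list)
    setters: List[str] = field(default_factory=list)

def render(summary: ClassSummary) -> str:
    """Render the declaration line followed by member signatures only (no bodies)."""
    header = f"class {summary.name}"
    if summary.extends:
        header += f" extends {summary.extends}"
    if summary.implements:
        header += " implements " + ", ".join(summary.implements)
    members = (summary.constructors + summary.methods + summary.fields
               + summary.getters + summary.setters)
    body = "\n".join(f"  {member};" for member in members)
    return f"{header} {{\n{body}\n}}"
```

For example, `render(ClassSummary(name="Bar", extends="Object", constructors=["Bar(int size)"]))` produces a three-line declaration containing only the constructor signature, which keeps the prompt small while still telling the LLM how to build a `Bar`.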
