Description
The code generated by JNIgen can be thousands of lines long, which can amount to millions of tokens, far more than an LLM can handle effectively within its context limit.
The LLM only needs the public API surface (enums, classes, methods, and fields), and for a given snippet it needs only the subset of that surface the snippet actually uses.
We can analyze the code snippet and exclude all public APIs that aren’t referenced in it. However, we need to be careful about how aggressively we filter.
If we make it too strict, we might miss important information. For example, if the snippet contains a function call like `foo(x, y)`, and `y` is of type `Bar`, but `Bar` was excluded from the API, the LLM won't be able to understand or initialize `y` to solve the error because it doesn't know what `Bar` is.
So we need a balance: include only what's relevant, but keep enough context (such as related types) to preserve meaning.
The alternative is to skip filtering and give the LLM the entire public API.
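The balanced filtering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `API_INDEX` is a hypothetical hand-written map from each symbol to the type names its signature mentions (a real one would have to be extracted from the generated bindings), and `select_relevant` and `max_depth` are made-up names, not part of JNIgen.

```python
import re

# Hypothetical API index: symbol name -> type names mentioned in its signature.
# In practice this would be derived from the generated bindings.
API_INDEX = {
    "foo": {"Bar", "int"},
    "Bar": {"Baz"},
    "Baz": set(),
    "Unrelated": {"Other"},
    "Other": set(),
}

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def select_relevant(snippet, index, max_depth=1):
    """Return symbols referenced in the snippet plus related types.

    max_depth controls how aggressively we filter: 0 keeps only symbols
    named in the snippet, larger values pull in types reachable through
    that many hops of signature references.
    """
    # Symbols that appear literally in the snippet and exist in the API.
    selected = {name for name in IDENTIFIER.findall(snippet) if name in index}
    frontier = set(selected)
    for _ in range(max_depth):
        # Follow the type references of the symbols added in the last round.
        frontier = {
            dep
            for name in frontier
            for dep in index[name]
            if dep in index
        } - selected
        if not frontier:
            break
        selected |= frontier
    return selected


strict = select_relevant("foo(x, y)", API_INDEX, max_depth=0)
balanced = select_relevant("foo(x, y)", API_INDEX, max_depth=1)
```

With `max_depth=0` the result contains only `foo`, which reproduces the problem above because `Bar` is dropped. With `max_depth=1`, `Bar` is kept while `Unrelated` and `Other` are still excluded. The right depth is a tuning choice, not something this sketch settles.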
The context will be given to the LLM as:
- Full class declaration, including the extended class and implemented interfaces
- Constructor signatures
- Method signatures
- Fields
- Getters
- Setters
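
One possible way to render that context is sketched below. The `ClassInfo` structure, the `render_context` function, and the exact text layout are assumptions made for illustration; the actual format handed to the LLM may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassInfo:
    """Hypothetical summary of one class from the generated bindings."""
    name: str
    extends: Optional[str] = None
    implements: List[str] = field(default_factory=list)
    constructors: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    fields: List[str] = field(default_factory=list)
    getters: List[str] = field(default_factory=list)
    setters: List[str] = field(default_factory=list)


def render_context(cls):
    """Render the class declaration followed by its non-empty member groups."""
    header = f"class {cls.name}"
    if cls.extends:
        header += f" extends {cls.extends}"
    if cls.implements:
        header += " implements " + ", ".join(cls.implements)
    lines = [header]
    sections = [
        ("Constructors", cls.constructors),
        ("Methods", cls.methods),
        ("Fields", cls.fields),
        ("Getters", cls.getters),
        ("Setters", cls.setters),
    ]
    for title, members in sections:
        if members:  # skip empty groups to save tokens
            lines.append(f"  {title}:")
            lines.extend(f"    {member}" for member in members)
    return "\n".join(lines)


# Example with made-up signatures.
bar = ClassInfo(
    name="Bar",
    extends="Baz",
    implements=["Comparable"],
    constructors=["Bar(int size)"],
    methods=["int length()"],
)
context = render_context(bar)
```

Only signatures are emitted, never method bodies, and empty groups are skipped, which keeps the per-class cost small.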