Description
Even with a normalized API, the capabilities of the underlying LLMs vary from one model to another, and often from one provider to another.
For that reason, it would be very useful to define a set of "capabilities", and to have a way to evaluate which capabilities a given LLM supports.
Identified capabilities
Right now, it would mostly be about two things:
native function calling
While we do have a simulated function calling option for LLMs that don't have native support for function calling, we're not able to set that option automatically, as we don't know whether the LLM we're using supports native FC or not.
Knowing whether the model supports it would allow us to automatically fall back to SFC when the model does not have native support. (We could then introduce an "automatic" value for the simulated function calling option, as sketched below.)
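A minimal sketch of what that could look like, assuming a capability flag is available somewhere; all names below (ModelCapabilities, resolveFunctionCallingMode, the "automatic" value) are illustrative, not existing APIs:

```ts
// Hypothetical shapes, purely for illustration.
interface ModelCapabilities {
  nativeFunctionCalling: boolean;
  contextWindowTokens?: number;
}

type FunctionCallingMode = 'native' | 'simulated';

// With an "automatic" option, the framework could resolve the effective mode
// from the detected capabilities instead of asking the user to choose.
function resolveFunctionCallingMode(
  requested: FunctionCallingMode | 'automatic',
  capabilities: ModelCapabilities
): FunctionCallingMode {
  if (requested !== 'automatic') {
    return requested;
  }
  return capabilities.nativeFunctionCalling ? 'native' : 'simulated';
}
```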
context window length
RAG tasks often retrieve large amounts of data, either as a large number of documents or as very long documents. Without knowing the LLM's max context length, it's difficult for those RAG tasks to properly process that data.
For example, if a RAG task retrieves a 15k token document, it might need to truncate / summarize / reduce it before sending it back to the LLM, as there could be a risk of exceeding the context window. On the other hand, if we know that the LLM has a way larger context window (say 512k), we can send the document raw to make sure all the info is present.
Knowing the LLM's context window is fairly critical for RAG workflows to adapt.
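As a sketch of how a RAG task could use that information (the names and the reserved-token margin below are illustrative assumptions, not an existing API):

```ts
// Hypothetical capability shape, purely for illustration.
interface ModelCapabilities {
  contextWindowTokens: number;
}

// Decide whether a retrieved document can be sent as-is, or needs to be
// truncated / summarized / reduced first, based on the model's context window.
function shouldReduceDocument(
  documentTokens: number,
  capabilities: ModelCapabilities,
  reservedForPromptAndOutput = 4_000 // arbitrary margin for the rest of the prompt + output
): boolean {
  const available = capabilities.contextWindowTokens - reservedForPromptAndOutput;
  return documentTokens > available;
}

// A 15k-token document fits easily in a 512k window, but not in an 8k one:
// shouldReduceDocument(15_000, { contextWindowTokens: 512_000 }) === false
// shouldReduceDocument(15_000, { contextWindowTokens: 8_000 })   === true
```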
How to do it
As configuration on the connectors
We could handle that via configuration, asking the user to specify these capabilities when creating the connector.
Note that this would be more of a stopgap than a real solution, given that it moves the burden to the end user, which also makes it error prone.
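For illustration, the configuration could look something like this (the shape and field names are assumptions, not the actual connector schema):

```ts
// Hypothetical configuration shape, purely for illustration.
interface ConnectorCapabilityConfig {
  // undefined means "unknown": the framework would then fall back to defaults
  // or to another detection mechanism.
  nativeFunctionCalling?: boolean;
  contextWindowTokens?: number;
}

const exampleConnectorConfig = {
  id: 'my-llm-connector',
  provider: 'openai-compatible',
  capabilities: {
    nativeFunctionCalling: false,
    contextWindowTokens: 32_000,
  } satisfies ConnectorCapabilityConfig,
};
```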
Retrieving the information from the LLM provider
Ideally, we would be able to retrieve (or deduce) this information from the provider.
For that, we would need to check if there is a way to retrieve that info from each of our providers, but this might be complicated given that the openAI connector allows virtually any openAI-compatible LLM, and there is, AFAIK, no openAI API to retrieve that information.
This might be even more complicated later, when we get to support the ES inference API / inference connector, as we would have an additional abstraction layer hiding that information.
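If providers can't be queried directly, one fallback would be a per-provider resolver backed by a static registry of known models. This is only a sketch; the names and values below are illustrative placeholders:

```ts
// Hypothetical shapes, purely for illustration.
interface ModelCapabilities {
  nativeFunctionCalling: boolean;
  contextWindowTokens: number;
}

type CapabilityResolver = (modelId: string) => ModelCapabilities | undefined;

// For providers with no metadata API (e.g. arbitrary openAI-compatible endpoints),
// a static registry maintained on our side may be the only option; unknown models
// simply resolve to undefined and require another mechanism (config, probing...).
const staticRegistry: Record<string, ModelCapabilities> = {
  'some-known-model': { nativeFunctionCalling: true, contextWindowTokens: 128_000 },
};

const resolveFromStaticRegistry: CapabilityResolver = (modelId) => staticRegistry[modelId];
```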
Just check?
We could think of just "checking" for capabilities, and storing the info somewhere.
For FC it could be relatively easy: we just try a call with FC, and if we get an error, we know the provider does not support it (a minimal probe is sketched below).
For context length it's trickier: we can send a very large input to make sure we exceed any LLM's context, but even though the max context length is usually exposed in context-length errors, it might be tricky to extract given that the error message will vary between providers.
It's also unclear whether that approach would be viable for other capabilities we might identify later.
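A minimal sketch of the FC probe, assuming a generic chatComplete-style API (the parameter shape and names here are assumptions, not an existing interface):

```ts
// Hypothetical chat completion API shape, purely for illustration.
interface ChatCompleteApi {
  (params: {
    messages: Array<{ role: string; content: string }>;
    tools?: unknown[];
  }): Promise<unknown>;
}

// Probe native function calling support by issuing a minimal request that
// declares a tool: if the provider rejects it, assume no native support.
async function probeNativeFunctionCalling(chatComplete: ChatCompleteApi): Promise<boolean> {
  try {
    await chatComplete({
      messages: [{ role: 'user', content: 'ping' }],
      tools: [
        {
          type: 'function',
          function: {
            name: 'noop',
            description: 'no-op probe',
            parameters: { type: 'object', properties: {} },
          },
        },
      ],
    });
    return true;
  } catch {
    // Any error is treated as "no native support" here; a real implementation
    // would need to distinguish capability errors from transient failures.
    return false;
  }
}
```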