Problem in the current LLM caching implementation (cache cannot be set per instance, only globally) #17176
-
We'd probably want a caching solution that wraps existing primitives rather than being passed into them. We don't want the primitives' code to know that a caching layer exists; otherwise that will lead to design issues down the road (much like the current global cache is problematic). I'd prefer to see a generic caching layer built on top of any runnable object, or, if it needs to be specialized to a chat model, the caching layer can inherit from BaseChatModel and accept the wrapped chat model as an instance attribute.
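To make the idea concrete, here is a rough, hedged sketch of what such a wrapper could look like. The class name `CachedChatModel` and its fields `inner` / `cache_backend` are illustrative, not part of LangChain's API, and keying the cache on `get_buffer_string(messages)` plus the wrapped model's `_llm_type` is just one possible choice:

```python
from typing import Any, List, Optional

from langchain_core.caches import BaseCache
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.messages import BaseMessage, get_buffer_string
from langchain_core.outputs import ChatResult


class CachedChatModel(BaseChatModel):
    """Illustrative wrapper: checks a per-instance cache before delegating."""

    inner: BaseChatModel      # the wrapped chat model; it knows nothing about caching
    cache_backend: BaseCache  # e.g. RedisSemanticCache, InMemoryCache, ...

    @property
    def _llm_type(self) -> str:
        return f"cached-{self.inner._llm_type}"

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        # Key the cache on the rendered conversation and the wrapped model's type.
        prompt = get_buffer_string(messages)
        llm_string = self.inner._llm_type
        cached = self.cache_backend.lookup(prompt, llm_string)
        if cached:
            # Assumes the backend hands back the ChatGeneration objects stored
            # below; serializing backends may need extra handling here.
            return ChatResult(generations=list(cached))
        result = self.inner._generate(
            messages, stop=stop, run_manager=run_manager, **kwargs
        )
        self.cache_backend.update(prompt, llm_string, result.generations)
        return result
```

With something along these lines, the wrapped model never sees the cache, and two wrappers in the same process can point at two different cache backends.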
-
I see you want a generic, abstract solution for the cache, but I am not sure you are referring to my solution mentioned in point 1.
This is the code from langchain; please see the comments in the code to better understand my solution (point 1). BUG: I have also found that the current global cache implementation does not work with streaming responses. @eyurtsev I see you are a core member of LangChain. Could you please share your plans on this? Will these features be added later? When? It would be nice if you created an issue from this conversation (only if you have a plan to rewrite the cache strategy) so that other developers know about it. This is important because LLM calls take a long time to respond, and you also agreed that the current cache implementation is problematic and not developer friendly.
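For reference, a minimal way to reproduce the streaming gap described above might look like the following (a hedged sketch; `InMemoryCache` is used only to keep the example self-contained, and the exact behaviour depends on the installed 0.1.x versions):

```python
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache
from langchain_openai import ChatOpenAI

set_llm_cache(InMemoryCache())
llm = ChatOpenAI()

llm.invoke("ping")  # a second identical invoke() is served from the global cache
llm.invoke("ping")

for chunk in llm.stream("ping"):  # reported issue: stream() bypasses the cache entirely
    print(chunk.content, end="", flush=True)
```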
-
Added an issue here: #17242. Given that the change you propose is very minimal and leverages existing code, I think it makes sense to extend the accepted values for the cache.
-
Feature request
I want to implement `RedisSemanticCache` for LLM calls, but the current implementation sets/gets the cache instance in/from a global variable, so it is not possible to make the cache specific to a call. I have an API where embedding-related information (api_key, model, etc.) is passed in the request body by the user, so I cannot use that embedding information in the cache, because the cache is set globally (more like a singleton).

1. The `cache` property of `BaseChatModel` is currently a bool. Instead of a bool, we can make it a nullable `BaseCache`; i.e., when we create an LLM (ChatOpenAI) we pass the cache instance (see the sketch below).
2. `RedisSemanticCache` indexing happens on the `llm_string`. There should be a way for the developer to pass the indexing value based on their requirements.

These problems persist in all the cache strategies provided by langchain. @baskaryan @hwchase17 do you have any plan for this?
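To illustrate the difference, here is a hedged sketch of today's global setup versus the per-instance `cache` argument proposed in point 1. Passing a `BaseCache` to the constructor is the requested change, not something the current release supports, and `user_api_key` is a hypothetical per-request value:

```python
from langchain.globals import set_llm_cache
from langchain_community.cache import RedisSemanticCache
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Today: one global cache shared by every model in the process.
set_llm_cache(
    RedisSemanticCache(redis_url="redis://localhost:6379", embedding=OpenAIEmbeddings())
)
global_llm = ChatOpenAI()  # silently uses the global cache

# Proposed: a nullable BaseCache on the model itself, so each request
# can bring its own embedding credentials.
user_api_key = "sk-..."  # hypothetical: supplied in the request body
per_request_llm = ChatOpenAI(
    cache=RedisSemanticCache(
        redis_url="redis://localhost:6379",
        embedding=OpenAIEmbeddings(openai_api_key=user_api_key),
    )
)
```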
Motivation
I have described the problem above. The current cache strategy does not give the developer enough control; the developer cannot even override anything, since the cache is set as a global instance.
Proposal (If applicable)
Mentioned Above