[Feature] LangChain compatibility #253

Similar to the work performed in langchain-llm-api, I would like to see the ability to use this natively within LangChain. Are there any plans to do so, such that the models could be presented back as `generate` and `embed` endpoints for use?

Comments
This is actually pretty easy to implement as-is with the HF inference server, since LangChain supports wrapping custom models (the example below is taken nearly verbatim from the LangChain docs). You can do something like this; just adjust the host name / port to your liking:

```python
import os
from typing import Any, List, Mapping, Optional

from text_generation import Client
from langchain.llms.base import LLM
from langchain.callbacks.manager import CallbackManagerForLLMRun

LLM_HOST = os.environ.get('LLM_HOST', '0.0.0.0')
LLM_PORT = os.environ.get('LLM_PORT', 6018)

client = Client(f"http://{LLM_HOST}:{LLM_PORT}")


class CustomLLM(LLM):
    name: str
    temperature: float = 0.8
    max_new_tokens: int = 100
    stream: bool = False

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        if not self.stream:
            reply = client.generate(prompt, max_new_tokens=self.max_new_tokens).generated_text
            return reply
        else:
            raise NotImplementedError


if __name__ == "__main__":
    query = 'Question: How old is Barack Obama? Answer:'
    llm = CustomLLM(name='local_llm')
    resp = llm(query)
    print(resp)
```

I haven't been able to get the streaming part to work yet, but I think LangChain is working on some updates on their end that should make that work soon.
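For reference, here is a rough, untested sketch of how that streaming branch might be filled in, assuming the `text_generation` client's `generate_stream` method and a LangChain `run_manager` for token callbacks; `stream_generate` is a hypothetical helper name, not part of either library:

```python
from typing import Optional

from text_generation import Client
from langchain.callbacks.manager import CallbackManagerForLLMRun


def stream_generate(
    client: Client,
    prompt: str,
    max_new_tokens: int = 100,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
) -> str:
    """Accumulate streamed tokens into a single string, forwarding each to callbacks."""
    text = ""
    for response in client.generate_stream(prompt, max_new_tokens=max_new_tokens):
        if not response.token.special:  # skip special tokens such as end-of-sequence
            text += response.token.text
            if run_manager is not None:
                # Let LangChain callbacks (e.g. stdout streaming) observe each token
                run_manager.on_llm_new_token(response.token.text)
    return text
```

Something like this could replace the `raise NotImplementedError` branch in `_call` above.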
The LangChain team has already built this integration: https://python.langchain.com/en/latest/modules/models/llms/integrations/huggingface_textgen_inference.html
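A minimal usage sketch based on the linked docs; the server URL and sampling parameters below are placeholders for a locally running text-generation-inference instance:

```python
from langchain.llms import HuggingFaceTextGenInference

# Placeholder URL for a local text-generation-inference server
llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080/",
    max_new_tokens=512,
    temperature=0.01,
    repetition_penalty=1.03,
)

print(llm("Question: How old is Barack Obama? Answer:"))
```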
Yes, and we also have the streaming feature working in that implementation, contrary to my original post. Unfortunately, it doesn't support embeddings; it's inference-only. This fork of the repo aims to support embeddings, but to my knowledge it isn't working yet.
@dcbark01 is there an API/repo for embeddings similar to huggingface/text-generation-inference? I didn't find one.
@ArnaudHureaux, unfortunately the answer right now is (to my knowledge) 'no'; there isn't a similar option available for embeddings. This is a major hole in the LLM ecosystem IMO, so it is something I am actively working on fixing. In fact, I already have a solution implemented; I'm just working with my current employer at the moment to open-source it. We're an academic research outfit, so I expect we'll get the approval to do so, but it may take a couple of weeks. I'll be sure to comment back on this issue if/when we get it approved.
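In the meantime, an embeddings server can be wrapped the same way the custom LLM was wrapped above. A minimal sketch, assuming a hypothetical HTTP service where the host, port, `/embed` route, and request/response shapes are all made-up placeholders rather than a real API:

```python
import os
from typing import List

import requests
from langchain.embeddings.base import Embeddings

# Hypothetical embedding server; host, port, and route are placeholders
EMB_HOST = os.environ.get('EMB_HOST', '0.0.0.0')
EMB_PORT = os.environ.get('EMB_PORT', 6019)


class CustomEmbeddings(Embeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Assumes the server accepts {"inputs": [...]} and returns a list of vectors
        resp = requests.post(f"http://{EMB_HOST}:{EMB_PORT}/embed", json={"inputs": texts})
        resp.raise_for_status()
        return resp.json()

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]
```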
This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.