
from_documents function call results with "AttributeError: 'NoneType' object has no attribute 'embed'" #26631

Open
5 tasks done
mkemalm opened this issue Sep 18, 2024 · 5 comments
Labels
Ɑ: vector store Related to vector store module

Comments

@mkemalm

mkemalm commented Sep 18, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_community.vectorstores import Chroma
vector_store = Chroma.from_documents(documents=chunks, embedding=self.embedding, persist_directory="./chroma_db")
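A self-contained variant of the call above, with placeholder documents and an assumed embedding class (FastEmbedEmbeddings, which the comments below point to); chunks and self.embedding come from my surrounding RAG code:

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_core.documents import Document

# Placeholder chunks; in the real code they come from a text splitter.
chunks = [Document(page_content="example chunk of text")]
embedding = FastEmbedEmbeddings()  # assumed embedding class, default model

# With the affected langchain_community release, this call fails with
# AttributeError: 'NoneType' object has no attribute 'embed'
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embedding,
    persist_directory="./chroma_db",
)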

Error Message and Stack Trace (if applicable)

No response

Description

I have been trying to run RAG samples for a few weeks. The code was running until I pulled the latest langchain-community; it runs again after downgrading to 0.2.17.

System Info

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Mon Aug 19 14:09:30 UTC 2024
Python Version: 3.12.5 (main, Aug 7 2024, 00:00:00) [GCC 14.2.1 20240801 (Red Hat 14.2.1-1)]

Package Information

langchain_core: 0.2.40
langchain: 0.2.16
langchain_community: 0.2.17
langsmith: 0.1.122
langchain_chroma: 0.1.4
langchain_text_splitters: 0.2.4

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.9.5
async-timeout: Installed. No version info available.
chromadb: 0.5.3
dataclasses-json: 0.6.7
fastapi: 0.115.0
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
orjson: 3.10.7
packaging: 23.2
pydantic: 2.9.2
PyYAML: 6.0.1
requests: 2.31.0
SQLAlchemy: 2.0.32
tenacity: 8.5.0
typing-extensions: 4.12.2

@dosubot dosubot bot added the Ɑ: vector store Related to vector store module label Sep 18, 2024
@eyurtsev
Collaborator

I have been trying to run RAG samples for a few weeks. The code was running until I pulled the latest langchain-community; it runs again after downgrading to 0.2.17.

Could you post the environment information from the setup where the code breaks? I think the system info you provided is for langchain_community 0.2.17.

@mkemalm
Author

mkemalm commented Sep 19, 2024

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Mon Aug 19 14:09:30 UTC 2024
Python Version: 3.12.5 (main, Aug 7 2024, 00:00:00) [GCC 14.2.1 20240801 (Red Hat 14.2.1-1)]

Package Information

langchain_core: 0.3.1
langchain: 0.3.0
langchain_community: 0.3.0
langsmith: 0.1.122
langchain_chroma: 0.1.4
langchain_text_splitters: 0.3.0

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.9.5
async-timeout: Installed. No version info available.
chromadb: 0.5.3
dataclasses-json: 0.6.7
fastapi: 0.115.0
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
orjson: 3.10.7
packaging: 23.2
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 6.0.1
requests: 2.31.0
SQLAlchemy: 2.0.32
tenacity: 8.5.0
typing-extensions: 4.12

@luon32

luon32 commented Sep 20, 2024

I can second the observation: the latest langchain_community version breaks the from_documents call with the above-mentioned error, and downgrading langchain_community to 0.2.17 solves the issue.

@andreabellacicca

Any news? Downgrading langchain_community to 0.2.17 breaks other packages.

In case it helps, I have investigated the issue: it is an initialization problem.

This is my code:

from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_chroma import Chroma

emb_model = FastEmbedEmbeddings(model_name="Qdrant/clip-ViT-B-32-text")
vector_store = Chroma(
    collection_name="my_collection",
    embedding_function=emb_model,
)

What I found is that the _model protected member of FastEmbedEmbeddings is never filled with the TextEmbedding instance, even though the TextEmbedding class itself is initialized.

A simple patch is:

from langchain_chroma import Chroma
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from fastembed import TextEmbedding

emb_model = FastEmbedEmbeddings(model_name="Qdrant/clip-ViT-B-32-text")
text_emb = TextEmbedding(model_name="Qdrant/clip-ViT-B-32-text")
emb_model._model = text_emb
vector_store = Chroma(
    collection_name="my_collection",
    embedding_function=emb_model,
)

This way, of course, you initialize TextEmbedding twice.

In case it is useful, I also found the following:

  1. In langchain_community.embeddings.fastembed.FastEmbedEmbeddings, values["_model"] is loaded correctly:

class FastEmbedEmbeddings(BaseModel, Embeddings):
    ...
    def validate_environment(cls, values: Dict) -> Dict:
        ...
        values["_model"] = fastembed.TextEmbedding(
            model_name=model_name,
            max_length=max_length,
            cache_dir=cache_dir,
            threads=threads,
        )
        return values
  2. But it is lost during Pydantic validation. This is the last place where I can still find it (in pydantic.main). [screenshot]
  3. Here _model is initialized with None. [screenshot] A minimal sketch of how a private attribute can survive Pydantic v2 validation follows this list.
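For reference, a minimal sketch (not the actual langchain_community code) of how a private attribute can be carried through Pydantic v2 validation, assuming the dropped-values behaviour above is the cause: declare it with PrivateAttr and set it in an "after" validator on the instance, instead of writing it into the values dict of a "before" validator.

from typing import Any

from pydantic import BaseModel, PrivateAttr, model_validator


class SketchEmbeddings(BaseModel):
    """Illustrative stand-in for FastEmbedEmbeddings, not the library class."""

    model_name: str = "Qdrant/clip-ViT-B-32-text"
    _model: Any = PrivateAttr(default=None)

    @model_validator(mode="after")
    def _load_model(self) -> "SketchEmbeddings":
        # Assumes fastembed is installed; setting the attribute on the validated
        # instance means Pydantic v2 cannot drop it from a values dict.
        from fastembed import TextEmbedding

        self._model = TextEmbedding(model_name=self.model_name)
        return self

With this pattern, SketchEmbeddings()._model is populated right after construction, without initializing TextEmbedding twice.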

Hope this helps.

@BenedictusAryo

My current simple patch while waiting for the fix:

from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

embedding_model = FastEmbedEmbeddings(model_name="your_model_name")
if embedding_model._model is None:
    embedding_model._model = embedding_model.model_extra['_model']    

So there is no need to initialize it twice.
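For completeness, a sketch of wiring that workaround back into the original from_documents call; the model name and document contents are placeholders, and it assumes model_extra["_model"] is populated as described above:

from langchain_chroma import Chroma
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_core.documents import Document

embedding_model = FastEmbedEmbeddings(model_name="BAAI/bge-small-en-v1.5")  # placeholder model
if embedding_model._model is None:
    # Reuse the TextEmbedding instance that validate_environment already built.
    embedding_model._model = embedding_model.model_extra["_model"]

chunks = [Document(page_content="example chunk of text")]  # placeholder documents
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory="./chroma_db",
)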

xtreme-sameer-vohra added a commit to xtreme-sameer-vohra/local-rag that referenced this issue Oct 7, 2024