
Testset Generation Failure: The runner thread which was running the jobs raised an exception. Read the traceback above to debug it. You can also pass raise_exceptions=False in case you want to show only a warning message instead. #1319

Closed
@Zolastic

Description:

I am encountering multiple warnings and errors while attempting to generate a test set with the ragas library, LangChain, Pydantic v2, and OpenAI. The main issues appear to stem from deprecated imports and namespace conflicts. Despite applying the suggested fixes and nest_asyncio, the problem persists.

Dependencies:

Here are the relevant dependencies from my pyproject.toml:

[tool.poetry.dependencies]
python = ">=3.12,<3.13"
ragas = "^0.1.18"
langchain = "^0.3.0"
langchain-community = "^0.3.0"
langchain-openai = "^0.2.0"
bs4 = "^0.0.2"
beautifulsoup4 = "^4.12.3"
unstructured = "^0.15.12"
libmagic = "^1.0"
python-magic-bin = "^0.4.14"
nest-asyncio = "^1.6.0"
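Before debugging ragas itself, it can help to confirm which versions Poetry actually resolved, since ragas 0.1.x predates langchain 0.3. A minimal stdlib-only check (package names taken from the pyproject.toml above):

```python
from importlib import metadata

# Report the installed versions of the packages involved in the conflict.
versions = {}
for pkg in ("ragas", "langchain", "pydantic"):
    try:
        versions[pkg] = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        versions[pkg] = None  # not installed in this environment

print(versions)
```

If the printed ragas version is 0.1.x alongside langchain 0.3.x, the mismatch itself may be the root cause.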

Steps to Reproduce:

  1. Set up the environment using Python 3.12 and the dependencies listed above.
  2. Use the following code to load documents from a webpage and generate a test set:
```python
import os
from langchain_community.document_loaders import WebBaseLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

import nest_asyncio
nest_asyncio.apply()

os.environ["OPENAI_API_KEY"] = "openai-api-key"  # placeholder

loader = WebBaseLoader("https://www.nba.com/stats/players")
documents = loader.load()

generator_llm = ChatOpenAI(model="gpt-4o-mini-2024-07-18")
critic_llm = ChatOpenAI(model="gpt-4o-mini-2024-07-18")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

try:
    test_set = generator.generate_with_langchain_docs(
        documents,
        test_size=3,
        distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
        raise_exceptions=False,
    )
    data_frame = test_set.to_pandas()
    data_frame.to_csv('testset.csv', index=False)
except Exception as e:
    print(f"Error during test set generation: {e}")
```

Error Output:

```
USER_AGENT environment variable not set, consider setting it to identify your requests.
/.../pydantic/_fields.py:132: UserWarning: Field "model_name" in _VertexAIBase has conflict with protected namespace "model_".
/.../pydantic/_fields.py:132: UserWarning: Field "model_name" in _VertexAICommon has conflict with protected namespace "model_".
LangChainDeprecationWarning: As of langchain-core 0.3.0, LangChain uses pydantic v2 internally...
Error during test set generation: The runner thread which was running the jobs raised an exception. Read the traceback above to debug it.
```
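Since raise_exceptions=False replaces the real traceback with this generic message, re-running with raise_exceptions=True (or logging the traceback yourself) is the quickest way to see the underlying failure. A stdlib-only sketch of capturing a full traceback, where failing_job is a hypothetical stand-in for the job the ragas runner thread executes:

```python
import traceback

def failing_job():
    # hypothetical stand-in for the job the ragas runner thread executes
    raise ValueError("boom")

try:
    failing_job()
except Exception:
    tb = traceback.format_exc()  # full traceback as a string

# the last line names the actual exception, not a generic wrapper message
print(tb.splitlines()[-1])  # → ValueError: boom
```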

What I Have Tried:

  • Added nest_asyncio.apply() to manage async conflicts.
  • Checked compatibility between LangChain and Pydantic, replacing deprecated imports where applicable.
  • Attempted raise_exceptions=False (as suggested in the error message) and is_async=False, but the problem persists.
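For context on the nest_asyncio bullet: plain asyncio refuses to start a new event loop from inside a running one, which is the situation when async jobs are driven from code already inside a loop (e.g. a notebook). A minimal demonstration using nothing but the stdlib:

```python
import asyncio

async def inner():
    return 42

async def outer():
    try:
        # Starting a second event loop inside a running one is exactly
        # what nest_asyncio.apply() patches asyncio to allow.
        asyncio.run(inner())
        return "no error"
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"

result = asyncio.run(outer())
print(result)
```

If nest_asyncio.apply() is in effect and this RuntimeError still appears in the hidden traceback, the patch is likely not being applied before the offending loop starts.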

Environment:

  • Python Version: 3.12
  • Pydantic Version: 2.x
  • LangChain Version: 0.3.0
  • Ragas Version: 0.1.18

Expected Behavior:

The test set should be generated without warnings or namespace conflicts, and it should create the testset.csv file successfully.

Actual Behavior:

The run emits the deprecation and protected-namespace warnings shown above, then raises an exception during test set generation, so no testset.csv is produced.
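Note that the pydantic "protected namespace" messages are UserWarnings, not errors: they do not stop the run, and the actual failure is the runner exception. If the warning noise is in the way while debugging, it can be filtered with the stdlib warnings module (the message pattern below is copied from the output above):

```python
import warnings

msg = ('Field "model_name" in _VertexAIBase has conflict with '
       'protected namespace "model_".')

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # suppress only the protected-namespace warnings, keep everything else
    warnings.filterwarnings(
        "ignore",
        message=r'Field "model_name" .* has conflict with protected namespace',
        category=UserWarning,
    )
    warnings.warn(msg, UserWarning)               # filtered out
    warnings.warn("something else", UserWarning)  # still recorded

print(len(caught))  # 1 — only the unrelated warning remains
```

This only hides the cosmetic warnings; the runner exception itself still needs the traceback-surfacing step above.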

Any guidance on how to resolve these namespace conflicts and LangChain compatibility issues would be greatly appreciated.

Metadata

    Labels

    bug (Something isn't working), module-testsetgen (Module testset generation)
