Description:
I am encountering multiple warnings and errors while attempting to generate a test set using the ragas library with LangChain, Pydantic v2, and OpenAI. The main issues stem from deprecated imports and namespace conflicts. Despite following potential fixes and utilizing nest_asyncio, the issue persists.
Dependencies:
Here are the relevant dependencies from my pyproject.toml:
[tool.poetry.dependencies]
python = ">=3.12,<3.13"
ragas = "^0.1.18"
langchain = "^0.3.0"
langchain-community = "^0.3.0"
langchain-openai = "^0.2.0"
bs4 = "^0.0.2"
beautifulsoup4 = "^4.12.3"
unstructured = "^0.15.12"
libmagic = "^1.0"
python-magic-bin = "^0.4.14"
nest-asyncio = "^1.6.0"
Steps to Reproduce:
- Set up the environment using Python 3.12 and the dependencies listed above.
- Use the following code to load documents from a webpage and generate a test set:
import os
from langchain_community.document_loaders import WebBaseLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
import nest_asyncio
nest_asyncio.apply()
os.environ["OPENAI_API_KEY"] = "openai-api-key"
loader = WebBaseLoader("https://www.nba.com/stats/players")
documents = loader.load()
generator_llm = ChatOpenAI(model="gpt-4o-mini-2024-07-18")
critic_llm = ChatOpenAI(model="gpt-4o-mini-2024-07-18")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)
try:
    test_set = generator.generate_with_langchain_docs(
        documents,
        test_size=3,
        distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
        raise_exceptions=False,
    )
    data_frame = test_set.to_pandas()
    data_frame.to_csv('testset.csv', index=False)
except Exception as e:
    print(f"Error during test set generation: {e}")
Error Output:
USER_AGENT environment variable not set, consider setting it to identify your requests.
/.../pydantic/_fields.py:132: UserWarning: Field "model_name" in _VertexAIBase has conflict with protected namespace "model_".
/.../pydantic/_fields.py:132: UserWarning: Field "model_name" in _VertexAICommon has conflict with protected namespace "model_".
LangChainDeprecationWarning: As of langchain-core 0.3.0, LangChain uses pydantic v2 internally...
Error during test set generation: The runner thread which was running the jobs raised an exception. Read the traceback above to debug it.
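For anyone reproducing this, the first three lines of the output are warnings rather than the failure itself, so it may help to quiet them and leave only the real traceback. Below is a minimal sketch using only the standard library. The helper name quiet_known_warnings and the user agent string are hypothetical, and it assumes that WebBaseLoader reads USER_AGENT when langchain_community is imported, so the helper has to run before those imports.

```python
import os
import warnings


def quiet_known_warnings(user_agent: str = "ragas-testset-repro") -> None:
    """Silence the two non-fatal warnings seen in the output above.

    Assumption: call this before importing langchain_community, since the
    USER_AGENT variable appears to be read at import time.
    """
    # Only set USER_AGENT if the caller has not already provided one.
    os.environ.setdefault("USER_AGENT", user_agent)

    # Ignore only the Pydantic "protected namespace" UserWarning for the
    # model_name field; every other warning is still reported.
    warnings.filterwarnings(
        "ignore",
        message=r'Field "model_name" in .* has conflict with protected namespace',
        category=UserWarning,
    )


quiet_known_warnings()
```

This only hides the noise. It does not address the exception raised by the runner thread, which still needs the traceback printed above the final error message.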
What I Have Tried:
- Added nest_asyncio.apply() to manage async conflicts.
- Checked compatibility between LangChain and Pydantic, replacing deprecated imports where applicable.
- Attempted to use raise_exceptions=False and is_async=False as suggested, but the problem persists.
Environment:
- Python Version: 3.12
- Pydantic Version: 2.x
- LangChain Version: 0.3.0
- Ragas Version: 0.1.18
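Since the caret constraints in pyproject.toml allow newer releases than the ones listed here, the versions actually installed can be confirmed with a small sketch like the following. It uses only importlib.metadata from the standard library. The PACKAGES list and the installed_versions helper are my own illustrative names, not part of ragas or LangChain.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear on PyPI (assumed relevant to this issue).
PACKAGES = [
    "ragas",
    "langchain",
    "langchain-core",
    "langchain-community",
    "langchain-openai",
    "pydantic",
    "nest-asyncio",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Pasting this output into the report would make it clear which exact versions of ragas, langchain-core, and pydantic were resolved.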
Expected Behavior:
The test set should be generated without warnings or namespace conflicts, and it should create the testset.csv file successfully.
Actual Behavior:
The run emits a LangChain deprecation warning and Pydantic warnings about a conflict between the model_name field and the protected model_ namespace. It then raises an exception during test set generation, and testset.csv is not created.
Any guidance on how to resolve these namespace conflicts and LangChain compatibility issues would be greatly appreciated.