
web_base loader is broken  #11095

@DmitryKatson

System Info

Name: langchain
Version: 0.0.299

Who can help?

@eyurtsev @hwchase17

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

Just follow https://python.langchain.com/docs/integrations/document_loaders/web_base

from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.espn.com/")
data = loader.load()

Expected behavior

The standard WebBaseLoader is broken when the web path is passed as a single string, as described in the docs:
loader = WebBaseLoader("https://www.espn.com/")

However, it works if the web path is passed as a list:
loader = WebBaseLoader(["https://www.espn.com/"])

The reason for that is this commit

Also, because of this breaking change, the other custom web loaders are broken as well: IMSDbLoader, AZLyricsLoader, and CollegeConfidentialLoader.
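
To illustrate the difference between the two call styles above, here is a minimal sketch of the kind of input normalization that would let a loader accept either a single URL string or a list of URLs. This is not langchain's actual code: `normalize_web_paths` is a hypothetical helper name, and the sketch only assumes that the loader ultimately wants a list of URL strings.

```python
from typing import List, Sequence, Union


def normalize_web_paths(web_path: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: return a list of URLs for either input style."""
    # Check for str first: a str is itself a sequence, and iterating over it
    # would yield individual characters instead of URLs.
    if isinstance(web_path, str):
        return [web_path]
    return list(web_path)


# Both call styles from the report end up as the same list of URLs.
single = normalize_web_paths("https://www.espn.com/")
multiple = normalize_web_paths(["https://www.espn.com/"])
print(single == multiple)
```

Until a fix lands, wrapping the URL in a list, as in the working call above, has the same effect on the caller's side.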

Labels: bug (related to a bug, vulnerability, or unexpected error with an existing feature)
