
Commit 123e29d ("style: more refs fixes (#33730)")
Parent: 6a1dca1

File tree: 49 files changed (+587, -395 lines)


libs/core/langchain_core/caches.py (13 additions & 6 deletions)

```diff
@@ -2,8 +2,8 @@

 Distinct from provider-based [prompt caching](https://docs.langchain.com/oss/python/langchain/models#prompt-caching).

-!!! warning
-    This is a beta feature! Please be wary of deploying experimental code to production
+!!! warning "Beta feature"
+    This is a beta feature. Please be wary of deploying experimental code to production
     unless you've taken appropriate precautions.

 A cache is useful for two reasons:

@@ -49,17 +49,18 @@ def lookup(self, prompt: str, llm_string: str) -> RETURN_VAL_TYPE | None:
 """Look up based on `prompt` and `llm_string`.

 A cache implementation is expected to generate a key from the 2-tuple
-of prompt and llm_string (e.g., by concatenating them with a delimiter).
+of `prompt` and `llm_string` (e.g., by concatenating them with a delimiter).

 Args:
     prompt: A string representation of the prompt.
         In the case of a chat model, the prompt is a non-trivial
         serialization of the prompt into the language model.
     llm_string: A string representation of the LLM configuration.
+
         This is used to capture the invocation parameters of the LLM
         (e.g., model name, temperature, stop tokens, max tokens, etc.).
-        These invocation parameters are serialized into a string
-        representation.
+
+        These invocation parameters are serialized into a string representation.

 Returns:
     On a cache miss, return `None`. On a cache hit, return the cached value.

@@ -78,8 +79,10 @@ def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> N
         In the case of a chat model, the prompt is a non-trivial
         serialization of the prompt into the language model.
     llm_string: A string representation of the LLM configuration.
+
         This is used to capture the invocation parameters of the LLM
         (e.g., model name, temperature, stop tokens, max tokens, etc.).
+
         These invocation parameters are serialized into a string
         representation.
     return_val: The value to be cached. The value is a list of `Generation`

@@ -94,15 +97,17 @@ async def alookup(self, prompt: str, llm_string: str) -> RETURN_VAL_TYPE | None:
 """Async look up based on `prompt` and `llm_string`.

 A cache implementation is expected to generate a key from the 2-tuple
-of prompt and llm_string (e.g., by concatenating them with a delimiter).
+of `prompt` and `llm_string` (e.g., by concatenating them with a delimiter).

 Args:
     prompt: A string representation of the prompt.
         In the case of a chat model, the prompt is a non-trivial
         serialization of the prompt into the language model.
     llm_string: A string representation of the LLM configuration.
+
         This is used to capture the invocation parameters of the LLM
         (e.g., model name, temperature, stop tokens, max tokens, etc.).
+
         These invocation parameters are serialized into a string
         representation.

@@ -125,8 +130,10 @@ async def aupdate(
         In the case of a chat model, the prompt is a non-trivial
         serialization of the prompt into the language model.
     llm_string: A string representation of the LLM configuration.
+
         This is used to capture the invocation parameters of the LLM
         (e.g., model name, temperature, stop tokens, max tokens, etc.).
+
         These invocation parameters are serialized into a string
         representation.
     return_val: The value to be cached. The value is a list of `Generation`
```
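The `lookup`/`update` contract described in these docstrings can be sketched in plain Python. This is a hypothetical minimal in-memory cache, not the langchain_core implementation; it keys entries on the `(prompt, llm_string)` 2-tuple directly instead of concatenating with a delimiter:

```python
class SimpleLLMCache:
    """Hypothetical in-memory cache mirroring the BaseCache contract above."""

    def __init__(self) -> None:
        # Maps (prompt, llm_string) -> cached list of generations.
        self._store: dict[tuple[str, str], list] = {}

    def lookup(self, prompt: str, llm_string: str):
        # On a cache miss return None; on a hit return the cached value.
        return self._store.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, return_val: list) -> None:
        # llm_string captures the invocation parameters (model name,
        # temperature, ...), so the same prompt under different settings
        # caches separately.
        self._store[(prompt, llm_string)] = return_val


cache = SimpleLLMCache()
assert cache.lookup("hi", "model=toy;temp=0") is None  # miss
cache.update("hi", "model=toy;temp=0", ["Hello!"])
assert cache.lookup("hi", "model=toy;temp=0") == ["Hello!"]  # hit
```

Because the configuration string is part of the key, changing any invocation parameter produces a fresh cache entry, which is exactly why `llm_string` must serialize those parameters.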

libs/core/langchain_core/documents/__init__.py (24 additions & 3 deletions)

```diff
@@ -1,7 +1,28 @@
-"""Documents module.
+"""Documents module for data retrieval and processing workflows.

-**Document** module is a collection of classes that handle documents
-and their transformations.
+This module provides core abstractions for handling data in retrieval-augmented
+generation (RAG) pipelines, vector stores, and document processing workflows.
+
+!!! warning "Documents vs. message content"
+    This module is distinct from `langchain_core.messages.content`, which provides
+    multimodal content blocks for **LLM chat I/O** (text, images, audio, etc. within
+    messages).
+
+    **Key distinction:**
+
+    - **Documents** (this module): For **data retrieval and processing workflows**
+        - Vector stores, retrievers, RAG pipelines
+        - Text chunking, embedding, and semantic search
+        - Example: Chunks of a PDF stored in a vector database
+
+    - **Content Blocks** (`messages.content`): For **LLM conversational I/O**
+        - Multimodal message content sent to/from models
+        - Tool calls, reasoning, citations within chat
+        - Example: An image sent to a vision model in a chat message (via
+          [`ImageContentBlock`][langchain.messages.ImageContentBlock])
+
+    While both can represent similar data types (text, files), they serve different
+    architectural purposes in LangChain applications.
 """

 from typing import TYPE_CHECKING
```
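The "documents for retrieval" side of the distinction drawn above can be illustrated with a toy keyword retriever. The `Doc` record here is a hypothetical dependency-free stand-in for `langchain_core.documents.Document` (which similarly carries `page_content` plus `metadata`), and the scoring is naive term overlap rather than real embeddings:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Stand-in for Document: text content plus metadata for retrieval.
    page_content: str
    metadata: dict = field(default_factory=dict)


def keyword_retrieve(docs, query, k=2):
    # Score each chunk by how many query terms it contains
    # (a toy substitute for embedding-based semantic search).
    terms = query.lower().split()
    scored = sorted(
        docs,
        key=lambda d: sum(t in d.page_content.lower() for t in terms),
        reverse=True,
    )
    return scored[:k]


corpus = [
    Doc("Cats purr when content.", {"source": "pets.pdf", "page": 1}),
    Doc("Dogs bark at strangers.", {"source": "pets.pdf", "page": 2}),
    Doc("Stock markets fell today.", {"source": "news.txt"}),
]
top = keyword_retrieve(corpus, "why do cats purr", k=1)
```

The point is architectural: these records live in a store and are searched over, whereas content blocks travel inside chat messages to and from a model.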

libs/core/langchain_core/documents/base.py (59 additions & 46 deletions)

````diff
@@ -1,4 +1,16 @@
-"""Base classes for media and documents."""
+"""Base classes for media and documents.
+
+This module contains core abstractions for **data retrieval and processing workflows**:
+
+- `BaseMedia`: Base class providing `id` and `metadata` fields
+- `Blob`: Raw data loading (files, binary data) - used by document loaders
+- `Document`: Text content for retrieval (RAG, vector stores, semantic search)
+
+!!! note "Not for LLM chat messages"
+    These classes are for data processing pipelines, not LLM I/O. For multimodal
+    content in chat messages (images, audio in conversations), see
+    `langchain.messages` content blocks instead.
+"""

 from __future__ import annotations

@@ -19,15 +31,13 @@

 class BaseMedia(Serializable):
-    """Use to represent media content.
-
-    Media objects can be used to represent raw data, such as text or binary data.
+    """Base class for content used in retrieval and data processing workflows.

-    LangChain Media objects allow associating metadata and an optional identifier
-    with the content.
+    Provides common fields for content that needs to be stored, indexed, or searched.

-    The presence of an ID and metadata make it easier to store, index, and search
-    over the content in a structured way.
+    !!! note
+        For multimodal content in **chat messages** (images, audio sent to/from LLMs),
+        use `langchain.messages` content blocks instead.
     """

     # The ID field is optional at the moment.

@@ -45,61 +55,60 @@ class BaseMedia(Serializable):

 class Blob(BaseMedia):
-    """Blob represents raw data by either reference or value.
+    """Raw data abstraction for document loading and file processing.

-    Provides an interface to materialize the blob in different representations, and
-    help to decouple the development of data loaders from the downstream parsing of
-    the raw data.
+    Represents raw bytes or text, either in-memory or by file reference. Used
+    primarily by document loaders to decouple data loading from parsing.

     Inspired by [Mozilla's `Blob`](https://developer.mozilla.org/en-US/docs/Web/API/Blob)

-    Example: Initialize a blob from in-memory data
+    ???+ example "Initialize a blob from in-memory data"

-        ```python
-        from langchain_core.documents import Blob
+        ```python
+        from langchain_core.documents import Blob

-        blob = Blob.from_data("Hello, world!")
+        blob = Blob.from_data("Hello, world!")

-        # Read the blob as a string
-        print(blob.as_string())
+        # Read the blob as a string
+        print(blob.as_string())

-        # Read the blob as bytes
-        print(blob.as_bytes())
+        # Read the blob as bytes
+        print(blob.as_bytes())

-        # Read the blob as a byte stream
-        with blob.as_bytes_io() as f:
-            print(f.read())
-        ```
+        # Read the blob as a byte stream
+        with blob.as_bytes_io() as f:
+            print(f.read())
+        ```

-    Example: Load from memory and specify mime-type and metadata
+    ??? example "Load from memory and specify MIME type and metadata"

-        ```python
-        from langchain_core.documents import Blob
+        ```python
+        from langchain_core.documents import Blob

-        blob = Blob.from_data(
-            data="Hello, world!",
-            mime_type="text/plain",
-            metadata={"source": "https://example.com"},
-        )
-        ```
+        blob = Blob.from_data(
+            data="Hello, world!",
+            mime_type="text/plain",
+            metadata={"source": "https://example.com"},
+        )
+        ```

-    Example: Load the blob from a file
+    ??? example "Load the blob from a file"

-        ```python
-        from langchain_core.documents import Blob
+        ```python
+        from langchain_core.documents import Blob

-        blob = Blob.from_path("path/to/file.txt")
+        blob = Blob.from_path("path/to/file.txt")

-        # Read the blob as a string
-        print(blob.as_string())
+        # Read the blob as a string
+        print(blob.as_string())

-        # Read the blob as bytes
-        print(blob.as_bytes())
+        # Read the blob as bytes
+        print(blob.as_bytes())

-        # Read the blob as a byte stream
-        with blob.as_bytes_io() as f:
-            print(f.read())
-        ```
+        # Read the blob as a byte stream
+        with blob.as_bytes_io() as f:
+            print(f.read())
+        ```
     """

     data: bytes | str | None = None

@@ -213,7 +222,7 @@ def from_path(
     encoding: Encoding to use if decoding the bytes into a string
     mime_type: If provided, will be set as the MIME type of the data
     guess_type: If `True`, the MIME type will be guessed from the file
-        extension, if a mime-type was not provided
+        extension, if a MIME type was not provided
     metadata: Metadata to associate with the `Blob`

 Returns:

@@ -274,6 +283,10 @@ def __repr__(self) -> str:
 class Document(BaseMedia):
     """Class for storing a piece of text and associated metadata.

+    !!! note
+        `Document` is for **retrieval workflows**, not chat I/O. For sending text
+        to an LLM in a conversation, use message types from `langchain.messages`.
+
     Example:
         ```python
         from langchain_core.documents import Document
````
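The `Blob` behavior shown in the diff above, materializing the same raw data as a string, as bytes, or as a byte stream, can be sketched without langchain_core. `MiniBlob` is a hypothetical by-value-only toy, not the real class (which also supports by-reference file paths):

```python
import io


class MiniBlob:
    """Hypothetical sketch of Blob's by-value mode: one payload, three views."""

    def __init__(self, data, encoding="utf-8"):
        self.data = data  # bytes or str
        self.encoding = encoding

    @classmethod
    def from_data(cls, data):
        return cls(data)

    def as_string(self):
        # Decode bytes if needed; pass strings through unchanged.
        if isinstance(self.data, bytes):
            return self.data.decode(self.encoding)
        return self.data

    def as_bytes(self):
        if isinstance(self.data, str):
            return self.data.encode(self.encoding)
        return self.data

    def as_bytes_io(self):
        # A file-like view lets parsers that expect streams consume the
        # data, which is what decouples loaders from downstream parsing.
        return io.BytesIO(self.as_bytes())


blob = MiniBlob.from_data("Hello, world!")
with blob.as_bytes_io() as f:
    first = f.read(5)  # b"Hello"
```

The design point is that producers hand over one `Blob`-like object and consumers pick whichever representation suits them, without either side caring how the data was loaded.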

libs/core/langchain_core/documents/compressor.py (6 additions & 6 deletions)

```diff
@@ -21,14 +21,14 @@ class BaseDocumentCompressor(BaseModel, ABC):

 This abstraction is primarily used for post-processing of retrieved documents.

-Documents matching a given query are first retrieved.
+`Document` objects matching a given query are first retrieved.

 Then the list of documents can be further processed.

 For example, one could re-rank the retrieved documents using an LLM.

 !!! note
-    Users should favor using a RunnableLambda instead of sub-classing from this
+    Users should favor using a `RunnableLambda` instead of sub-classing from this
     interface.
 """

@@ -43,9 +43,9 @@ def compress_documents(
 """Compress retrieved documents given the query context.

 Args:
-    documents: The retrieved documents.
+    documents: The retrieved `Document` objects.
     query: The query context.
-    callbacks: Optional callbacks to run during compression.
+    callbacks: Optional `Callbacks` to run during compression.

 Returns:
     The compressed documents.

@@ -61,9 +61,9 @@ async def acompress_documents(
 """Async compress retrieved documents given the query context.

 Args:
-    documents: The retrieved documents.
+    documents: The retrieved `Document` objects.
     query: The query context.
-    callbacks: Optional callbacks to run during compression.
+    callbacks: Optional `Callbacks` to run during compression.

 Returns:
     The compressed documents.
```
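As a rough illustration of what a compressor does, here is a hypothetical plain function with the same shape as `compress_documents` (documents plus a query in, fewer documents out). It uses keyword overlap as a stand-in for an LLM re-ranker, and plain strings stand in for `Document` objects and the `Callbacks` argument is omitted:

```python
def compress_documents(documents, query):
    # Keep only documents that mention at least one query term
    # (a toy substitute for LLM-based re-ranking or contextual compression).
    terms = query.lower().split()
    return [
        doc for doc in documents
        if any(term in doc.lower() for term in terms)
    ]


docs = [
    "LangChain caches model outputs.",
    "Photosynthesis converts light to energy.",
    "Caching reduces repeated LLM calls.",
]
kept = compress_documents(docs, "caching LLM")
```

A function like this is also why the docstring suggests wrapping such logic in a `RunnableLambda` rather than subclassing: the transformation is just a callable over the retrieved list.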

libs/core/langchain_core/documents/transformers.py (2 additions & 2 deletions)

```diff
@@ -16,8 +16,8 @@

 class BaseDocumentTransformer(ABC):
     """Abstract base class for document transformation.

-    A document transformation takes a sequence of Documents and returns a
-    sequence of transformed Documents.
+    A document transformation takes a sequence of `Document` objects and returns a
+    sequence of transformed `Document` objects.

     Example:
         ```python
```
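A document transformation, as defined here, is simply sequence-of-documents in, sequence-of-documents out. A hypothetical dependency-free sketch (plain dicts stand in for `Document` objects, and the transformation is a tiny paragraph splitter):

```python
def split_on_paragraphs(documents):
    # Transform: each input document yields one output document per paragraph,
    # carrying the parent's metadata forward so provenance survives chunking.
    out = []
    for doc in documents:
        for chunk in doc["page_content"].split("\n\n"):
            if chunk.strip():
                out.append({"page_content": chunk, "metadata": dict(doc["metadata"])})
    return out


docs = [{"page_content": "First para.\n\nSecond para.", "metadata": {"source": "a.txt"}}]
chunks = split_on_paragraphs(docs)
```

Note the output sequence need not be the same length as the input; splitters fan out, filters fan in.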

libs/core/langchain_core/embeddings/fake.py (2 additions & 2 deletions)

```diff
@@ -18,7 +18,7 @@ class FakeEmbeddings(Embeddings, BaseModel):

 This embedding model creates embeddings by sampling from a normal distribution.

-!!! warning
+!!! danger "Toy model"
     Do not use this outside of testing, as it is not a real embedding model.

 Instantiate:

@@ -73,7 +73,7 @@ class DeterministicFakeEmbedding(Embeddings, BaseModel):

 This embedding model creates embeddings by sampling from a normal distribution
 with a seed based on the hash of the text.

-!!! warning
+!!! danger "Toy model"
     Do not use this outside of testing, as it is not a real embedding model.

 Instantiate:
```
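The seeding trick behind `DeterministicFakeEmbedding`, deriving a seed from a hash of the text and then sampling a normal distribution, can be sketched with the standard library. This is a hypothetical re-creation, not the langchain_core implementation, which may hash and sample differently:

```python
import hashlib
import random


def fake_embed(text, size=8):
    # Seed from a stable hash of the text so the same input always yields
    # the same vector. (Python's builtin hash() is randomized per process,
    # so a cryptographic digest is used for stability instead.)
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


v1 = fake_embed("hello")
v2 = fake_embed("hello")
v3 = fake_embed("world")
```

Determinism is what makes such a toy useful in tests: identical texts compare equal across runs, while distinct texts almost surely differ.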

libs/core/langchain_core/language_models/__init__.py (6 additions & 5 deletions)

```diff
@@ -6,12 +6,13 @@

 **Chat models**

 Language models that use a sequence of messages as inputs and return chat messages
-as outputs (as opposed to using plain text). Chat models support the assignment of
-distinct roles to conversation messages, helping to distinguish messages from the AI,
-users, and instructions such as system messages.
+as outputs (as opposed to using plain text).

-The key abstraction for chat models is `BaseChatModel`. Implementations
-should inherit from this class.
+Chat models support the assignment of distinct roles to conversation messages, helping
+to distinguish messages from the AI, users, and instructions such as system messages.
+
+The key abstraction for chat models is `BaseChatModel`. Implementations should inherit
+from this class.

 See existing [chat model integrations](https://docs.langchain.com/oss/python/integrations/chat).
```

libs/core/langchain_core/language_models/fake_chat_models.py (1 addition & 1 deletion)

```diff
@@ -1,4 +1,4 @@
-"""Fake chat model for testing purposes."""
+"""Fake chat models for testing purposes."""

 import asyncio
 import re
```

libs/core/langchain_core/language_models/llms.py (4 additions & 1 deletion)

```diff
@@ -1,4 +1,7 @@
-"""Base interface for large language models to expose."""
+"""Base interface for traditional large language models (LLMs) to expose.
+
+These are traditionally older models (newer models generally are chat models).
+"""

 from __future__ import annotations
```

libs/core/langchain_core/load/serializable.py (3 additions & 0 deletions)

```diff
@@ -97,11 +97,14 @@ class Serializable(BaseModel, ABC):
     by default. This is to prevent accidental serialization of objects that should
     not be serialized.
 - `get_lc_namespace`: Get the namespace of the LangChain object.
+
     During deserialization, this namespace is used to identify
     the correct class to instantiate.
+
     Please see the `Reviver` class in `langchain_core.load.load` for more details.
     During deserialization an additional mapping is handle classes that have moved
     or been renamed across package versions.
+
 - `lc_secrets`: A map of constructor argument names to secret ids.
 - `lc_attributes`: List of additional attribute names that should be included
     as part of the serialized representation.
```
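The `lc_secrets` idea touched on above, constructor arguments that must never appear in the serialized form, can be sketched as a toy pattern. This is hypothetical and only loosely modeled on the docstring; the real `Serializable` also records the namespace so the `Reviver` can locate and re-instantiate the class:

```python
def serialize(obj, lc_secrets):
    # Replace secret constructor kwargs with a placeholder secret id so the
    # dump can be stored or shared; deserialization would resolve the ids
    # back to real secrets (e.g., from environment variables).
    return {
        "lc_namespace": obj["lc_namespace"],
        "kwargs": {
            k: {"lc_secret_id": lc_secrets[k]} if k in lc_secrets else v
            for k, v in obj["kwargs"].items()
        },
    }


chat_model = {
    "lc_namespace": ["my_pkg", "chat_models"],  # hypothetical namespace
    "kwargs": {"model": "test-model", "api_key": "sk-real-key"},
}
dump = serialize(chat_model, lc_secrets={"api_key": "MY_API_KEY"})
```

The namespace in the dump plays the role the docstring describes: it tells the loader which class to instantiate when reviving the object.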
