Deep Lake mini upgrades #3375

Merged
merged 49 commits into from
Apr 24, 2023
97be64f
Merge pull request #1 from hwchase17/master
davidbuniat Apr 5, 2023
7ec34e1
deeplake vector store advances
Apr 5, 2023
987c377
merge
Apr 5, 2023
a2cc2ec
Merge branch 'master' of https://github.com/activeloopai/langchain
Apr 5, 2023
b9ab944
remove comments
Apr 5, 2023
a969c7a
demo update
Apr 5, 2023
f151697
Merge branch 'master' of https://github.com/hwchase17/langchain
Apr 5, 2023
313a620
typo fix
Apr 5, 2023
78c99c8
mypy fixes
Apr 5, 2023
1e1271b
filter fix on delete
Apr 5, 2023
99379be
formatting update
Apr 5, 2023
be0bafb
unused imports
Apr 5, 2023
a8816ca
ruff fix
Apr 5, 2023
4986056
fix comments
Apr 5, 2023
894d5bd
refmormat
Apr 5, 2023
c81bb90
Merge branch 'hwchase17:master' into master
davidbuniat Apr 7, 2023
236002f
deeplake vectro store improved
Apr 8, 2023
93acd8e
deeplake faster and custom filters
Apr 8, 2023
28f89ab
dretriever example added
Apr 8, 2023
5641667
Merge branch 'hwchase17:master' into master
davidbuniat Apr 8, 2023
fbf8110
typo
Apr 8, 2023
7f0b925
Merge branch 'master' of https://github.com/activeloopai/langchain
Apr 8, 2023
374491e
minor updates
Apr 8, 2023
b346833
ruf fix
Apr 8, 2023
166a2d6
added use case
Apr 8, 2023
ed21551
added code
Apr 8, 2023
e516ae8
added retriever pointer in the docs
Apr 8, 2023
598332e
Merge branch 'hwchase17:master' into master
davidbuniat Apr 8, 2023
0a34694
merge
Apr 10, 2023
40d170a
Merge branch 'hwchase17:master' into master
davidbuniat Apr 15, 2023
a4e4a4d
improve token auth and tests mode on
Apr 15, 2023
ecd6ea8
remove few flags
Apr 15, 2023
0d42983
tests update
Apr 15, 2023
c80a7d3
remove modules notebook
Apr 15, 2023
781fdc4
reemove semi-sensitive data
Apr 15, 2023
3f89c5e
Merge branch 'hwchase17:master' into master
davidbuniat Apr 21, 2023
0357e60
Merge branch 'hwchase17:master' into master
davidbuniat Apr 22, 2023
e1ee292
upgrade deeplake version and twitter notebook
Apr 23, 2023
629988d
Merge branch 'hwchase17:master' into master
davidbuniat Apr 23, 2023
061d60b
upgraded notebookss, moved to local storage instead of in-memory, set…
Apr 23, 2023
1841305
Merge branch 'master' of https://github.com/activeloopai/langchain
Apr 23, 2023
6b7c3b2
doc update
Apr 23, 2023
4eeb26d
Merge branch 'hwchase17:master' into master
davidbuniat Apr 23, 2023
07fd0c2
reformat
Apr 23, 2023
d270d59
fixed typo and added assert
Apr 23, 2023
396b6ee
reeformatting
Apr 23, 2023
619f6e5
Merge branch 'hwchase17:master' into master
davidbuniat Apr 23, 2023
8c7ecc3
added disallowed_special=() to bypass utf-8 encoding issue in example
Apr 23, 2023
4294a60
creds fix for exists
Apr 24, 2023
fix comments
Davit Buniatyan committed Apr 5, 2023
commit 4986056446245e53eef5e876bb5338fba7decdcc
70 changes: 46 additions & 24 deletions langchain/vectorstores/deeplake.py
@@ -37,7 +37,8 @@ def vector_search(
  query_embedding: np.ndarray
  data_vectors: np.ndarray
  k (int): number of nearest neighbors
- distance_metric: distance function 'L2' for Euclidean, 'L1' for Nuclear, 'Max' l-infinity distnace, 'cos' for cosine similarity, 'dot' for dot product
+ distance_metric: distance function 'L2' for Euclidean, 'L1' for Nuclear, 'Max'
+ l-infinity distnace, 'cos' for cosine similarity, 'dot' for dot product

  returns:
  nearest_indices: List, indices of nearest neighbors
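For reference, the five metric names in this docstring could be computed and ranked roughly as in the sketch below. It is a plain-Python illustration under stated assumptions, not the NumPy code inside `vector_search`: the `METRICS` table and the `nearest` helper are hypothetical names introduced here, and the key `"max"` follows the lowercase spelling used in the `search` docstring. Two points worth keeping in mind: the distance the docstring labels `L1` is the sum of absolute differences, more commonly called the Manhattan distance, and for `cos` and `dot` a larger score means a closer match, while for the other three a smaller value does.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _l2(a: Vector, b: Vector) -> float:
    # Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _l1(a: Vector, b: Vector) -> float:
    # Sum of absolute differences (Manhattan distance)
    return sum(abs(x - y) for x, y in zip(a, b))


def _linf(a: Vector, b: Vector) -> float:
    # L-infinity distance: largest absolute coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cos(a: Vector, b: Vector) -> float:
    # Cosine similarity; defined as 0.0 here when either vector is all zeros
    denom = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    return _dot(a, b) / denom if denom else 0.0


# Hypothetical lookup: metric name -> (scoring function, larger score means closer)
METRICS: Dict[str, Tuple[Callable[[Vector, Vector], float], bool]] = {
    "L2": (_l2, False),
    "L1": (_l1, False),
    "max": (_linf, False),
    "cos": (_cos, True),
    "dot": (_dot, True),
}


def nearest(
    query: Vector, data: Sequence[Vector], k: int = 4, distance_metric: str = "L2"
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k closest vectors in data."""
    score_fn, larger_is_closer = METRICS[distance_metric]
    scored = [(i, score_fn(query, vec)) for i, vec in enumerate(data)]
    scored.sort(key=lambda pair: pair[1], reverse=larger_is_closer)
    return scored[:k]
```

Calling `nearest(query, data, k=2, distance_metric="dot")` returns the two vectors with the largest dot product against `query`; switching to `"L2"` returns the two with the smallest Euclidean distance.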
@@ -62,13 +63,16 @@ class DeepLake(VectorStore):
  """Wrapper around Deep Lake, a data lake for deep learning applications.

  We implement naive similarity search and filtering for fast prototyping,
- but it can be extended with Tensor Query Language (TQL) for production use cases over billion rows.
+ but it can be extended with Tensor Query Language (TQL) for production use cases
+ over billion rows.

  Why Deep Lake?

- - Not only stores embeddings, but also the original data with automatic version control.
- - Serverless, doesn't require another service and can be used with major cloud providers (S3, GCS, etc.)
- - More than just a multi-modal vector store. You can use the dataset to fine-tune your own LLM models.
+ - Not only stores embeddings, but also the original data with version control.
+ - Serverless, doesn't require another service and can be used with major
+ cloud providers (S3, GCS, etc.)
+ - More than just a multi-modal vector store. You can use the dataset
+ to fine-tune your own LLM models.

  To use, you should have the ``deeplake`` python package installed.

@@ -92,8 +96,7 @@ def __init__(
  read_only: Optional[bool] = None,
  ) -> None:
  """Initialize with Deep Lake client."""
- import deeplake
-
+
  try:
  import deeplake
  except ImportError:
@@ -210,10 +213,15 @@ def search(
  query: Text to look up documents similar to.
  embedding: Embedding function to use. Defaults to None.
  k: Number of Documents to return. Defaults to 4.
- distance_metric: `L2` for Euclidean, `L1` for Nuclear, `max` L-infinity distance, `cos` for cosine similarity, 'dot' for dot product. Defaults to `L2`.
- filter: Attribute filter by metadata example {'key': 'value'}. Defaults to None.
- maximal_marginal_relevance: Whether to use maximal marginal relevance. Defaults to False.
- fetch_k: Number of Documents to fetch to pass to MMR algorithm. Defaults to 20.
+ distance_metric: `L2` for Euclidean, `L1` for Nuclear,
+ `max` L-infinity distance, `cos` for cosine similarity,
+ 'dot' for dot product. Defaults to `L2`.
+ filter: Attribute filter by metadata example {'key': 'value'}.
+ Defaults to None.
+ maximal_marginal_relevance: Whether to use maximal marginal relevance.
+ Defaults to False.
+ fetch_k: Number of Documents to fetch to pass to MMR algorithm.
+ Defaults to 20.
  return_score: Whether to return the score. Defaults to False.

  Returns:
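To make the `filter` argument concrete: the docstring gives `{'key': 'value'}` as an example, which suggests exact matching on metadata attributes. The sketch below assumes that a document passes only when every key in the filter is present in its metadata with an equal value. That reading, and the helper names `matches` and `apply_filter`, are assumptions made for illustration and are not taken from this diff.

```python
from typing import Any, Dict, List, Optional


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """True when every filter key is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in flt.items())


def apply_filter(
    docs: List[Dict[str, Any]], flt: Optional[Dict[str, Any]] = None
) -> List[Dict[str, Any]]:
    """Keep only the documents whose 'metadata' dict satisfies the filter."""
    if not flt:
        # No filter (None or empty dict): every document passes
        return list(docs)
    return [doc for doc in docs if matches(doc["metadata"], flt)]


docs = [
    {"text": "alpha", "metadata": {"source": "a.txt", "page": 1}},
    {"text": "beta", "metadata": {"source": "b.txt", "page": 1}},
    {"text": "gamma", "metadata": {"source": "a.txt", "page": 2}},
]
selected = apply_filter(docs, {"source": "a.txt", "page": 2})
```

Under this reading, multiple keys combine with AND semantics, so `selected` holds only the "gamma" document.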
@@ -282,14 +290,22 @@ def similarity_search(

  Args:
  query: text to embed and run the query on.
- k: Number of Documents to return. Defaults to 4.
+ k: Number of Documents to return.
+ Defaults to 4.
  query: Text to look up documents similar to.
- embedding: Embedding function to use. Defaults to None.
- k: Number of Documents to return. Defaults to 4.
- distance_metric: `L2` for Euclidean, `L1` for Nuclear, `max` L-infinity distance, `cos` for cosine similarity, 'dot' for dot product. Defaults to `L2`.
- filter: Attribute filter by metadata example {'key': 'value'}. Defaults to None.
- maximal_marginal_relevance: Whether to use maximal marginal relevance. Defaults to False.
- fetch_k: Number of Documents to fetch to pass to MMR algorithm. Defaults to 20.
+ embedding: Embedding function to use.
+ Defaults to None.
+ k: Number of Documents to return.
+ Defaults to 4.
+ distance_metric: `L2` for Euclidean, `L1` for Nuclear, `max`
+ L-infinity distance, `cos` for cosine similarity, 'dot' for dot product
+ Defaults to `L2`.
+ filter: Attribute filter by metadata example {'key': 'value'}.
+ Defaults to None.
+ maximal_marginal_relevance: Whether to use maximal marginal relevance.
+ Defaults to False.
+ fetch_k: Number of Documents to fetch to pass to MMR algorithm.
+ Defaults to 20.
  return_score: Whether to return the score. Defaults to False.

  Returns:
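For readers unfamiliar with the `maximal_marginal_relevance` and `fetch_k` arguments: MMR first fetches a larger candidate pool (`fetch_k` items) and then picks `k` of them greedily, trading relevance to the query against similarity to items already picked. The sketch below is a generic plain-Python version of that greedy loop using cosine similarity. It is not the helper this wrapper calls, and the `mmr` function and its `lambda_mult` weight are names assumed for this illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _cos(a: Vector, b: Vector) -> float:
    # Cosine similarity; defined as 0.0 here when either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def mmr(
    query: Vector, candidates: Sequence[Vector], k: int = 4, lambda_mult: float = 0.5
) -> List[int]:
    """Greedily select up to k candidate indices by maximal marginal relevance.

    lambda_mult = 1.0 ranks purely by relevance to the query; smaller values
    penalize candidates that resemble ones that were already selected.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = _cos(query, candidates[i])
            # Highest similarity to anything already chosen (0.0 if nothing chosen yet)
            redundancy = max(
                (_cos(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

In this sketch the `candidates` list plays the role of the `fetch_k` pool, so it should be at least as large as `k` for the diversity term to have any effect.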
@@ -321,7 +337,9 @@ def similarity_search_with_score(

  Args:
  query (str): Query text to search for.
- distance_metric: `L2` for Euclidean, `L1` for Nuclear, `max` L-infinity distance, `cos` for cosine similarity, 'dot' for dot product. Defaults to `L2`.
+ distance_metric: `L2` for Euclidean, `L1` for Nuclear, `max` L-infinity
+ distance, `cos` for cosine similarity, 'dot' for dot product.
+ Defaults to `L2`.
  k (int): Number of results to return. Defaults to 4.
  filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
  Returns:
@@ -400,8 +418,9 @@ def from_texts(
  (use 'activeloop login' from command line)
  - AWS S3 path of the form ``s3://bucketname/path/to/dataset``.
  Credentials are required in either the environment
- - Google Cloud Storage path of the form ``gcs://bucketname/path/to/dataset``
- Credentials are required in either the environment
+ - Google Cloud Storage path of the form
+ ``gcs://bucketname/path/to/dataset``Credentials are required
+ in either the environment
  - Local file system path of the form ``./path/to/dataset`` or
  ``~/path/to/dataset`` or ``path/to/dataset``.
  - In-memory path of the form ``mem://path/to/dataset`` which doesn't
@@ -431,9 +450,12 @@ def delete(
  """Delete the entities in the dataset

  Args:
- ids (Optional[List[str]], optional): The document_ids to delete. Defaults to None.
- filter (Optional[Dict[str, str]], optional): The filter to delete by. Defaults to None.
- delete_all (Optional[bool], optional): Whether to drop the dataset. Defaults to None.
+ ids (Optional[List[str]], optional): The document_ids to delete.
+ Defaults to None.
+ filter (Optional[Dict[str, str]], optional): The filter to delete by.
+ Defaults to None.
+ delete_all (Optional[bool], optional): Whether to drop the dataset.
+ Defaults to None.
  """
  if delete_all:
  self.ds.delete()
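The three `delete` arguments interact, and the visible hunk only shows the `delete_all` branch. The toy in-memory store below sketches one plausible reading: `delete_all` drops everything, and otherwise `ids` and `filter` narrow the set of rows to remove, with both conditions applying when both are given. `TinyStore` and its `rows` attribute are hypothetical, and the intersection behaviour is an assumption, not something this diff confirms.

```python
from typing import Any, Dict, List, Optional


class TinyStore:
    """Hypothetical in-memory stand-in for a dataset: document id -> metadata."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}

    def delete(
        self,
        ids: Optional[List[str]] = None,
        filter: Optional[Dict[str, str]] = None,
        delete_all: Optional[bool] = None,
    ) -> bool:
        if delete_all:
            # Drop every row, mirroring the "drop the dataset" option
            self.rows.clear()
            return True
        if ids is None and not filter:
            # Nothing was requested, so nothing is removed
            return True
        # Start from the requested ids, or from every row when only a filter is given
        doomed = set(self.rows) if ids is None else set(ids) & set(self.rows)
        if filter:
            # Assumed semantics: a row must also match every filter key exactly
            doomed = {
                doc_id
                for doc_id in doomed
                if all(self.rows[doc_id].get(key) == value for key, value in filter.items())
            }
        for doc_id in doomed:
            del self.rows[doc_id]
        return True
```

With rows `a` and `c` tagged `source="x"` and row `b` tagged `source="y"`, calling `delete(ids=["a", "b"], filter={"source": "x"})` removes only `a` under these assumed semantics.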