Deep Lake mini upgrades #3375

Merged
merged 49 commits into from
Apr 24, 2023
97be64f
Merge pull request #1 from hwchase17/master
davidbuniat Apr 5, 2023
7ec34e1
deeplake vector store advances
Apr 5, 2023
987c377
merge
Apr 5, 2023
a2cc2ec
Merge branch 'master' of https://github.com/activeloopai/langchain
Apr 5, 2023
b9ab944
remove comments
Apr 5, 2023
a969c7a
demo update
Apr 5, 2023
f151697
Merge branch 'master' of https://github.com/hwchase17/langchain
Apr 5, 2023
313a620
typo fix
Apr 5, 2023
78c99c8
mypy fixes
Apr 5, 2023
1e1271b
filter fix on delete
Apr 5, 2023
99379be
formatting update
Apr 5, 2023
be0bafb
unused imports
Apr 5, 2023
a8816ca
ruff fix
Apr 5, 2023
4986056
fix comments
Apr 5, 2023
894d5bd
refmormat
Apr 5, 2023
c81bb90
Merge branch 'hwchase17:master' into master
davidbuniat Apr 7, 2023
236002f
deeplake vectro store improved
Apr 8, 2023
93acd8e
deeplake faster and custom filters
Apr 8, 2023
28f89ab
dretriever example added
Apr 8, 2023
5641667
Merge branch 'hwchase17:master' into master
davidbuniat Apr 8, 2023
fbf8110
typo
Apr 8, 2023
7f0b925
Merge branch 'master' of https://github.com/activeloopai/langchain
Apr 8, 2023
374491e
minor updates
Apr 8, 2023
b346833
ruf fix
Apr 8, 2023
166a2d6
added use case
Apr 8, 2023
ed21551
added code
Apr 8, 2023
e516ae8
added retriever pointer in the docs
Apr 8, 2023
598332e
Merge branch 'hwchase17:master' into master
davidbuniat Apr 8, 2023
0a34694
merge
Apr 10, 2023
40d170a
Merge branch 'hwchase17:master' into master
davidbuniat Apr 15, 2023
a4e4a4d
improve token auth and tests mode on
Apr 15, 2023
ecd6ea8
remove few flags
Apr 15, 2023
0d42983
tests update
Apr 15, 2023
c80a7d3
remove modules notebook
Apr 15, 2023
781fdc4
reemove semi-sensitive data
Apr 15, 2023
3f89c5e
Merge branch 'hwchase17:master' into master
davidbuniat Apr 21, 2023
0357e60
Merge branch 'hwchase17:master' into master
davidbuniat Apr 22, 2023
e1ee292
upgrade deeplake version and twitter notebook
Apr 23, 2023
629988d
Merge branch 'hwchase17:master' into master
davidbuniat Apr 23, 2023
061d60b
upgraded notebookss, moved to local storage instead of in-memory, set…
Apr 23, 2023
1841305
Merge branch 'master' of https://github.com/activeloopai/langchain
Apr 23, 2023
6b7c3b2
doc update
Apr 23, 2023
4eeb26d
Merge branch 'hwchase17:master' into master
davidbuniat Apr 23, 2023
07fd0c2
reformat
Apr 23, 2023
d270d59
fixed typo and added assert
Apr 23, 2023
396b6ee
reeformatting
Apr 23, 2023
619f6e5
Merge branch 'hwchase17:master' into master
davidbuniat Apr 23, 2023
8c7ecc3
added disallowed_special=() to bypass utf-8 encoding issue in example
Apr 23, 2023
4294a60
creds fix for exists
Apr 24, 2023
added code
Davit Buniatyan committed Apr 8, 2023
commit ed21551c958b8c25d7a9e434a7e4b19d30450e0a
25 changes: 25 additions & 0 deletions docs/use_cases/code.md
@@ -0,0 +1,25 @@
# Code Understanding

## Overview

LangChain can be used to analyze GitHub code repositories. By combining VectorStores, the ConversationalRetrievalChain, and GPT-4, it can answer questions in the context of an entire repository or generate new code. This page outlines the essential components of the system and shows how to use LangChain for code comprehension, contextual question answering, and code generation in GitHub repositories.

## Conversational Retrieval Chain

The ConversationalRetrievalChain is a retrieval-focused chain that works on the data stored in a VectorStore. Using context-aware filtering and ranking, it retrieves the code snippets and information most relevant to a given user query, taking the conversation history and context into account.

## LangChain Workflow for Code Understanding and Generation

1. Index the code base: Clone the target repository, load all of its files, split them into chunks, and run the indexing process. You can skip this step and use an already indexed dataset.

2. Embedding and Code Store: Embed the code snippets with a code-aware embedding model and store them in a VectorStore.

3. Query Understanding: GPT-4 processes each user query, taking the context into account and extracting the relevant details.

4. Construct the Retriever: The ConversationalRetrievalChain searches the VectorStore to identify the code snippets most relevant to a given query.

5. Build the Conversational Chain: Customize the retriever settings and define any custom filters as needed.

6. Ask questions: Define a list of questions about the codebase, then use the ConversationalRetrievalChain to answer them. The LLM (GPT-4) generates comprehensive, context-aware answers based on the retrieved code snippets and the conversation history.
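
The workflow steps above can be sketched end to end in plain Python. This is a toy illustration only, not LangChain or Deep Lake code: `chunk`, `embed`, `ToyVectorStore`, and `ask` are hypothetical names, the "embedding" is a bag-of-words count vector standing in for a real embedding model, and prepending earlier questions to the query is a crude stand-in for the LLM-based use of conversation history.

```python
import math
import re
from collections import Counter


def chunk(text, max_lines=4):
    """Split a source file into chunks of at most max_lines lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store holding (embedding, text, metadata) rows."""

    def __init__(self):
        self.rows = []

    def add(self, text, metadata):
        self.rows.append((embed(text), text, metadata))

    def search(self, query, k=1, filter=None):
        q = embed(query)
        # Keep only rows whose metadata matches every key/value in the filter.
        candidates = [
            row for row in self.rows
            if filter is None or all(row[2].get(key) == val for key, val in filter.items())
        ]
        ranked = sorted(candidates, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]


def ask(store, question, history, k=1):
    """Retrieve using the question plus earlier questions, then record the turn."""
    contextual_query = " ".join([q for q, _ in history] + [question])
    hits = store.search(contextual_query, k=k)
    history.append((question, hits))
    return hits


# Index a tiny made-up "repository", then ask a question about it.
repo = {
    "auth.py": "def login(user, password):\n    return check_password(user, password)\n",
    "db.py": "def connect(url):\n    return open_connection(url)\n",
}
store = ToyVectorStore()
for path, source in repo.items():
    for piece in chunk(source):
        store.add(piece, {"source": path})

history = []
first = ask(store, "How does login check the password?", history)
print(first[0][1]["source"])
```

In a real pipeline, the retrieved snippets and the conversation history would be passed to the LLM to produce the answer; the sketch stops at retrieval.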

The full tutorial is available below.
- [Twitter the-algorithm codebase analysis with Deep Lake](../modules/indexes/retrievers/examples/twitter-the-algorithm-analysis-deeplake.ipynb): A notebook that walks through how to parse GitHub source code and run conversational queries against it.
2 changes: 1 addition & 1 deletion langchain/vectorstores/deeplake.py
@@ -244,7 +244,7 @@ def search(
distance_metric: `L2` for Euclidean, `L1` for Nuclear,
`max` L-infinity distance, `cos` for cosine similarity,
'dot' for dot product. Defaults to `L2`.
filter: Attribute filter by metadata example {'key': 'value'}. It can also
filter: Attribute filter by metadata example {'key': 'value'}. It can also
take [Deep Lake filter]
(https://docs.deeplake.ai/en/latest/deeplake.core.dataset.html#deeplake.core.dataset.Dataset.filter)
Defaults to None.
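
For readers unfamiliar with the options in this docstring, here is a minimal sketch of what each `distance_metric` value conventionally computes and how a `{'key': 'value'}` metadata filter can be matched. It is not Deep Lake's implementation: `distance` and `matches` are hypothetical helpers over plain Python lists and dicts. The sketch also assumes the usual sum-of-absolute-differences (Manhattan) definition for `L1`, which the docstring labels "Nuclear".

```python
import math


def distance(a, b, metric="L2"):
    """Compare two equal-length vectors with the named metric."""
    diffs = [x - y for x, y in zip(a, b)]
    if metric == "L2":   # Euclidean distance
        return math.sqrt(sum(d * d for d in diffs))
    if metric == "L1":   # sum of absolute differences (assumed definition)
        return sum(abs(d) for d in diffs)
    if metric == "max":  # L-infinity: largest absolute difference
        return max(abs(d) for d in diffs)
    if metric == "dot":  # dot product (a similarity, higher is closer)
        return sum(x * y for x, y in zip(a, b))
    if metric == "cos":  # cosine similarity (higher is closer)
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return sum(x * y for x, y in zip(a, b)) / norm
    raise ValueError(f"unknown metric: {metric}")


def matches(metadata, filter=None):
    """True when every key/value pair in the filter appears in the metadata."""
    return filter is None or all(metadata.get(k) == v for k, v in filter.items())


print(distance([0, 0], [3, 4], "L2"))
print(matches({"source": "auth.py"}, {"source": "auth.py"}))
```

Note that `L2`, `L1`, and `max` are distances (smaller means closer), while `cos` and `dot` are similarities (larger means closer), so ranking code has to sort in the matching direction.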