Dev #1023

Merged: 414 commits, Jan 23, 2025

Commits
3c2e344
fixed the rerendering of the table while file status is processing
kartikpersistent Sep 27, 2024
292ae9e
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Sep 27, 2024
0d93f57
fix: Read Only User Fix
kartikpersistent Sep 27, 2024
dc994a2
Global search fulltext (#767)
prakriti-solankey Sep 27, 2024
1180406
Added elapsed time for extarction on each breakdown function
praveshkumar1988 Sep 27, 2024
9206fc8
lint and format fixes
prakriti-solankey Sep 27, 2024
34537f7
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Sep 27, 2024
0405816
removed dev logs
kartikpersistent Sep 27, 2024
c9844b7
communities fix
prakriti-solankey Sep 27, 2024
e4c3349
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Sep 27, 2024
085a0e4
disabled the generate graph for read only user
kartikpersistent Sep 27, 2024
7813750
format fixes
kartikpersistent Sep 27, 2024
7cb7957
graph labels change
prakriti-solankey Sep 27, 2024
b7559ac
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Sep 27, 2024
e0e97fb
added the readonly check for already added waiting files
kartikpersistent Sep 27, 2024
42f4c82
Retriever evaluation using RAGAS
kaustubh-darekar Sep 27, 2024
c8e8387
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kaustubh-darekar Sep 27, 2024
e189ccd
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Sep 27, 2024
b663ff4
deleted unused file
kartikpersistent Sep 30, 2024
7c65344
code optimization using memo
kartikpersistent Sep 30, 2024
c69b708
Added elapsed_time on each api and getiing time per_entity
praveshkumar1988 Sep 30, 2024
393c53c
Added the post processing Alert showcasing the ongoing post processin…
kartikpersistent Oct 1, 2024
0dec005
fix: readonly user retry option disable
kartikpersistent Oct 1, 2024
ac3d88a
update script to get details of extarcted doc
abhishekkumar-27 Oct 3, 2024
83351ac
Issue fixed, Latency count per entity
praveshkumar1988 Oct 3, 2024
178dacb
Multiple chat modes selection (#780)
kartikpersistent Oct 4, 2024
1a33f0d
Fix: ChatModes DeSelection on FIle Selection
kartikpersistent Oct 7, 2024
e576055
Fix: Order of the chatmodes accordoing to selected chatmodes
kartikpersistent Oct 7, 2024
d1a56ca
Community optimization (#790)
vasanthasaikalluri Oct 9, 2024
7f255ee
Async way to create entities from multiple chunks (#788)
aashipandya Oct 9, 2024
943f539
fixed graph mode error (#792)
vasanthasaikalluri Oct 10, 2024
126dd48
Raga's Evaluation Metrics (#787)
kartikpersistent Oct 10, 2024
b8296e9
Openai gemini config (#794)
aashipandya Oct 10, 2024
b04f382
Added the user action for metrics table
kartikpersistent Oct 10, 2024
6aa84c7
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Oct 10, 2024
f97806c
Graph enhancements (#795)
prakriti-solankey Oct 10, 2024
ba95afd
format changes
prakriti-solankey Oct 10, 2024
0cc118d
Communities Bug fixes (#775)
prakriti-solankey Oct 10, 2024
fcb9ab5
llm name changes
kartikpersistent Oct 11, 2024
f505488
build fix
kartikpersistent Oct 11, 2024
37de220
default mode fix
kartikpersistent Oct 11, 2024
a538226
ragas model names update
kaustubh-darekar Oct 11, 2024
784caa6
lint fixes
kartikpersistent Oct 11, 2024
b814f71
Chunk Entities API condition
kartikpersistent Oct 11, 2024
69793a6
added the tooltip for unsupported lllms for ragas metric loading
kartikpersistent Oct 11, 2024
c5a3dbf
removed unused imports
kartikpersistent Oct 11, 2024
acdc886
multimode fix when we get error response
kartikpersistent Oct 11, 2024
2734c4a
mode changes for score display
prakriti-solankey Oct 11, 2024
cbd3f25
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Oct 11, 2024
a12a6ab
fix: Fixed the details state handling between multiple chats
kartikpersistent Oct 15, 2024
ba091a0
Fix: Entity Mode Width Fix
kartikpersistent Oct 15, 2024
93c3dd3
diffbot fix for async (#797)
aashipandya Oct 15, 2024
821b0f4
Minor changes (#798)
vasanthasaikalluri Oct 15, 2024
702ebf7
New: Added the supported llm models for ragas evaluation
kartikpersistent Oct 15, 2024
3f1633c
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Oct 15, 2024
dcb3975
Fix: Communitites Tab is displayed based communitites length
kartikpersistent Oct 15, 2024
d3e5365
added the conversation download button (#800)
kartikpersistent Oct 15, 2024
0ff8bd1
model name correction
prakriti-solankey Oct 15, 2024
441908b
Merge branch 'STAGING' into DEV
prakriti-solankey Oct 15, 2024
09ea071
chatmode switch mode fix
kartikpersistent Oct 17, 2024
99dc052
Add API payload GCP logging (#805)
praveshkumar1988 Oct 18, 2024
a8f821a
Adding Links to get neighboring nodes (#796)
prakriti-solankey Oct 18, 2024
6c6da26
added error message for doc retriver (#807)
vasanthasaikalluri Oct 18, 2024
3d587f0
copy row (#803)
prakriti-solankey Oct 18, 2024
845bfb7
Raga's Evaluation For Multi Modes (#806)
kartikpersistent Oct 18, 2024
952291d
lint fixes
kartikpersistent Oct 18, 2024
f5a5edd
fix: multimode metrics state handling
kartikpersistent Oct 21, 2024
b3f1dd0
fix: Multimode metrics mode change state issue
kartikpersistent Oct 21, 2024
fb5e000
fix: list style fix
kartikpersistent Oct 21, 2024
fd224a1
Correct TYPO mistake
praveshkumar1988 Oct 21, 2024
cb77c18
added new env for ragas embedding model
vasanthasaikalluri Oct 21, 2024
986ae29
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
vasanthasaikalluri Oct 21, 2024
30d6ea8
Merge branch 'STAGING' into DEV
praveshkumar1988 Oct 21, 2024
5c0081e
Props name changes (#811)
kartikpersistent Oct 22, 2024
ee71002
test
prakriti-solankey Oct 22, 2024
c115014
view graph
prakriti-solankey Oct 22, 2024
c200b61
nodes count and relationshipcount updation fix
kartikpersistent Oct 22, 2024
1cc81ce
Merge branch 'STAGING' into DEV
kartikpersistent Oct 22, 2024
340679b
sourceUrl Fix
kartikpersistent Oct 22, 2024
055e40f
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Oct 22, 2024
fb35bda
empty string "" fix to keep the default values we should keep the val…
kartikpersistent Oct 23, 2024
ed18462
prop changes
kartikpersistent Oct 23, 2024
985993e
props changes
kartikpersistent Oct 23, 2024
220bee7
retry condition update for failed files (#820)
aashipandya Oct 23, 2024
0508585
Chat modes name changes (#815)
kartikpersistent Oct 23, 2024
9dba361
Youtube transcript fix with proxy (#822)
aashipandya Oct 23, 2024
6839e52
update script for async func
abhishekkumar-27 Oct 23, 2024
a5d29fa
ragas changes for graph retrieval mode. context added in api output (…
kaustubh-darekar Oct 24, 2024
cb59a2a
Remove extract latency from logging and add LIMIT in duplicate nodes
praveshkumar1988 Oct 24, 2024
93d7f3b
Document updates (#828)
kaustubh-darekar Oct 24, 2024
0d2882c
Update README.md
kartikpersistent Oct 24, 2024
6a6dc05
updated api structire in docs (#827)
vasanthasaikalluri Oct 24, 2024
29ef09b
Update backend_docs.adoc
karanchellani Oct 24, 2024
c5cd025
821 llm model listing (#823)
prakriti-solankey Oct 24, 2024
dfbb042
Merge branch 'STAGING' into DEV
kartikpersistent Oct 24, 2024
4bed352
Exclude session lable node from duplicate nodes list
praveshkumar1988 Oct 25, 2024
3dfb42b
Added the tooltip for disabled llm option (#835)
kartikpersistent Oct 25, 2024
4d795bf
node size changes
prakriti-solankey Oct 25, 2024
1fac375
mode removal of rows check
prakriti-solankey Oct 25, 2024
eb14fbe
formatting
prakriti-solankey Oct 25, 2024
0331cc7
Merge branch 'STAGING' into DEV
prakriti-solankey Oct 25, 2024
5cd9724
Exclude __Entity__ node label from duplicate node list
praveshkumar1988 Oct 25, 2024
70cb004
Update README.md
kartikpersistent Oct 28, 2024
bf51e78
Update README.md
kartikpersistent Oct 29, 2024
76b325c
Update README.md
kartikpersistent Oct 29, 2024
1d607bc
fixed the youtube link
kartikpersistent Oct 30, 2024
d8af5a5
Security header and GZIPMiddleware (#847)
praveshkumar1988 Nov 8, 2024
358d5a6
Chunk Text Details (#850)
kaustubh-darekar Nov 8, 2024
6d35a34
Communities Id to Title (#851)
prakriti-solankey Nov 8, 2024
cd6b4c2
disconnected nodes (#852)
prakriti-solankey Nov 8, 2024
282cfa0
loading changes
prakriti-solankey Nov 8, 2024
399785f
loading changes
prakriti-solankey Nov 8, 2024
686ed95
Update score.py
karanchellani Nov 11, 2024
4f1af18
added middleware
kartikpersistent Nov 12, 2024
1c29940
removed the unused state
kartikpersistent Nov 12, 2024
1d14749
Youtube timestamp (#877)
aashipandya Nov 14, 2024
d6f4ac6
Handled Nonetype error during global search. (#876)
kaustubh-darekar Nov 14, 2024
a5a989d
Additional metrics using ground truth (#855)
kartikpersistent Nov 15, 2024
f8a48d1
Url changes and state management (#870)
prakriti-solankey Nov 15, 2024
4da75b9
removed unused prop
kartikpersistent Nov 18, 2024
f143e9b
chat mode width fix
kartikpersistent Nov 18, 2024
8928ecb
Table selection Fix
kartikpersistent Nov 18, 2024
823faa0
Table issue (#885)
kartikpersistent Nov 18, 2024
ff63ca5
Logging properties update, remove payload json
praveshkumar1988 Nov 18, 2024
9a50138
fix: readme typos (#887)
marcoscannabrava Nov 19, 2024
cd6cbee
Break down nodes (#881)
kartikpersistent Nov 20, 2024
95eaf19
youtube url fix
kartikpersistent Nov 20, 2024
39e2807
Commented CSP middleware and added endpoint backend_connection_config…
praveshkumar1988 Nov 20, 2024
65c21ca
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
praveshkumar1988 Nov 20, 2024
c7b4dc0
added csp header
kartikpersistent Nov 20, 2024
f244b9f
removed the useEffect
kartikpersistent Nov 21, 2024
9d17ec1
Table issue (#888)
prakriti-solankey Nov 21, 2024
86c33ba
key fix
kartikpersistent Nov 21, 2024
09df9e6
Update README.md
kartikpersistent Nov 22, 2024
5610247
Update README.md
kartikpersistent Nov 22, 2024
a247d75
Update README.md
kartikpersistent Nov 22, 2024
69d46ab
removed extra document nodes and combine chunk logic (#894)
aashipandya Nov 22, 2024
4721318
Update README.md
prakriti-solankey Nov 22, 2024
ab0405f
Update README.md
kartikpersistent Nov 22, 2024
9628b6d
conditional deployment based on the enviornment
kartikpersistent Nov 22, 2024
009bc0c
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Nov 22, 2024
03526a4
Update README.md
kartikpersistent Nov 22, 2024
8562880
Update README.md
kartikpersistent Nov 22, 2024
c5603e6
removed the reference answer checkbox and textarea while additional m…
kartikpersistent Nov 22, 2024
7e8df27
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Nov 22, 2024
21e8855
LLM_MODELS
kartikpersistent Nov 22, 2024
33354f3
re process feature state renaming (#898)
kartikpersistent Nov 25, 2024
9e8ffca
Community Counts after post processing (#890)
kaustubh-darekar Nov 25, 2024
8af533c
format and checked fixes (#897)
prakriti-solankey Nov 25, 2024
4e20384
added info to show 50 chunks processing (#899)
prakriti-solankey Nov 26, 2024
b117df8
format and lint fixes
kartikpersistent Nov 26, 2024
f76d684
Env changes (#896)
prakriti-solankey Nov 26, 2024
63325f9
Update Content.tsx
kartikpersistent Nov 26, 2024
1992de7
Update Content.tsx
kartikpersistent Nov 26, 2024
3e34874
build fix
kartikpersistent Nov 27, 2024
22a57c5
communitifiles array check
kartikpersistent Nov 27, 2024
08a9755
combining one chunk (#901)
aashipandya Nov 28, 2024
3774a86
Delete query refined to delete all related nodes of file (#904)
kaustubh-darekar Nov 28, 2024
bf7b739
readonly change
prakriti-solankey Nov 28, 2024
f4e593e
Prod v6 fix (#909)
prakriti-solankey Dec 2, 2024
6b3c9c9
enable_communities flag removed in backend (#906)
kaustubh-darekar Dec 2, 2024
47a2db7
added multimode metrics for json and fix for gemini ground truth metrics
kartikpersistent Dec 3, 2024
40ac628
Enhancement default model selection (#911)
kartikpersistent Dec 3, 2024
9d023a4
Bug Fixing (#916)
prakriti-solankey Dec 3, 2024
b85d31e
Update documentation (#915)
kaustubh-darekar Dec 3, 2024
9784090
Updating langchain-neo4j package (#891)
kaustubh-darekar Dec 3, 2024
b09131e
gcs file existance check and reprocess from last processed position c…
aashipandya Dec 3, 2024
6e9edce
format and lint fixes
kartikpersistent Dec 4, 2024
c5d1532
Merge branch 'STAGING' into DEV
prakriti-solankey Dec 4, 2024
7dd80f9
Merge branch 'STAGING' of https://github.com/neo4j-labs/llm-graph-bui…
kartikpersistent Dec 4, 2024
a9420ae
updated requirements (#923)
aashipandya Dec 4, 2024
8ce1693
Update Constants.ts
kartikpersistent Dec 5, 2024
cc3dc3d
Metric table issues (#921)
kartikpersistent Dec 5, 2024
97990bd
test case updated
abhishekkumar-27 Dec 5, 2024
e8cc74b
updated test case
abhishekkumar-27 Dec 5, 2024
bad6389
elementid if no id is there
prakriti-solankey Dec 5, 2024
992fa9f
backenapi for all env
prakriti-solankey Dec 5, 2024
7b3b1b4
Bug fixing for Icon (#924)
prakriti-solankey Dec 5, 2024
fc85b45
diffbot placement
prakriti-solankey Dec 5, 2024
fac07e3
Update MultiModeMetrics.tsx
kartikpersistent Dec 5, 2024
0fa4d84
libmagic1 library added
aashipandya Dec 5, 2024
a86e07f
Document read me update (#926)
prakriti-solankey Dec 6, 2024
98cfc3c
metric table and default model fixes
kartikpersistent Dec 9, 2024
b671189
Updated code for duplicate nodes and index dimension mismatch (#929)
kaustubh-darekar Dec 10, 2024
c431903
Merge branch 'STAGING' into DEV
kartikpersistent Dec 10, 2024
01eb4a2
added autocomplete for better accessbility
kartikpersistent Dec 10, 2024
887e0b8
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Dec 10, 2024
1f74670
UX: improvement added deleteloader for chat
kartikpersistent Dec 11, 2024
e29ebb4
Merge branch 'STAGING' into DEV
kartikpersistent Dec 11, 2024
5d584f2
test updated
abhishekkumar-27 Dec 12, 2024
3c3a40d
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
abhishekkumar-27 Dec 12, 2024
44c183e
Integrate vector dimesion check in backend configuration API
praveshkumar1988 Dec 12, 2024
e4bf031
removed hardcoded CSS values (#934)
kartikpersistent Dec 13, 2024
4bfac7b
fixed responiveness of the table
kartikpersistent Dec 13, 2024
a67f918
Error & warning handling (#938)
kaustubh-darekar Dec 13, 2024
5a86987
Error handling for model format in backend & frontend env (#946)
kaustubh-darekar Dec 16, 2024
e954f99
Added the check to initilize DB connection when creds in env are not …
praveshkumar1988 Dec 16, 2024
2293ede
Correct TYPO mistake
praveshkumar1988 Dec 16, 2024
5b4fd5a
spell fix
kartikpersistent Dec 16, 2024
1ded64a
Handled EquivalentSchemaRuleAlreadyExist due to race condition (#949)
kaustubh-darekar Dec 17, 2024
b9a551a
added xlxs format support
kartikpersistent Dec 17, 2024
a69dca3
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Dec 17, 2024
20d1c3d
vector index name fixed (#950)
kaustubh-darekar Dec 17, 2024
c6ddc63
Update backend example.env
kaustubh-darekar Dec 18, 2024
633ed87
updated_libs (#955)
aashipandya Dec 18, 2024
41ae360
File expiration alert (#953)
kartikpersistent Dec 18, 2024
2ebfc2d
code simplification and warning fix
kartikpersistent Dec 19, 2024
dbe65a1
Put Grapd DB connection out of the loop to prevent pooling connection
praveshkumar1988 Dec 19, 2024
a05e87b
Merge branch 'STAGING' into DEV
prakriti-solankey Dec 19, 2024
60b4393
Resolved file not deleting in case of filesource missing. (#960)
kaustubh-darekar Dec 19, 2024
0f295e2
Update example.env
kaustubh-darekar Dec 23, 2024
b49fb9f
extracting entities from existing KG when certain nodes doesnt have i…
kaustubh-darekar Dec 23, 2024
f77893e
Issue fixed for web-URL when title and language not getting in metadata
praveshkumar1988 Dec 24, 2024
83d5a9d
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
praveshkumar1988 Dec 24, 2024
b4e242f
custom_change (#966)
prakriti-solankey Dec 26, 2024
e881915
Add loging for backend config API
praveshkumar1988 Dec 26, 2024
f08292c
GCS bucket file processing issue and custom exception for chunks alre…
aashipandya Dec 31, 2024
c8247f4
Removed commented code and unused library (#973)
praveshkumar1988 Dec 31, 2024
ef832d8
Changed delete query to delete documents in batches for efficient mem…
kaustubh-darekar Jan 2, 2025
fa25762
Title is blank from metadata then assingned from URL (#982)
praveshkumar1988 Jan 2, 2025
9655d1a
Added Effective search ratio (#981)
kaustubh-darekar Jan 2, 2025
edcce9e
Check chunks available to reprocess file (#984)
praveshkumar1988 Jan 2, 2025
1320884
Chunks Not created alert display
kartikpersistent Jan 3, 2025
320b975
Correct the message
praveshkumar1988 Jan 3, 2025
9e88b63
Update Content.tsx
kartikpersistent Jan 3, 2025
ec2055c
removed ready to reprocess check (#979)
prakriti-solankey Jan 7, 2025
894eb23
Removal of isSchema check for graphType post processing Job (#990)
prakriti-solankey Jan 8, 2025
117d287
Update README.md & FrontendDoc (#974)
prakriti-solankey Jan 8, 2025
5d0c8ec
Nova models trial. (#993)
kaustubh-darekar Jan 8, 2025
1732f2f
using hook for selectedNodes and Relation (#995)
prakriti-solankey Jan 8, 2025
8049c3d
Log entry error resolved (#994)
kaustubh-darekar Jan 14, 2025
1247a21
Spelling mistake fixed for condition of setting node_properties (#1004)
kaustubh-darekar Jan 14, 2025
e8a4617
Limit chunks to process (#1000)
praveshkumar1988 Jan 14, 2025
57a166d
Notebook for cleanup of graph model (#957)
aashipandya Jan 14, 2025
547c46e
check db version to execute admin command (#997)
praveshkumar1988 Jan 14, 2025
7f70fb9
custom error in extract and url_scan API as LLMGraphBuilderException.…
praveshkumar1988 Jan 14, 2025
e1db6ce
Merge branch 'STAGING' into DEV
kaustubh-darekar Jan 14, 2025
f2a800d
rectified code to not include Document node while graph_consolidation…
kaustubh-darekar Jan 15, 2025
b7229a3
Nova models addition (#1006)
kaustubh-darekar Jan 15, 2025
d29f688
New models (#1009)
prakriti-solankey Jan 15, 2025
12ce5e8
Graph consolidation prompt updated (#1013)
kaustubh-darekar Jan 20, 2025
1ba89d1
Graph consolidation changes (#1014)
prakriti-solankey Jan 21, 2025
9e5890d
Chunks to be created (#1015)
kartikpersistent Jan 21, 2025
1624903
Oauth integration (#1011)
kartikpersistent Jan 21, 2025
05d211b
Message placement changes
kartikpersistent Jan 21, 2025
123a6bd
Docker changes
kartikpersistent Jan 21, 2025
2b230e3
Feature toggle for authentication
kartikpersistent Jan 22, 2025
cf92dd6
minor issue fixed
praveshkumar1988 Jan 22, 2025
916910b
passing and logging authenticated user email in backend logging (#1019)
kartikpersistent Jan 23, 2025
d0c0a42
Merge branch 'STAGING' into DEV
praveshkumar1988 Jan 23, 2025
3 changes: 3 additions & 0 deletions README.md
@@ -149,6 +149,9 @@ Allow unauthenticated request : Yes
 | VITE_GOOGLE_CLIENT_ID | Optional | | Client ID for Google authentication |
 | VITE_LLM_MODELS_PROD | Optional | openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash | To Distinguish models based on the Enviornment PROD or DEV
 | VITE_LLM_MODELS | Optional | 'diffbot,openai_gpt_3.5,openai_gpt_4o,openai_gpt_4o_mini,gemini_1.5_pro,gemini_1.5_flash,azure_ai_gpt_35,azure_ai_gpt_4o,ollama_llama3,groq_llama3_70b,anthropic_claude_3_5_sonnet' | Supported Models For the application
+| VITE_AUTH0_CLIENT_ID | Mandatory if you are enabling Authentication otherwise it is optional | |Okta Oauth Client ID for authentication
+| VITE_AUTH0_DOMAIN | Mandatory if you are enabling Authentication otherwise it is optional | | Okta Oauth Cliend Domain
+| VITE_SKIP_AUTH | Optional | true | Flag to skip the authentication

 ## LLMs Supported
 1. OpenAI
6 changes: 5 additions & 1 deletion backend/example.env
@@ -44,4 +44,8 @@ LLM_MODEL_CONFIG_ollama_llama3="model_name,model_local_url"
 YOUTUBE_TRANSCRIPT_PROXY="https://user:pass@domain:port"
 EFFECTIVE_SEARCH_RATIO=5
 GRAPH_CLEANUP_MODEL="openai_gpt_4o"
-CHUNKS_TO_BE_PROCESSED="50"
+CHUNKS_TO_BE_CREATED="50"
+BEDROCK_EMBEDDING_MODEL="model_name,aws_access_key,aws_secret_key,region_name" #model_name="amazon.titan-embed-text-v1"
+LLM_MODEL_CONFIG_bedrock_nova_micro_v1="model_name,aws_access_key,aws_secret_key,region_name" #model_name="amazon.nova-micro-v1:0"
+LLM_MODEL_CONFIG_bedrock_nova_lite_v1="model_name,aws_access_key,aws_secret_key,region_name" #model_name="amazon.nova-lite-v1:0"
+LLM_MODEL_CONFIG_bedrock_nova_pro_v1="model_name,aws_access_key,aws_secret_key,region_name" #model_name="amazon.nova-pro-v1:0"
132 changes: 72 additions & 60 deletions backend/score.py

Large diffs are not rendered by default.

18 changes: 12 additions & 6 deletions backend/src/create_chunks.py
@@ -4,6 +4,7 @@
 import logging
 from src.document_sources.youtube import get_chunks_with_timestamps, get_calculated_timestamps
 import re
+import os

 logging.basicConfig(format="%(asctime)s - %(message)s", level="INFO")

@@ -25,23 +26,28 @@ def split_file_into_chunks(self):
         """
         logging.info("Split file into smaller chunks")
         text_splitter = TokenTextSplitter(chunk_size=200, chunk_overlap=20)
+        chunk_to_be_created = int(os.environ.get('CHUNKS_TO_BE_CREATED', '50'))
         if 'page' in self.pages[0].metadata:
             chunks = []
             for i, document in enumerate(self.pages):
                 page_number = i + 1
-                for chunk in text_splitter.split_documents([document]):
-                    chunks.append(Document(page_content=chunk.page_content, metadata={'page_number':page_number}))
+                if len(chunks) >= chunk_to_be_created:
+                    break
+                else:
+                    for chunk in text_splitter.split_documents([document]):
+                        chunks.append(Document(page_content=chunk.page_content, metadata={'page_number':page_number}))

         elif 'length' in self.pages[0].metadata:
             if len(self.pages) == 1 or (len(self.pages) > 1 and self.pages[1].page_content.strip() == ''):
                 match = re.search(r'(?:v=)([0-9A-Za-z_-]{11})\s*',self.pages[0].metadata['source'])
                 youtube_id=match.group(1)
                 chunks_without_time_range = text_splitter.split_documents([self.pages[0]])
-                chunks = get_calculated_timestamps(chunks_without_time_range, youtube_id)
-
+                chunks = get_calculated_timestamps(chunks_without_time_range[:chunk_to_be_created], youtube_id)
             else:
-                chunks_without_time_range = text_splitter.split_documents(self.pages)
-                chunks = get_chunks_with_timestamps(chunks_without_time_range)
+                chunks_without_time_range = text_splitter.split_documents(self.pages)
+                chunks = get_chunks_with_timestamps(chunks_without_time_range[:chunk_to_be_created])
         else:
             chunks = text_splitter.split_documents(self.pages)
+
+        chunks = chunks[:chunk_to_be_created]
         return chunks
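
Review note: the capping behaviour in this hunk can be illustrated without LangChain. The sketch below is a simplified stand-in, not the project's code. It assumes a hypothetical `split_page` callable in place of `TokenTextSplitter.split_documents` and plain strings in place of `Document` objects; only the `CHUNKS_TO_BE_CREATED` variable and its default of `'50'` come from the diff.

```python
import os
from typing import Callable, List


def cap_chunks(pages: List[str],
               split_page: Callable[[str], List[str]],
               limit: int) -> List[str]:
    """Split pages in order, stopping once `limit` chunks have been collected."""
    chunks: List[str] = []
    for page in pages:
        if len(chunks) >= limit:
            break  # skip the remaining pages, as the page-based branch does
        chunks.extend(split_page(page))
    # A single page may overshoot the limit, so trim at the end
    # (this mirrors the final chunks[:chunk_to_be_created] slice).
    return chunks[:limit]


def chunk_limit_from_env(default: str = "50") -> int:
    """Read CHUNKS_TO_BE_CREATED with a string default and an int cast."""
    return int(os.environ.get("CHUNKS_TO_BE_CREATED", default))


# Hypothetical splitter for illustration: fixed-width slices of 4 characters.
def split_every_four(text: str) -> List[str]:
    return [text[i:i + 4] for i in range(0, len(text), 4)]


pages = ["aaaabbbbcccc", "ddddeeee", "ffff"]
capped = cap_chunks(pages, split_every_four, limit=4)
```

The early `break` only avoids splitting pages that can no longer contribute; the final slice is what actually guarantees the cap, since the last page processed can push the list past the limit.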
28 changes: 27 additions & 1 deletion backend/src/graphDB_dataAccess.py
@@ -535,4 +535,30 @@ def update_node_relationship_count(self,document_name):
                 "nodeCount" : nodeCount,
                 "relationshipCount" : relationshipCount
                 }
-        return response
+        return response
+
+    def get_nodelabels_relationships(self):
+        node_query = """
+                    CALL db.labels() YIELD label
+                    WITH label
+                    WHERE NOT label IN ['Document', 'Chunk', '_Bloom_Perspective_', '__Community__', '__Entity__']
+                    CALL apoc.cypher.run("MATCH (n:`" + label + "`) RETURN count(n) AS count",{}) YIELD value
+                    WHERE value.count > 0
+                    RETURN label order by label
+                    """
+
+        relation_query = """
+                    CALL db.relationshipTypes() yield relationshipType
+                    WHERE NOT relationshipType IN ['PART_OF', 'NEXT_CHUNK', 'HAS_ENTITY', '_Bloom_Perspective_','FIRST_CHUNK','SIMILAR','IN_COMMUNITY','PARENT_COMMUNITY']
+                    return relationshipType order by relationshipType
+                    """
+
+        try:
+            node_result = self.execute_query(node_query)
+            node_labels = [record["label"] for record in node_result]
+            relationship_result = self.execute_query(relation_query)
+            relationship_types = [record["relationshipType"] for record in relationship_result]
+            return node_labels,relationship_types
+        except Exception as e:
+            print(f"Error in getting node labels/relationship types from db: {e}")
+            return []
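
Review note: the `except` branch above returns a single empty list, whereas the success path returns a pair, and the caller in `post_processing.py` unpacks two values. The sketch below shows one way to keep the return shape consistent. It is an illustration under assumptions, not the merged code: `run_query`, `fake_query`, and `failing_query` are hypothetical stand-ins for `self.execute_query`, and records are modelled as plain dicts.

```python
import logging
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def get_labels_and_types(
    run_query: Callable[[str], List[Record]],
    node_query: str,
    relation_query: str,
) -> Tuple[List[str], List[str]]:
    """Return (node_labels, relationship_types); both empty if a query fails."""
    try:
        node_labels = [record["label"] for record in run_query(node_query)]
        relationship_types = [
            record["relationshipType"] for record in run_query(relation_query)
        ]
        return node_labels, relationship_types
    except Exception as exc:  # broad on purpose, mirroring the diff
        logging.error("Error getting labels/relationship types: %s", exc)
        # Same shape as the success path, so `a, b = ...` still unpacks.
        return [], []


# Hypothetical in-memory stand-ins for the database, for illustration only.
def fake_query(query: str) -> List[Record]:
    if "labels" in query:
        return [{"label": "Company"}, {"label": "Person"}]
    return [{"relationshipType": "WORKS_AT"}]


def failing_query(query: str) -> List[Record]:
    raise RuntimeError("database unavailable")


labels, rel_types = get_labels_and_types(
    fake_query, "CALL db.labels()", "CALL db.relationshipTypes()"
)
empty_labels, empty_types = get_labels_and_types(
    failing_query, "CALL db.labels()", "CALL db.relationshipTypes()"
)
```

Returning a pair on failure lets the consolidation step proceed with empty inputs instead of raising an unrelated unpacking error; whether that is the desired behaviour, or whether the exception should propagate, is a design choice for the maintainers.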
2 changes: 1 addition & 1 deletion backend/src/llm.py
@@ -89,7 +89,7 @@ def get_llm(model: str):
         )

         llm = ChatBedrock(
-            client=bedrock_client, model_id=model_name, model_kwargs=dict(temperature=0)
+            client=bedrock_client,region_name=region_name, model_id=model_name, model_kwargs=dict(temperature=0)
         )

     elif "ollama" in model:
3 changes: 1 addition & 2 deletions backend/src/main.py
@@ -361,7 +361,6 @@ async def processing_source(uri, userName, password, database, model, file_name,

         logging.info('Update the status as Processing')
         update_graph_chunk_processed = int(os.environ.get('UPDATE_GRAPH_CHUNKS_PROCESSED'))
-        chunk_to_be_processed = int(os.environ.get('CHUNKS_TO_BE_PROCESSED', '50'))
         # selected_chunks = []
         is_cancelled_status = False
         job_status = "Completed"
@@ -676,7 +675,7 @@ def get_labels_and_relationtypes(graph):
     query = """
             RETURN collect {
             CALL db.labels() yield label
-            WHERE NOT label IN ['Chunk','_Bloom_Perspective_', '__Community__', '__Entity__']
+            WHERE NOT label IN ['Document','Chunk','_Bloom_Perspective_', '__Community__', '__Entity__']
             return label order by label limit 100 } as labels,
             collect {
             CALL db.relationshipTypes() yield relationshipType as type
66 changes: 26 additions & 40 deletions backend/src/post_processing.py
@@ -8,7 +8,8 @@
 from langchain_core.prompts import ChatPromptTemplate
 from src.shared.constants import GRAPH_CLEANUP_PROMPT
 from src.llm import get_llm
-from src.main import get_labels_and_relationtypes
+from src.graphDB_dataAccess import graphDBdataAccess
+import time

 DROP_INDEX_QUERY = "DROP INDEX entities IF EXISTS;"
 LABELS_QUERY = "CALL db.labels()"
@@ -195,50 +196,35 @@ def update_embeddings(rows, graph):
     return graph.query(query,params={'rows':rows})

 def graph_schema_consolidation(graph):
-    nodes_and_relations = get_labels_and_relationtypes(graph)
-    logging.info(f"nodes_and_relations in existing graph : {nodes_and_relations}")
-    node_labels = []
-    relation_labels = []
-
-    node_labels.extend(nodes_and_relations[0]['labels'])
-    relation_labels.extend(nodes_and_relations[0]['relationshipTypes'])
-
+    graphDb_data_Access = graphDBdataAccess(graph)
+    node_labels,relation_labels = graphDb_data_Access.get_nodelabels_relationships()
     parser = JsonOutputParser()
-    prompt = ChatPromptTemplate(messages=[("system",GRAPH_CLEANUP_PROMPT),("human", "{input}")],
-                                partial_variables={"format_instructions": parser.get_format_instructions()})
-
-    graph_cleanup_model = os.getenv("GRAPH_CLEANUP_MODEL",'openai_gpt_4o')
+    prompt = ChatPromptTemplate(
+        messages=[("system", GRAPH_CLEANUP_PROMPT), ("human", "{input}")],
+        partial_variables={"format_instructions": parser.get_format_instructions()}
+    )
+    graph_cleanup_model = os.getenv("GRAPH_CLEANUP_MODEL", 'openai_gpt_4o')
     llm, _ = get_llm(graph_cleanup_model)
     chain = prompt | llm | parser
-    nodes_dict = chain.invoke({'input':node_labels})
-    relation_dict = chain.invoke({'input':relation_labels})
-
-    node_match = {}
-    relation_match = {}
-    for new_label , values in nodes_dict.items() :
-        for old_label in values:
-            if new_label != old_label:
-                node_match[old_label]=new_label
-
-    for new_label , values in relation_dict.items() :
-        for old_label in values:
-            if new_label != old_label:
-                relation_match[old_label]=new_label
-
-    logging.info(f"updated node labels : {node_match}")
-    logging.info(f"updated relationship labels : {relation_match}")
-
-    # Update node labels in graph
-    for old_label, new_label in node_match.items():
-        query = f"""
-                MATCH (n:`{old_label}`)
-                SET n:`{new_label}`
-                REMOVE n:`{old_label}`
-                """
-        graph.query(query)
+    nodes_relations_input = {'nodes': node_labels, 'relationships': relation_labels}
+    mappings = chain.invoke({'input': nodes_relations_input})
+    node_mapping = {old: new for new, old_list in mappings['nodes'].items() for old in old_list if new != old}
+    relation_mapping = {old: new for new, old_list in mappings['relationships'].items() for old in old_list if new != old}
+
+    logging.info(f"Node Labels: Total = {len(node_labels)}, Reduced to = {len(set(node_mapping.values()))} (from {len(node_mapping)})")
+    logging.info(f"Relationship Types: Total = {len(relation_labels)}, Reduced to = {len(set(relation_mapping.values()))} (from {len(relation_mapping)})")
+
+    if node_mapping:
+        for old_label, new_label in node_mapping.items():
+            query = f"""
+                    MATCH (n:`{old_label}`)
+                    SET n:`{new_label}`
+                    REMOVE n:`{old_label}`
+                    """
+            graph.query(query)

     # Update relation types in graph
-    for old_label, new_label in relation_match.items():
+    for old_label, new_label in relation_mapping.items():
         query = f"""
                 MATCH (n)-[r:`{old_label}`]->(m)
                 CREATE (n)-[r2:`{new_label}`]->(m)
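
Review note: the two dict comprehensions in this hunk invert the parser output, which groups original names under a consolidated name, into a flat old-to-new lookup. The sketch below isolates that step so it can be checked by hand. The helper name `invert_mapping` and the sample labels are hypothetical; only the `nodes` and `relationships` keys and the `new != old` filter come from the diff.

```python
from typing import Dict, List


def invert_mapping(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Turn {new_label: [old_labels]} into {old_label: new_label}, skipping self-mappings."""
    return {
        old: new
        for new, old_list in groups.items()
        for old in old_list
        if new != old
    }


# Hypothetical parser output in the shape the code above reads from.
mappings = {
    "nodes": {"Person": ["Person", "Human", "People"], "Product": ["Product"]},
    "relationships": {"CREATED": ["CREATED_FOR", "CREATED_TO", "CREATED"]},
}

node_mapping = invert_mapping(mappings["nodes"])
relation_mapping = invert_mapping(mappings["relationships"])
```

Because labels that map to themselves are filtered out, the resulting dict holds only the renames that need a graph update, and an empty dict means nothing has to be written.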
43 changes: 41 additions & 2 deletions backend/src/shared/common_fn.py
@@ -11,7 +11,8 @@
 import os
 from pathlib import Path
 from urllib.parse import urlparse
-
+import boto3
+from langchain_community.embeddings import BedrockEmbeddings

 def check_url_source(source_type, yt_url:str=None, wiki_query:str=None):
     language=''
@@ -77,6 +78,10 @@ def load_embedding_model(embedding_model_name: str):
         )
         dimension = 768
         logging.info(f"Embedding: Using Vertex AI Embeddings , Dimension:{dimension}")
+    elif embedding_model_name == "titan":
+        embeddings = get_bedrock_embeddings()
+        dimension = 1536
+        logging.info(f"Embedding: Using bedrock titan Embeddings , Dimension:{dimension}")
     else:
         embeddings = HuggingFaceEmbeddings(
             model_name="all-MiniLM-L6-v2"#, cache_folder="/embedding_model"
@@ -134,4 +139,38 @@ def last_url_segment(url):
     parsed_url = urlparse(url)
     path = parsed_url.path.strip("/") # Remove leading and trailing slashes
     last_url_segment = path.split("/")[-1] if path else parsed_url.netloc.split(".")[0]
-    return last_url_segment
+    return last_url_segment
+
+def get_bedrock_embeddings():
+    """
+    Creates and returns a BedrockEmbeddings object using the specified model name.
+    Args:
+        model (str): The name of the model to use for embeddings.
+    Returns:
+        BedrockEmbeddings: An instance of the BedrockEmbeddings class.
+    """
+    try:
+        env_value = os.getenv("BEDROCK_EMBEDDING_MODEL")
+        if not env_value:
+            raise ValueError("Environment variable 'BEDROCK_EMBEDDING_MODEL' is not set.")
+        try:
+            model_name, aws_access_key, aws_secret_key, region_name = env_value.split(",")
+        except ValueError:
+            raise ValueError(
+                "Environment variable 'BEDROCK_EMBEDDING_MODEL' is improperly formatted. "
+                "Expected format: 'model_name,aws_access_key,aws_secret_key,region_name'."
+            )
+        bedrock_client = boto3.client(
+            service_name="bedrock-runtime",
+            region_name=region_name.strip(),
+            aws_access_key_id=aws_access_key.strip(),
+            aws_secret_access_key=aws_secret_key.strip(),
+        )
+        bedrock_embeddings = BedrockEmbeddings(
+            model_id=model_name.strip(),
+            client=bedrock_client
+        )
+        return bedrock_embeddings
+    except Exception as e:
+        print(f"An unexpected error occurred: {e}")
+        raise
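
Review note: the parsing of `BEDROCK_EMBEDDING_MODEL` can be exercised on its own, without `boto3` or AWS credentials. The sketch below covers only that step. `BedrockConfig` and `parse_bedrock_config` are hypothetical names introduced for illustration, and the credential strings are placeholders; the four-field comma-separated format and the `amazon.titan-embed-text-v1` model name are taken from `backend/example.env` in this PR.

```python
from typing import NamedTuple


class BedrockConfig(NamedTuple):
    model_name: str
    aws_access_key: str
    aws_secret_key: str
    region_name: str


def parse_bedrock_config(env_value: str) -> BedrockConfig:
    """Parse 'model_name,aws_access_key,aws_secret_key,region_name' into named fields."""
    if not env_value:
        raise ValueError("BEDROCK_EMBEDDING_MODEL is not set.")
    # Strip each field so values written with spaces after commas still parse.
    parts = [part.strip() for part in env_value.split(",")]
    if len(parts) != 4:
        raise ValueError(
            "Expected format: 'model_name,aws_access_key,aws_secret_key,region_name'."
        )
    return BedrockConfig(*parts)


# Placeholder credentials for illustration only; these are not real keys.
config = parse_bedrock_config(
    "amazon.titan-embed-text-v1, EXAMPLE_KEY, EXAMPLE_SECRET, us-east-1"
)
```

Separating the parse from client construction makes the format error testable in isolation. Note that a secret containing a comma would be rejected by this format, which also applies to the `split(",")` in the diff.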
75 changes: 55 additions & 20 deletions backend/src/shared/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -831,27 +831,62 @@
DELETE_ENTITIES_AND_START_FROM_BEGINNING = "delete_entities_and_start_from_beginning"
START_FROM_LAST_PROCESSED_POSITION = "start_from_last_processed_position"

GRAPH_CLEANUP_PROMPT = """Please consolidate the following list of types into a smaller set of more general, semantically
related types. The consolidated types must be drawn from the original list; do not introduce new types.
Return a JSON object representing the mapping of original types to consolidated types. Every key is the consolidated type
and value is list of the original types that were merged into the consolidated type. Prioritize using the most generic and
repeated term when merging. If a type doesn't merge with any other type, it should still be included in the output,
mapped to itself.

**Input:** A list of strings representing the types to be consolidated. These types may represent either node
labels or relationship labels. Your algorithm should do appropriate groupings based on semantic similarity.

Example 1:
Input:
[ "Person", "Human", "People", "Company", "Organization", "Product"]
Output :
["Person": ["Person", "Human", "People"], "Organization": ["Company", "Organization"], "Product": ["Product"]]

Example 2:
Input :
["CREATED_FOR", "CREATED_TO", "CREATED", "PLACE", "LOCATION", "VENUE"]
GRAPH_CLEANUP_PROMPT = """
You are tasked with organizing a list of types into semantic categories based on their meanings, including synonyms or morphological similarities. The input will include two separate lists: one for **Node Labels** and one for **Relationship Types**. Follow these rules strictly:
### 1. Input Format
The input will include two keys:
- `nodes`: A list of node labels.
- `relationships`: A list of relationship types.
### 2. Grouping Rules
- Group similar items into **semantic categories** based on their meaning or morphological similarities.
- The name of each category must be chosen from the types in the input list (node labels or relationship types). **Do not create or infer new names for categories**.
- Items that cannot be grouped must remain in their own category.
### 3. Naming Rules
- The category name must reflect the grouped items and must be an existing type in the input list.
- Use a widely applicable type as the category name.
- **Do not introduce new names or types** under any circumstances.
### 4. Output Rules
- Return the output as a JSON object with two keys:
- `nodes`: A dictionary where each key represents a category name for nodes, and its value is a list of original node labels in that category.
- `relationships`: A dictionary where each key represents a category name for relationships, and its value is a list of original relationship types in that category.
- Every key and value must come from the provided input lists.
### 5. Examples
#### Example 1:
Input:
{{
"nodes": ["Person", "Human", "People", "Company", "Organization", "Product"],
"relationships": ["CREATED_FOR", "CREATED_TO", "CREATED", "PUBLISHED","PUBLISHED_BY", "PUBLISHED_IN", "PUBLISHED_ON"]
}}
Output in JSON:
{{
"nodes": {{
"Person": ["Person", "Human", "People"],
"Organization": ["Company", "Organization"],
"Product": ["Product"]
}},
"relationships": {{
"CREATED": ["CREATED_FOR", "CREATED_TO", "CREATED"],
"PUBLISHED": ["PUBLISHED", "PUBLISHED_BY", "PUBLISHED_IN", "PUBLISHED_ON"]
}}
}}
#### Example 2: Avoid redundant or incorrect grouping
Input:
{{
"nodes": ["Process", "Process_Step", "Step", "Procedure", "Method", "Natural Process", "Step"],
"relationships": ["USED_FOR", "USED_BY", "USED_WITH", "USED_IN"]
}}
Output:
["CREATED": ["CREATED_FOR", "CREATED_TO", "CREATED"],"PLACE": ["PLACE", "LOCATION", "VENUE"]]
{{
"nodes": {{
"Process": ["Process", "Process_Step", "Step", "Procedure", "Method", "Natural Process"]
}},
"relationships": {{
"USED": ["USED_FOR", "USED_BY", "USED_WITH", "USED_IN"]
}}
}}
### 6. Key Rule
If any item cannot be grouped, it must remain in its own category using its original name. Do not repeat values or create incorrect mappings.
Use these rules to group and name categories accurately without introducing errors or new types.
"""
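
Review note: the prompt requires that every category name and every member come from the input lists and that no value is repeated. A caller could enforce this before applying the mapping to the graph. The following is a minimal sketch under that assumption. The function name invert_mapping is hypothetical and not part of this PR, and the response string is a hand-written stand-in for a model reply based on Example 1.

```python
import json


def invert_mapping(groups: dict[str, list[str]], allowed: list[str]) -> dict[str, str]:
    """Turn {category: [members]} into {member: category}, enforcing the prompt's rules."""
    allowed_set = set(allowed)
    result: dict[str, str] = {}
    for category, members in groups.items():
        if category not in allowed_set:
            raise ValueError(f"Category {category!r} is not in the input list.")
        for member in members:
            if member not in allowed_set:
                raise ValueError(f"Member {member!r} is not in the input list.")
            if member in result:
                raise ValueError(f"Member {member!r} appears in more than one category.")
            result[member] = category
    # Any input item the response omitted stays in its own category.
    for item in allowed:
        result.setdefault(item, item)
    return result


# Hand-written stand-in for a model response (nodes from Example 1).
node_labels = ["Person", "Human", "People", "Company", "Organization", "Product"]
response = (
    '{"nodes": {"Person": ["Person", "Human", "People"], '
    '"Organization": ["Company", "Organization"], "Product": ["Product"]}}'
)
mapping = invert_mapping(json.loads(response)["nodes"], node_labels)
print(mapping["Human"], mapping["Company"])
```

The same check would apply unchanged to the `relationships` key of the response.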

ADDITIONAL_INSTRUCTIONS = """Your goal is to identify and categorize entities while ensuring that specific data
3 changes: 3 additions & 0 deletions docker-compose.yml
@@ -63,6 +63,9 @@ services:
- VITE_BATCH_SIZE=${VITE_BATCH_SIZE-2}
- VITE_LLM_MODELS=${VITE_LLM_MODELS-}
- VITE_LLM_MODELS_PROD=${VITE_LLM_MODELS_PROD-openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash}
- VITE_AUTH0_DOMAIN=${VITE_AUTH0_DOMAIN-}
- VITE_AUTH0_CLIENT_ID=${VITE_AUTH0_CLIENT_ID-}
- VITE_SKIP_AUTH=${VITE_SKIP_AUTH-true}
- DEPLOYMENT_ENV=local
volumes:
- ./frontend:/app
6 changes: 6 additions & 0 deletions frontend/Dockerfile
@@ -13,6 +13,9 @@ ARG VITE_CHAT_MODES=""
ARG VITE_ENV="DEV"
ARG VITE_BATCH_SIZE=2
ARG VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
ARG VITE_AUTH0_CLIENT_ID=""
ARG VITE_AUTH0_DOMAIN=""
ARG VITE_SKIP_AUTH="false"

WORKDIR /app
COPY package.json yarn.lock ./
@@ -30,6 +33,9 @@ RUN VITE_BACKEND_API_URL=$VITE_BACKEND_API_URL \
VITE_BATCH_SIZE=$VITE_BATCH_SIZE \
VITE_LLM_MODELS=$VITE_LLM_MODELS \
VITE_LLM_MODELS_PROD=$VITE_LLM_MODELS_PROD \
VITE_AUTH0_CLIENT_ID=$VITE_AUTH0_CLIENT_ID \
VITE_AUTH0_DOMAIN=$VITE_AUTH0_DOMAIN \
VITE_SKIP_AUTH=$VITE_SKIP_AUTH \
yarn run build

# Step 2: Serve the application using Nginx
3 changes: 3 additions & 0 deletions frontend/example.env
@@ -12,3 +12,6 @@ VITE_BATCH_SIZE=2
VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
VITE_FRONTEND_HOSTNAME="localhost:8080"
VITE_SEGMENT_API_URL=""
VITE_AUTH0_CLIENT_ID=""
VITE_AUTH0_DOMAIN=""
VITE_SKIP_AUTH=true
1 change: 1 addition & 0 deletions frontend/package.json
@@ -11,6 +11,7 @@
"preview": "vite preview"
},
"dependencies": {
"@auth0/auth0-react": "^2.2.4",
"@emotion/styled": "^11.11.0",
"@mui/material": "^5.15.10",
"@mui/styled-engine": "^5.15.9",