DEV to STAGING #732

Merged: 54 commits from DEV into STAGING, Sep 6, 2024
Commits
271e8d5
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Aug 20, 2024
e078e21
connection check
prakriti-solankey Aug 20, 2024
7c66bf2
Fix typo: correct 'josn_obj' to 'json_obj' (#697)
destiny966113 Aug 20, 2024
477fda0
lint fixes
kartikpersistent Aug 20, 2024
bacbcfc
connection check
prakriti-solankey Aug 20, 2024
885c345
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Aug 20, 2024
2fc3e13
Chatbot changes (#700)
vasanthasaikalluri Aug 21, 2024
c0cca99
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Aug 21, 2024
279d820
fixed issue delete entities return count
praveshkumar1988 Aug 21, 2024
30f92cd
removed specified version due to dependency clashes between versions
praveshkumar1988 Aug 21, 2024
832afd9
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
praveshkumar1988 Aug 21, 2024
edcff3b
updated script"integration test cases"
abhishekkumar-27 Aug 21, 2024
71ed29a
decreased the delay for polling API
kartikpersistent Aug 21, 2024
7eb7605
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Aug 21, 2024
1af5877
Graph enhancements (#696)
prakriti-solankey Aug 21, 2024
4575198
changed chat mode names (#702)
vasanthasaikalluri Aug 21, 2024
2d76462
Merge branch 'STAGING' into DEV
prakriti-solankey Aug 21, 2024
a6ee345
env changes
kartikpersistent Aug 21, 2024
dc351a0
used axios instance for network calls
kartikpersistent Aug 22, 2024
a241c6b
disabled the tooltip when the dropdown is in open state
kartikpersistent Aug 22, 2024
426906b
format fixes + chat mode naming changes
kartikpersistent Aug 22, 2024
78ec3fd
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Aug 22, 2024
7d0b431
mode added to info model for entities
prakriti-solankey Aug 22, 2024
e60a6e3
Merge branch 'STAGING' into DEV
prakriti-solankey Aug 22, 2024
f2b1e17
Fixed issue: list index out of range while getting status of document node
praveshkumar1988 Aug 26, 2024
94c493e
processing count updated on cancel
kartikpersistent Aug 26, 2024
489b5ae
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Aug 26, 2024
3ef88b6
format fixes
kartikpersistent Aug 27, 2024
4e2f909
Merge branch 'STAGING' into DEV
kartikpersistent Aug 27, 2024
dadaa28
remove whitespace for environment variable which due to an error "xxx …
edenbuaa Aug 27, 2024
4c6f676
updated disconnected nodes
abhishekkumar-27 Aug 27, 2024
568db51
updated disconnected nodes
abhishekkumar-27 Aug 27, 2024
501ec6b
fix: Processed count update on failed condition
kartikpersistent Aug 28, 2024
9941474
added disconnected and up nodes
abhishekkumar-27 Aug 28, 2024
cac1963
resetting the alert message on success scenario
kartikpersistent Aug 29, 2024
fd7a4bb
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Aug 29, 2024
d010e41
populate graph schema
abhishekkumar-27 Aug 30, 2024
b3a00ac
not clearing the password when there is an error scenario
kartikpersistent Aug 30, 2024
77b06db
fixed the vector index loading issue
kartikpersistent Aug 30, 2024
d166290
fix: empty credentials payload for recreate vector index api
kartikpersistent Aug 30, 2024
088eda2
chatbot status (#676)
prakriti-solankey Sep 2, 2024
c159322
Configuration change. Update LLM models and remove --preload from doc…
praveshkumar1988 Sep 3, 2024
e08aeab
Retry processing (#698)
kartikpersistent Sep 4, 2024
0cb7ed9
ref added for keydown (#717)
prakriti-solankey Sep 4, 2024
18b0093
Remove total_pages property. It is not used in DB. (#714)
praveshkumar1988 Sep 4, 2024
eb48190
Update main.py
aashipandya Sep 4, 2024
5ff0014
Add print statement for document status
praveshkumar1988 Sep 5, 2024
6ba42e0
allow credentials true changes
kartikpersistent Sep 5, 2024
771753b
reset the values to 0 when the retry option is 'start from beginning'
kartikpersistent Sep 5, 2024
d56a60d
Update Node/Document status using SSE, Trying to fix Cancelled by Sco…
praveshkumar1988 Sep 5, 2024
949d2f9
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
praveshkumar1988 Sep 5, 2024
ba13144
resetting the nodesCount and relationshipCount
kartikpersistent Sep 6, 2024
4cc8104
Add vector index exist condition to create
praveshkumar1988 Sep 6, 2024
122f6a6
Merge branch 'STAGING' into DEV
aashipandya Sep 6, 2024
2 changes: 1 addition & 1 deletion backend/Dockerfile
@@ -21,4 +21,4 @@ RUN pip install -r requirements.txt
# Copy application code
COPY . /code
# Set command
CMD ["gunicorn", "score:app", "--workers", "8","--preload","--threads", "8", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]
CMD ["gunicorn", "score:app", "--workers", "8","--threads", "8", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]
1 change: 1 addition & 0 deletions backend/Performance_test.py
@@ -94,6 +94,7 @@ def performance_main():
for _ in range(CONCURRENT_REQUESTS):
futures.append(executor.submit(post_request_chunk))

# Chatbot request futures
# Chatbot request futures
# for message in CHATBOT_MESSAGES:
# futures.append(executor.submit(chatbot_request, message))
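The hunk above submits one future per concurrent request. A self-contained sketch of the same fan-out pattern — `post_request_chunk` is stubbed, since its body is outside this diff:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

CONCURRENT_REQUESTS = 10  # illustrative value

def post_request_chunk():
    # Stub: the real function posts a chunk-processing request to the backend.
    return "ok"

with ThreadPoolExecutor(max_workers=CONCURRENT_REQUESTS) as executor:
    futures = [executor.submit(post_request_chunk) for _ in range(CONCURRENT_REQUESTS)]
    for future in as_completed(futures):
        print(future.result())
```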
74 changes: 45 additions & 29 deletions backend/score.py
@@ -46,7 +46,6 @@ def sick():
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
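Removing `allow_credentials=True` (commit 6ba42e0) keeps the wildcard origin valid: browsers reject `Access-Control-Allow-Origin: *` on credentialed requests, so the two settings cannot be combined. A sketch of the resulting setup, assuming the same Starlette `CORSMiddleware` used in score.py:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # any origin may call the API
    # allow_credentials=True removed: the CORS spec forbids combining
    # credentials with a wildcard Access-Control-Allow-Origin.
    allow_methods=["*"],
    allow_headers=["*"],
)
```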
@@ -137,7 +136,8 @@ async def extract_knowledge_graph_from_file(
allowedNodes=Form(None),
allowedRelationship=Form(None),
language=Form(None),
access_token=Form(None)
access_token=Form(None),
retry_condition=Form(None)
):
"""
Calls 'extract_graph_from_file' in a new thread to create Neo4jGraph from a
@@ -161,30 +161,30 @@
merged_file_path = os.path.join(MERGED_DIR,file_name)
logging.info(f'File path:{merged_file_path}')
result = await asyncio.to_thread(
extract_graph_from_file_local_file, uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship)
extract_graph_from_file_local_file, uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 's3 bucket' and source_url:
result = await asyncio.to_thread(
extract_graph_from_file_s3, uri, userName, password, database, model, source_url, aws_access_key_id, aws_secret_access_key, allowedNodes, allowedRelationship)
extract_graph_from_file_s3, uri, userName, password, database, model, source_url, aws_access_key_id, aws_secret_access_key, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'web-url':
result = await asyncio.to_thread(
extract_graph_from_web_page, uri, userName, password, database, model, source_url, allowedNodes, allowedRelationship)
extract_graph_from_web_page, uri, userName, password, database, model, source_url, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'youtube' and source_url:
result = await asyncio.to_thread(
extract_graph_from_file_youtube, uri, userName, password, database, model, source_url, allowedNodes, allowedRelationship)
extract_graph_from_file_youtube, uri, userName, password, database, model, source_url, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'Wikipedia' and wiki_query:
result = await asyncio.to_thread(
extract_graph_from_file_Wikipedia, uri, userName, password, database, model, wiki_query, max_sources, language, allowedNodes, allowedRelationship)
extract_graph_from_file_Wikipedia, uri, userName, password, database, model, wiki_query, language, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'gcs bucket' and gcs_bucket_name:
result = await asyncio.to_thread(
extract_graph_from_file_gcs, uri, userName, password, database, model, gcs_project_id, gcs_bucket_name, gcs_bucket_folder, gcs_blob_filename, access_token, allowedNodes, allowedRelationship)
extract_graph_from_file_gcs, uri, userName, password, database, model, gcs_project_id, gcs_bucket_name, gcs_bucket_folder, gcs_blob_filename, access_token, file_name, allowedNodes, allowedRelationship, retry_condition)
else:
return create_api_response('Failed',message='source_type is other than accepted source')

if result is not None:
result['db_url'] = uri
result['api_name'] = 'extract'
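Each source-type branch above now forwards `file_name` and the new `retry_condition` form field to its extractor. The extractor bodies are outside this diff, so the receiving side below is a hypothetical sketch:

```python
# Hypothetical signature after this change; the real implementations live in
# backend/src/main.py, which is not shown in this diff.
def extract_graph_from_file_local_file(uri, userName, password, database, model,
                                       merged_file_path, file_name,
                                       allowedNodes, allowedRelationship,
                                       retry_condition):
    if retry_condition:
        # Assumption: resume from already-processed chunks or restart from
        # the beginning, depending on the retry option chosen in the UI.
        ...
```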
@@ -433,25 +433,25 @@ async def generate():
logging.info(" SSE Client disconnected")
break
# get the current status of document node
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
if result is not None:
status = json.dumps({'fileName':file_name,
'status':result[0]['Status'],
'processingTime':result[0]['processingTime'],
'nodeCount':result[0]['nodeCount'],
'relationshipCount':result[0]['relationshipCount'],
'model':result[0]['model'],
'total_chunks':result[0]['total_chunks'],
'total_pages':result[0]['total_pages'],
'fileSize':result[0]['fileSize'],
'processed_chunk':result[0]['processed_chunk'],
'fileSource':result[0]['fileSource']
})

else:
status = json.dumps({'fileName':file_name, 'status':'Failed'})
yield status
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
print(f'Result of document status in SSE : {result}')
if len(result) > 0:
status = json.dumps({'fileName':file_name,
'status':result[0]['Status'],
'processingTime':result[0]['processingTime'],
'nodeCount':result[0]['nodeCount'],
'relationshipCount':result[0]['relationshipCount'],
'model':result[0]['model'],
'total_chunks':result[0]['total_chunks'],
'fileSize':result[0]['fileSize'],
'processed_chunk':result[0]['processed_chunk'],
'fileSource':result[0]['fileSource']
})
yield status
except asyncio.CancelledError:
logging.info("SSE Connection cancelled")

@@ -495,21 +495,21 @@ async def get_document_status(file_name, url, userName, password, database):
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
if result is not None:
if len(result) > 0:
status = {'fileName':file_name,
'status':result[0]['Status'],
'processingTime':result[0]['processingTime'],
'nodeCount':result[0]['nodeCount'],
'relationshipCount':result[0]['relationshipCount'],
'model':result[0]['model'],
'total_chunks':result[0]['total_chunks'],
'total_pages':result[0]['total_pages'],
'fileSize':result[0]['fileSize'],
'processed_chunk':result[0]['processed_chunk'],
'fileSource':result[0]['fileSource']
}
else:
status = {'fileName':file_name, 'status':'Failed'}
print(f'Result of document status in refresh : {result}')
return create_api_response('Success',message="",file_name=status)
except Exception as e:
message=f"Unable to get the document status"
@@ -626,6 +626,22 @@ async def merge_duplicate_nodes(uri=Form(), userName=Form(), password=Form(), da
return create_api_response(job_status, message=message, error=error_message)
finally:
gc.collect()

@app.post("/retry_processing")
async def retry_processing(uri=Form(), userName=Form(), password=Form(), database=Form(), file_name=Form(), retry_condition=Form()):
try:
graph = create_graph_database_connection(uri, userName, password, database)
await asyncio.to_thread(set_status_retry, graph,file_name,retry_condition)
#set_status_retry(graph,file_name,retry_condition)
return create_api_response('Success',message=f"Status set to Reprocess for filename : {file_name}")
except Exception as e:
job_status = "Failed"
message="Unable to set status to Retry"
error_message = str(e)
logging.exception(f'{error_message}')
return create_api_response(job_status, message=message, error=error_message)
finally:
gc.collect()

if __name__ == "__main__":
uvicorn.run(app)
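The new `/retry_processing` endpoint takes plain form fields, so it can be exercised directly. The field names below match the handler above; the base URL and the `retry_condition` value are illustrative:

```python
import requests

resp = requests.post(
    "http://localhost:8000/retry_processing",  # illustrative base URL
    data={
        "uri": "neo4j+s://<host>",
        "userName": "neo4j",
        "password": "<password>",
        "database": "neo4j",
        "file_name": "report.pdf",
        "retry_condition": "start_from_beginning",  # illustrative value
    },
    timeout=60,
)
print(resp.json())  # expect 'Success' with a 'Status set to Reprocess' message
```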
1 change: 0 additions & 1 deletion backend/src/document_sources/gcs_bucket.py
@@ -122,7 +122,6 @@ def merge_file_gcs(bucket_name, original_file_name: str, folder_name_sha1_hashed
blob.upload_from_file(file_io)
# pdf_reader = PdfReader(file_io)
file_size = len(merged_file)
# total_pages = len(pdf_reader.pages)

return file_size
except Exception as e:
16 changes: 8 additions & 8 deletions backend/src/document_sources/local_file.py
@@ -56,19 +56,19 @@ def get_pages_with_page_numbers(unstructured_pages):
if page.metadata['page_number']==page_number:
page_content += page.page_content
metadata = {'source':page.metadata['source'],'page_number':page_number, 'filename':page.metadata['filename'],
'filetype':page.metadata['filetype'], 'total_pages':unstructured_pages[-1].metadata['page_number']}
'filetype':page.metadata['filetype']}

if page.metadata['page_number']>page_number:
page_number+=1
if not metadata:
metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content, metadata=metadata))
# if not metadata:
# metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content))
page_content=''

if page == unstructured_pages[-1]:
if not metadata:
metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content, metadata=metadata))
# if not metadata:
# metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content))

elif page.metadata['category']=='PageBreak' and page!=unstructured_pages[0]:
page_number+=1
@@ -80,7 +80,7 @@ def get_pages_with_page_numbers(unstructured_pages):
page_content += page.page_content
metadata_with_custom_page_number = {'source':page.metadata['source'],
'page_number':1, 'filename':page.metadata['filename'],
'filetype':page.metadata['filetype'], 'total_pages':1}
'filetype':page.metadata['filetype']}
if page == unstructured_pages[-1]:
pages.append(Document(page_content = page_content, metadata=metadata_with_custom_page_number))
return pages
2 changes: 1 addition & 1 deletion backend/src/document_sources/wikipedia.py
@@ -4,7 +4,7 @@

def get_documents_from_Wikipedia(wiki_query:str, language:str):
try:
pages = WikipediaLoader(query=wiki_query.strip(), lang=language, load_max_docs=1, load_all_available_meta=False).load()
pages = WikipediaLoader(query=wiki_query.strip(), lang=language, load_all_available_meta=False).load()
file_name = wiki_query.strip()
logging.info(f"Total Pages from Wikipedia = {len(pages)}")
return file_name, pages
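Dropping `load_max_docs=1` lets a Wikipedia query return more than one document, falling back to the loader's default cap. A hedged sketch of the call after this change, assuming `WikipediaLoader` from `langchain_community`:

```python
from langchain_community.document_loaders import WikipediaLoader

pages = WikipediaLoader(
    query="Neo4j",  # illustrative query
    lang="en",
    load_all_available_meta=False,
).load()
print(f"Total Pages from Wikipedia = {len(pages)}")
```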
2 changes: 1 addition & 1 deletion backend/src/entities/source_node.py
@@ -18,9 +18,9 @@ class sourceNode:
updated_at:datetime=None
processing_time:float=None
error_message:str=None
total_pages:int=None
total_chunks:int=None
language:str=None
is_cancelled:bool=None
processed_chunk:int=None
access_token:str=None
retry_condition:str=None
20 changes: 10 additions & 10 deletions backend/src/graphDB_dataAccess.py
@@ -37,14 +37,14 @@ def create_source_node(self, obj_source_node:sourceNode):
d.processingTime = $pt, d.errorMessage = $e_message, d.nodeCount= $n_count,
d.relationshipCount = $r_count, d.model= $model, d.gcsBucket=$gcs_bucket,
d.gcsBucketFolder= $gcs_bucket_folder, d.language= $language,d.gcsProjectId= $gcs_project_id,
d.is_cancelled=False, d.total_chunks=0, d.processed_chunk=0, d.total_pages=$total_pages,
d.is_cancelled=False, d.total_chunks=0, d.processed_chunk=0,
d.access_token=$access_token""",
{"fn":obj_source_node.file_name, "fs":obj_source_node.file_size, "ft":obj_source_node.file_type, "st":job_status,
"url":obj_source_node.url,
"awsacc_key_id":obj_source_node.awsAccessKeyId, "f_source":obj_source_node.file_source, "c_at":obj_source_node.created_at,
"u_at":obj_source_node.created_at, "pt":0, "e_message":'', "n_count":0, "r_count":0, "model":obj_source_node.model,
"gcs_bucket": obj_source_node.gcsBucket, "gcs_bucket_folder": obj_source_node.gcsBucketFolder,
"language":obj_source_node.language, "gcs_project_id":obj_source_node.gcsProjectId, "total_pages": obj_source_node.total_pages,
"language":obj_source_node.language, "gcs_project_id":obj_source_node.gcsProjectId,
"access_token":obj_source_node.access_token})
except Exception as e:
error_message = str(e)
@@ -71,26 +71,26 @@ def update_source_node(self, obj_source_node:sourceNode):
if obj_source_node.processing_time is not None and obj_source_node.processing_time != 0:
params['processingTime'] = round(obj_source_node.processing_time.total_seconds(),2)

if obj_source_node.node_count is not None and obj_source_node.node_count != 0:
if obj_source_node.node_count is not None :
params['nodeCount'] = obj_source_node.node_count

if obj_source_node.relationship_count is not None and obj_source_node.relationship_count != 0:
if obj_source_node.relationship_count is not None :
params['relationshipCount'] = obj_source_node.relationship_count

if obj_source_node.model is not None and obj_source_node.model != '':
params['model'] = obj_source_node.model

if obj_source_node.total_pages is not None and obj_source_node.total_pages != 0:
params['total_pages'] = obj_source_node.total_pages

if obj_source_node.total_chunks is not None and obj_source_node.total_chunks != 0:
params['total_chunks'] = obj_source_node.total_chunks

if obj_source_node.is_cancelled is not None and obj_source_node.is_cancelled != False:
if obj_source_node.is_cancelled is not None:
params['is_cancelled'] = obj_source_node.is_cancelled

if obj_source_node.processed_chunk is not None and obj_source_node.processed_chunk != 0:
if obj_source_node.processed_chunk is not None :
params['processed_chunk'] = obj_source_node.processed_chunk

if obj_source_node.retry_condition is not None :
params['retry_condition'] = obj_source_node.retry_condition

param= {"props":params}

@@ -187,7 +187,7 @@ def get_current_status_document_node(self, file_name):
query = """
MATCH(d:Document {fileName : $file_name}) RETURN d.status AS Status , d.processingTime AS processingTime,
d.nodeCount AS nodeCount, d.model as model, d.relationshipCount as relationshipCount,
d.total_pages AS total_pages, d.total_chunks AS total_chunks , d.fileSize as fileSize,
d.total_chunks AS total_chunks , d.fileSize as fileSize,
d.is_cancelled as is_cancelled, d.processed_chunk as processed_chunk, d.fileSource as fileSource
"""
param = {"file_name" : file_name}