
Commit 8158d29

praveshkumar1988, aashipandya, abhishekkumar-27, kartikpersistent, and vasanthasaikalluri committed
Dev (#433) (#448)
* Integration_qa test (#375)
  * Test IntegrationQA added
  * update test cases
  * update test
  * update node count assertions
  * test changes
  * update changes
  * modification test
  * Code refactor for test cases
  * Handle allowed-list issue in test
  * test changes
  * update test
  * test case execution
  * test chatbot updates
  * test case update file
  * added file
* recent merges
* pdf deletion due to running out of disk space
* fixed status blank issue
* Rendering the file name instead of the link for gcs and s3 sources in the info modal
* Convert is_cancelled value from string to bool
* added the default page size
* Fixed issue: processed chunks shown as 0 when a file is re-processed
* Youtube timestamps (#386)
  * Wikipedia source to accept all valid urls
  * wikipedia url to support multiple languages
  * integrated wiki language param for extract api
  * Youtube video timestamps
* groq llm integration backend (#286)
  * groq and description in node properties
  * added groq in options
* offset in chunks (#389)
* page number in gcs loader (#393)
* added youtube timestamps (#392)
* chat pop-up button (#387)
  * expand
  * minimize icon
  * css changes
  * chat history
  * chatbot wider side nav
  * expand icon
  * chatbot UI
  * delete
  * merge fixes
  * code suggestions
* chunks created before extraction using is_pre_process variable (#383)
  * Return total pages for model
  * update requirement.txt
  * total pages on upload API
  * added the Confirmation Dialog
  * added the selected files into the confirmation modal
  * format and lint fixes
  * added the stopwatch image
  * file selection on alert dialog
  * Add timeout in docker for gunicorn workers
  * Add cancel icon to info popup (#384)
    * Info Modal changes
    * css changes
  * Save total pages in DB
  * Added total pages
  * file selection when nothing is selected from the main table
  * added the danger icon only for large files
  * added the overflow for more files and file selection for all new files
  * moved the interface to types
  * added the icon according to the source
  * set total pages for wiki and youtube
  * h3 heading
  * merge
  * updated the alert on the basis of total pages
  * deleted chunks
  * polling based on total pages
  * isNaN check
  * large file based on file size for s3 and gcs
  * file source in server-side event
  * time calculation based on chunks for gcs and s3
* fixed the layout issue
* Populate graph schema (#399)
  * create new endpoint populate_graph_schema and update the query for getting labels from DB
  * Added main.py changes
* conditionally including the gcs login flow in gcs as source (#396)
  * added the condition
  * removed llms
* Fixed issue: remove extra unused param
* get emb only if used (#278)
* Chatbot chunks (#402)
  * Added file name to the content sent to LLM
  * added chunk text in the response
  * increased the doc parts sent to llm
  * Modified graph query
  * markdown rendering
  * youtube start time
  * icons
  * offset changes
  * removed the files due to codespace space issue
* Settings modal to support generating the labels from the llm by using text given by user (#405)
  * added the json
  * added schema-from-text dialog
  * integrated the schema API
  * added the alert
  * resize fixes
  * fixed css issue
* fixed status blank issue
* Modified response when no docs are retrieved (#413)
* Fixed env/docker-compose for local deployments + README doc (#410)
  * wrong place for ENV in README
  * removed langsmith by default + fixed knn score string to float
  * Fixed strings in docker-compose env
  * Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)
  * Missed the TIME_PER_PAGE env, which was causing a NaN issue in the approximate-processing-time notification; fixed that
* Support for all unstructured files (#401)
  * all unstructured files
  * responsiveness
  * added file type
  * added the extensions
  * spelling mistake
  * ppt file changes
* Settings modal to support generating the labels from the llm by using text given by user, with checkbox (#415)
  * Extract schema using direct ChatOpenAI API and Chain
  * integrated the checkbox for the schema-to-text dialog
  * Update SettingModal.tsx
* gcs file content read via storage client (#417)
  * added the access token to the file state
* pypdf2 to read files from gcs (#420)
* 407 remove driver from frontend (#416)
  * removed driver
  * removed API
  * connecting to database on page refresh
* Css handling of info modal and Tooltips (#418)
  * css change
  * tooltips
  * sidebar tooltips
  * copy to clipboard
  * css change
  * added image types
  * added gcs
  * type fix
  * docker changes
  * speech
  * added the tooltip for dropzone sources
* Fixed retrieval bugs (#421)
* yarn format fixes
* changed the delete message
* added the cancel button
* changed the message on tooltip
* added space
* UI fixes
* tooltip for settings
* updated req
* wikipedia URL input (#424)
  * accept only wikipedia links
  * added wikipedia link
  * added wiki-link regex
  * wikipedia single url only
  * changed the alert message
  * wording change
  * pushed validation state persist error
* speech and copy (#422)
  * startTime
  * added chunk properties
  * tooltips
* Fixed issue for out of range in KNN API
* solved conflicts
* conflict solved
* Remove logging info from update KNN API
* tooltip changes
* format and lint fixes
* responsiveness changes
* Fixed issue for total pages GCS, S3
* UI polishing (#428)
  * button and tooltip changes
  * checking validation on change
  * settings module populate fix
  * format fixes
* opening the modal after auth success
* removed the limit
* added the scrollbar for dropdowns
* speech state (#426)
  * Button Details changes
  * delete wording change
* Total pages in buckets (#431)
  * page number N/A for buckets
  * added N/A for gcs and s3 pages
  * total pages for gcs
  * remove unwanted logger
* removed the max width
* Update FileTable.tsx
* Update the docker file
* Modified prompt (#438)
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* rendering fix
* Local file upload gcs (#442)
  * Upload file to GCS
  * GCS local upload: fixed issue, and delete file from GCS after processing or when failed/cancelled
  * Add life cycle rule on uploaded bucket
  * pdf upload local and gcs bucket check
  * delete files when processed and extract changes
* Modified chat length and entities used (#443)
* metadata for unstructured files (#446)
* Unstructured file metadata (#447)
  * metadata for unstructured files
  * sleep in gcs upload
  * updated
* icons added to chunks (#435)
  * info modal icons

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>
1 parent bdc2f29 · commit 8158d29

File tree

15 files changed: +312 −204 lines


backend/Dockerfile

Lines changed: 2 additions & 11 deletions
@@ -6,7 +6,6 @@ EXPOSE 8000
 RUN apt-get update && \
     apt-get install -y --no-install-recommends \
         libgl1-mesa-glx \
-        libreoffice \
         cmake \
         poppler-utils \
         tesseract-ocr && \
@@ -19,13 +18,5 @@ COPY requirements.txt /code/
 RUN pip install --no-cache-dir --upgrade -r requirements.txt
 # Copy application code
 COPY . /code
-RUN apt-get update \
-    && apt-get install -y libgl1-mesa-glx cmake \
-    && apt-get install -y poppler-utils \
-    && apt install -y tesseract-ocr \
-    && export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH \
-    && pip install --no-cache-dir --upgrade -r /code/requirements.txt
-
-# CMD ["uvicorn", "score:app", "--host", "0.0.0.0", "--port", "8000","--workers", "4"]
-CMD ["gunicorn", "score:app","--workers","4","--worker-class","uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]
-
+# Set command
+CMD ["gunicorn", "score:app", "--workers", "2", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]

backend/example.env

Lines changed: 1 addition & 9 deletions
@@ -20,12 +20,4 @@ LANGCHAIN_API_KEY = ""
 LANGCHAIN_PROJECT = ""
 LANGCHAIN_TRACING_V2 = ""
 LANGCHAIN_ENDPOINT = ""
-NUMBER_OF_CHUNKS_TO_COMBINE = ""
-# NUMBER_OF_CHUNKS_ALLOWED = ""
-# Enable Gemini (default is True)
-GEMINI_ENABLED = True|False
-# Enable Google Cloud logs (default is True)
-GCP_LOG_METRICS_ENABLED = True|False
-UPDATE_GRAPH_CHUNKS_PROCESSED = 20
-NEO4J_USER_AGENT = ""
-UPDATE_GRAPH_CHUNKS_PROCESSED = 20
+GCS_FILE_CACHE = "" # save the file into GCS or local; should be True or False
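
The score.py hunk below reads this flag with os.environ.get and compares it to the string 'True', so the value is a raw string rather than a boolean. A minimal sketch of the consuming side (the print statement is illustrative only):

import os

# GCS_FILE_CACHE arrives as a plain string; only the exact value 'True'
# enables the GCS staging path, mirroring the comparison in score.py.
use_gcs = os.environ.get('GCS_FILE_CACHE') == 'True'
print('Staging uploads in GCS' if use_gcs else 'Staging uploads on local disk')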

backend/score.py

Lines changed: 10 additions & 9 deletions
@@ -168,7 +168,7 @@ async def extract_knowledge_graph_from_file(
     graphDb_data_Access = graphDBdataAccess(graph)
     if source_type == 'local file':
         result = await asyncio.to_thread(
-            extract_graph_from_file_local_file, graph, model, merged_file_path, file_name, allowedNodes, allowedRelationship, uri)
+            extract_graph_from_file_local_file, graph, model, merged_file_path, file_name, allowedNodes, allowedRelationship)

     elif source_type == 's3 bucket' and source_url:
         result = await asyncio.to_thread(
@@ -198,12 +198,7 @@ async def extract_knowledge_graph_from_file(
         graphDb_data_Access.update_exception_db(file_name,error_message)
         gcs_file_cache = os.environ.get('GCS_FILE_CACHE')
         if source_type == 'local file':
-            if gcs_file_cache == 'True':
-                folder_name = create_gcs_bucket_folder_name_hashed(uri,file_name)
-                delete_file_from_gcs(BUCKET_UPLOAD,folder_name,file_name)
-            else:
-                logging.info(f'Deleted File Path: {merged_file_path} and Deleted File Name : {file_name}')
-                delete_uploaded_local_file(merged_file_path,file_name)
+            delete_file_from_gcs(BUCKET_UPLOAD,file_name)
         josn_obj = {'message':message,'error_message':error_message, 'file_name': file_name,'status':'Failed','db_url':uri,'failed_count':1, 'source_type': source_type}
         logger.log_struct(josn_obj)
         logging.exception(f'File Failed in extraction: {josn_obj}')
@@ -350,8 +345,14 @@ async def upload_large_file_into_chunks(file:UploadFile = File(...), chunkNumber
                                         originalname=Form(None), model=Form(None), uri=Form(None), userName=Form(None),
                                         password=Form(None), database=Form(None)):
     try:
-        result = await asyncio.to_thread(upload_file,uri,userName,password,database,model,file,chunkNumber,totalChunks,originalname)
-        return create_api_response('Success', message=result)
+        graph = create_graph_database_connection(uri, userName, password, database)
+        result = await asyncio.to_thread(upload_file, graph, model, file, chunkNumber, totalChunks, originalname, uri, CHUNK_DIR, MERGED_DIR)
+        josn_obj = {'api_name':'upload','db_url':uri}
+        logger.log_struct(josn_obj)
+        if int(chunkNumber) == int(totalChunks):
+            return create_api_response('Success',data=result, message='Source Node Created Successfully')
+        else:
+            return create_api_response('Success', message=result)
     except Exception as e:
         job_status = "Failed"
         message="Unable to upload large file into chunks or saving the chunks"

backend/src/QA_integration_new.py

Lines changed: 23 additions & 31 deletions
@@ -38,7 +38,7 @@
 MATCH (chunk)-[:PART_OF]->(d:Document)
 CALL { WITH chunk
 MATCH (chunk)-[:HAS_ENTITY]->(e)
-MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,3}(:!Chunk&!Document)
+MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,2}(:!Chunk&!Document)
 UNWIND rels as r
 RETURN collect(distinct r) as rels
 }
@@ -49,23 +49,27 @@
 WITH d, score,
 apoc.text.join(texts,"\n----\n") +
 apoc.text.join(entities,"\n")
-as text, entities, chunkIds, page_numbers
-RETURN text, score, {source: COALESCE(CASE WHEN d.url CONTAINS "None" THEN d.fileName ELSE d.url END, d.fileName), chunkIds:chunkIds, page_numbers:page_numbers} as metadata
+as text, entities, chunkIds, page_numbers ,start_times
+RETURN text, score, {source: COALESCE(CASE WHEN d.url CONTAINS "None" THEN d.fileName ELSE d.url END, d.fileName), chunkIds:chunkIds, page_numbers:page_numbers,start_times:start_times,entities:entities} as metadata
 """

 SYSTEM_TEMPLATE = """
-You are an AI-powered question-answering agent. Your task is to provide accurate and concise responses to user queries based on the given context, chat history, and available resources.
+You are an AI-powered question-answering agent. Your task is to provide accurate and comprehensive responses to user queries based on the given context, chat history, and available resources.

 ### Response Guidelines:
-1. **Direct Answers**: Provide straightforward answers to the user's queries without headers unless requested. Avoid speculative responses.
+1. **Direct Answers**: Provide clear and thorough answers to the user's queries without headers unless requested. Avoid speculative responses.
 2. **Utilize History and Context**: Leverage relevant information from previous interactions, the current user input, and the context provided below.
 3. **No Greetings in Follow-ups**: Start with a greeting in initial interactions. Avoid greetings in subsequent responses unless there's a significant break or the chat restarts.
 4. **Admit Unknowns**: Clearly state if an answer is unknown. Avoid making unsupported statements.
 5. **Avoid Hallucination**: Only provide information based on the context provided. Do not invent information.
-6. **Response Length**: Keep responses concise and relevant. Aim for clarity and completeness within 2-3 sentences unless more detail is requested.
+6. **Response Length**: Keep responses concise and relevant. Aim for clarity and completeness within 4-5 sentences unless more detail is requested.
 7. **Tone and Style**: Maintain a professional and informative tone. Be friendly and approachable.
 8. **Error Handling**: If a query is ambiguous or unclear, ask for clarification rather than providing a potentially incorrect answer.
 9. **Fallback Options**: If the required information is not available in the provided context, provide a polite and helpful response. Example: "I don't have that information right now." or "I'm sorry, but I don't have that information. Is there something else I can help with?"
+10. **Context Availability**: If the context is empty, do not provide answers based solely on internal knowledge. Instead, respond appropriately by indicating the lack of information.
+
+
+**IMPORTANT** : DO NOT ANSWER FROM YOUR KNOWLEDGE BASE USE THE BELOW CONTEXT

 ### Context:
 <context>
@@ -77,15 +81,18 @@
 AI Response: 'Hello there! How can I assist you today?'

 User: "What is Langchain?"
-AI Response: "Langchain is a framework that enables the development of applications powered by large language models, such as chatbots."
+AI Response: "Langchain is a framework that enables the development of applications powered by large language models, such as chatbots. It simplifies the integration of language models into various applications by providing useful tools and components."

 User: "Can you explain how to use memory management in Langchain?"
-AI Response: "Langchain's memory management involves utilizing built-in mechanisms to manage conversational context effectively, ensuring a coherent user experience."
+AI Response: "Langchain's memory management involves utilizing built-in mechanisms to manage conversational context effectively. It ensures that the conversation remains coherent and relevant by maintaining the history of interactions and using it to inform responses."

 User: "I need help with PyCaret's classification model."
-AI Response: "PyCaret simplifies the process of building and deploying machine learning models. For classification tasks, you can use PyCaret's setup function to prepare your data, then compare and tune models."
+AI Response: "PyCaret simplifies the process of building and deploying machine learning models. For classification tasks, you can use PyCaret's setup function to prepare your data. After setup, you can compare multiple models to find the best one, and then fine-tune it for better performance."

-Note: This system does not generate answers based solely on internal knowledge. It answers from the information provided in the user's current and previous inputs, and from explicitly referenced external sources.
+User: "What can you tell me about the latest realtime trends in AI?"
+AI Response: "I don't have that information right now. Is there something else I can help with?"
+
+Note: This system does not generate answers based solely on internal knowledge. It answers from the information provided in the user's current and previous inputs, and from the context.
 """

 # def get_llm(model: str,max_tokens=CHAT_MAX_TOKENS) -> Any:
@@ -316,27 +323,12 @@ def QA_RAG(graph,model,question,session_id):
             "messages":messages
         }
     )
-    formatted_docs,sources = format_documents(docs)
-    doc_retrieval_time = time.time() - start_time
-    logging.info(f"Modified question and Documents retrieved in {doc_retrieval_time:.2f} seconds")
-
-    start_time = time.time()
-    rag_chain = get_rag_chain(llm=llm)
-    ai_response = rag_chain.invoke(
-        {
-            "messages" : messages[:-1],
-            "context" : formatted_docs,
-            "input" : question
-        }
-    )
-    result = get_sources_and_chunks(sources,docs)
-    content = ai_response.content
-    if "Gemini" in model:
-        total_tokens = ai_response.response_metadata['usage_metadata']['prompt_token_count']
-    else:
-        total_tokens = ai_response.response_metadata['token_usage']['total_tokens']
-    predict_time = time.time() - start_time
-    logging.info(f"Final Response predicted in {predict_time:.2f} seconds")
+    if docs:
+        # print(docs)
+        formatted_docs,sources = format_documents(docs)
+
+        doc_retrieval_time = time.time() - start_time
+        logging.info(f"Modified question and Documents retrieved in {doc_retrieval_time:.2f} seconds")

     start_time = time.time()
     messages.append(ai_response)
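
Two things change together here: the Cypher expansion is tightened from up to three relationship hops per entity ({0,3}) to at most two ({0,2}), and the retrieval metadata now carries start_times and entities alongside page numbers; the same tightening appears in chunkid_entities.py below. A sketch of the metadata shape a returned Document now has (all values here are made-up placeholders):

from langchain_core.documents import Document

# Keys follow the RETURN clause above; values are for illustration only.
doc = Document(
    page_content="chunk text ...",
    metadata={"source": "https://youtu.be/abc123", "chunkIds": ["c1", "c2"],
              "page_numbers": [], "start_times": [42, 185],
              "entities": ["Neo4j", "LangChain"]},
)
print(doc.metadata["start_times"])  # timestamps now travel with each source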

backend/src/chunkid_entities.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 MATCH (chunk)-[:PART_OF]->(d:Document)
 CALL {WITH chunk
 MATCH (chunk)-[:HAS_ENTITY]->(e)
-MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,3}(:!Chunk&!Document)
+MATCH path=(e)(()-[rels:!HAS_ENTITY&!PART_OF]-()){0,2}(:!Chunk&!Document)
 UNWIND rels as r
 RETURN collect(distinct r) as rels
 }

backend/src/document_sources/gcs_bucket.py

Lines changed: 90 additions & 6 deletions
@@ -1,7 +1,13 @@
 import os
 import logging
 from google.cloud import storage
-from langchain_community.document_loaders import GCSFileLoader
+from langchain_community.document_loaders import GCSFileLoader, GCSDirectoryLoader
+from langchain_community.document_loaders import PyMuPDFLoader
+from langchain_core.documents import Document
+from PyPDF2 import PdfReader
+import io
+from google.oauth2.credentials import Credentials
+import time

 def get_gcs_bucket_files_info(gcs_project_id, gcs_bucket_name, gcs_bucket_folder, creds):
     storage_client = storage.Client(project=gcs_project_id, credentials=creds)
@@ -36,7 +42,7 @@ def get_gcs_bucket_files_info(gcs_project_id, gcs_bucket_name, gcs_bucket_folder
 def load_pdf(file_path):
     return PyMuPDFLoader(file_path)

-def get_documents_from_gcs(gcs_project_id, gcs_bucket_name, gcs_bucket_folder, gcs_blob_filename):
+def get_documents_from_gcs(gcs_project_id, gcs_bucket_name, gcs_bucket_folder, gcs_blob_filename, access_token=None):

     if gcs_bucket_folder is not None:
         if gcs_bucket_folder.endswith('/'):
@@ -47,8 +53,86 @@ def get_documents_from_gcs(gcs_project_id, gcs_bucket_name, gcs_bucket_folder, g
         blob_name = gcs_blob_filename
     #credentials, project_id = google.auth.default()
     logging.info(f"GCS project_id : {gcs_project_id}")
-    loader = GCSFileLoader(project_name=gcs_project_id, bucket=gcs_bucket_name, blob=blob_name)
-    pages = loader.load()
-    file_name = gcs_blob_filename
-    return file_name, pages
+    #loader = GCSFileLoader(project_name=gcs_project_id, bucket=gcs_bucket_name, blob=blob_name, loader_func=load_pdf)
+    # pages = loader.load()
+    # file_name = gcs_blob_filename
+    #creds= Credentials(access_token)
+    if access_token is None:
+        storage_client = storage.Client(project=gcs_project_id)
+    else:
+        creds= Credentials(access_token)
+        storage_client = storage.Client(project=gcs_project_id, credentials=creds)
+    print(f'BLOB Name: {blob_name}')
+    bucket = storage_client.bucket(gcs_bucket_name)
+    blob = bucket.blob(blob_name)
+    content = blob.download_as_bytes()
+    pdf_file = io.BytesIO(content)
+    pdf_reader = PdfReader(pdf_file)
+
+    # Extract text from all pages
+    text = ""
+    for page in pdf_reader.pages:
+        text += page.extract_text()
+    pages = [Document(page_content = text)]
+    return gcs_blob_filename, pages
+
+def upload_file_to_gcs(file_chunk, chunk_number, original_file_name, bucket_name):
+    storage_client = storage.Client()
+
+    file_name = f'{original_file_name}_part_{chunk_number}'
+    bucket = storage_client.bucket(bucket_name)
+    file_data = file_chunk.file.read()
+    # print(f'data after read {file_data}')

+    blob = bucket.blob(file_name)
+    file_io = io.BytesIO(file_data)
+    blob.upload_from_file(file_io)
+    # Define the lifecycle rule to delete objects after 6 hours
+    # rule = {
+    #     "action": {"type": "Delete"},
+    #     "condition": {"age": 1}  # Age in days (24 hours = 1 days)
+    # }
+
+    # # Get the current lifecycle policy
+    # lifecycle = list(bucket.lifecycle_rules)
+
+    # # Add the new rule
+    # lifecycle.append(rule)
+
+    # # Set the lifecycle policy on the bucket
+    # bucket.lifecycle_rules = lifecycle
+    # bucket.patch()
+    time.sleep(1)
+    logging.info('Chunk uploaded successfully in gcs')
+
+def merge_file_gcs(bucket_name, original_file_name: str):
+    storage_client = storage.Client()
+    # Retrieve chunks from GCS
+    blobs = storage_client.list_blobs(bucket_name, prefix=f"{original_file_name}_part_")
+    chunks = []
+    for blob in blobs:
+        chunks.append(blob.download_as_bytes())
+        blob.delete()
+
+    # Merge chunks into a single file
+    merged_file = b"".join(chunks)
+    blob = storage_client.bucket(bucket_name).blob(original_file_name)
+    logging.info('save the merged file from chunks in gcs')
+    file_io = io.BytesIO(merged_file)
+    blob.upload_from_file(file_io)
+    pdf_reader = PdfReader(file_io)
+    file_size = len(merged_file)
+    total_pages = len(pdf_reader.pages)
+
+    return total_pages, file_size
+
+def delete_file_from_gcs(bucket_name, file_name):
+    try:
+        storage_client = storage.Client()
+        bucket = storage_client.bucket(bucket_name)
+        blob = bucket.blob(file_name)
+        if blob.exists():
+            blob.delete()
+        logging.info('File deleted from GCS successfully')
+    except:
+        raise Exception('BLOB not exists in GCS')
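
Taken together, these new helpers implement the chunked-upload lifecycle: stage each part in the bucket, merge once all parts have arrived, and delete after processing. A usage sketch under stated assumptions: the import path and bucket name are placeholders, and `parts` stands in for the UploadFile chunks the upload endpoint receives:

from src.document_sources.gcs_bucket import (
    upload_file_to_gcs, merge_file_gcs, delete_file_from_gcs)

BUCKET = "my-upload-bucket"  # placeholder bucket name

for n, part in enumerate(parts, start=1):   # part: an UploadFile-like object
    upload_file_to_gcs(part, n, "report.pdf", BUCKET)

# After the last part: stitch the chunks back into one blob and measure it.
total_pages, file_size = merge_file_gcs(BUCKET, "report.pdf")
print(f"report.pdf merged: {total_pages} pages, {file_size} bytes")

delete_file_from_gcs(BUCKET, "report.pdf")  # cleanup once processing is done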
