-
Notifications
You must be signed in to change notification settings - Fork 221
get graph service to work with images #112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mrge found 4 issues across 3 files. View them in mrge.io
@@ -366,7 +366,7 @@ async def batch_retrieve_chunks( | |||
auth: AuthContext, | |||
folder_name: Optional[str] = None, | |||
end_user_id: Optional[str] = None, | |||
use_colpali: Optional[bool] = None, | |||
retrieve_images: Optional[bool] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameter 'retrieve_images' creates inconsistency with similar parameter 'use_colpali' used in other methods
system_msg, _ = dummy.captured_messages | ||
assert system_msg['content'].startswith(expected_system_prefix) | ||
assert entities and entities[0].label == "TestEntity" | ||
assert relationships and relationships[0].type == "related_to" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attribute error: accessing 'type' but RelationshipExtraction uses 'relationship' attribute
|
||
if is_image: | ||
# For images, add the image as a content block | ||
user_message_content.append({"type": "image_url", "image_url": {"url": content_limited}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Base64 image content is incorrectly passed as a URL instead of a data URI when constructing the message for the LLM
# We assume the custom prompt handles incorporating text and image content appropriately. | ||
# For simplicity, we'll just pass the original content_limited and examples_str | ||
# to the custom prompt formatter. The user is responsible for formatting in the template. | ||
formatted_user_text = custom_prompt.format(content=content_limited, examples=examples_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Custom prompt handling doesn't account for multimodal image content
@@ -545,7 +549,7 @@ async def _process_documents_for_entities( | |||
|
|||
# Extract entities and relationships from the chunk | |||
chunk_entities, chunk_relationships = await self.extract_entities_from_text( | |||
chunk.content, chunk.document_id, chunk.chunk_number, extraction_overrides | |||
chunk.content, chunk.document_id, chunk.chunk_number, extraction_overrides, override_is_image=chunk.metadata.get("is_image", False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should change the method name to extract entities
if is_image: | ||
content_limited = content | ||
else: | ||
content_limited = content[: min(len(content), 5000)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could remove this
else: | ||
# For text, use standard extraction instructions | ||
system_content = ( | ||
"You are an entity extraction and relationship extraction assistant. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep the original system prompt pls! I tested it and works well. Also maybe for images, we could just append saying context is image. That should do the job.
# For simplicity, we'll just pass the original content_limited and examples_str | ||
# to the custom prompt formatter. The user is responsible for formatting in the template. | ||
formatted_user_text = custom_prompt.format(content=content_limited, examples=examples_str) | ||
user_message = {"role": "user", "content": formatted_user_text} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even for custom prompts, we should add the image!!!
No description provided.