In-context document support for Anthropic and Google models #5130

Open: alex-torregrosa wants to merge 8 commits into main

@alex-torregrosa commented Dec 29, 2024
Contributor

Summary

Added support for sending full documents in the chat context for Anthropic and Google endpoints.

New available types:

  • Anthropic: PDF files
  • Google: text documents, PDF files, audio files

Limitations:

  • Does not work with Agents, as LangChain does not support PDF files.
  • Token count prediction for Anthropic is inaccurate, because the current dependencies cannot provide the PDF page count.

Feel free to suggest any changes or anything else that might be needed.

Change Type

  • New feature (non-breaking change which adds functionality)

Testing

Tested both the Anthropic and Google endpoints by uploading PDF files and requesting a summary. Did the same with audio files on the Google endpoint.

Checklist

  • My code adheres to this project's style guidelines
  • I have performed a self-review of my own code
  • I have commented in any complex areas of my code
  • My changes do not introduce new warnings

@danny-avila
Owner

Thanks for working on this, but it should not be available only when RAG_API_URL is not set.

I am going to work on this soon. My vision for this is adding it to the current list of file upload options:
image

@danny-avila danny-avila marked this pull request as draft December 29, 2024 15:00
@alex-torregrosa
Contributor Author

OK! I can update the PR to do something similar, reusing the AttachFileMenu for non-agent endpoints.

Maybe something like this? (with the right icon)
image

On the backend side, the requests could be differentiated by marking RAG files with EToolResources.file_search.

I'm happy to start working on these changes, or if you'd prefer to handle it, that's fine as well. Let me know what you'd like to do.

@danny-avila
Owner

@alex-torregrosa thank you, that looks good. Only a small nitpick: maybe "Upload to Provider" makes more sense.

> On the backend side, the requests could be differentiated by marking RAG files with EToolResources.file_search.

Sounds good to me. I'm happy to let you work on it.

Note that I was planning to migrate all the file options to every non-agents endpoint, and then start using "ephemeral" agents when certain options/features are selected. The plan is to eventually migrate all backend chat operations to agents, as they offer improved performance and extended capabilities for tool use (which users often assume is available, as it is in ChatGPT; that assumption is the motivation for this change).

I can work on that after your changes, though, as I can see that being an iteration on the current scope.

@alex-torregrosa
Contributor Author

Done!

image

Now the menu will show "Upload to provider" for providers that support more file types than images, and "Upload image" otherwise:
image

Some rough file filtering is also done per-model, similar to the old image/* one.
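
A minimal sketch of how the label choice and per-endpoint filtering described above could look. The names `ACCEPTED_TYPES`, `acceptedTypes`, `uploadLabel`, and `isFileAllowed`, along with the exact MIME patterns, are illustrative assumptions and not the identifiers used in this PR; only the provider capabilities (Anthropic: PDF; Google: text, PDF, audio) come from the summary.

```typescript
// Hypothetical sketch: names and MIME patterns are assumptions, not the PR's actual code.
const IMAGE_ONLY: string[] = ["image/*"];

// MIME patterns each endpoint is assumed to accept in the chat context.
const ACCEPTED_TYPES = new Map<string, string[]>([
  ["anthropic", ["image/*", "application/pdf"]],
  ["google", ["image/*", "application/pdf", "text/*", "audio/*"]],
]);

// Endpoints without an entry fall back to the old image-only behavior.
function acceptedTypes(endpoint: string): string[] {
  return ACCEPTED_TYPES.get(endpoint) ?? IMAGE_ONLY;
}

// "Upload to provider" when the endpoint accepts more than images, "Upload image" otherwise.
function uploadLabel(endpoint: string): string {
  const beyondImages = acceptedTypes(endpoint).some((t) => !t.startsWith("image/"));
  return beyondImages ? "Upload to provider" : "Upload image";
}

// Rough filter: wildcard patterns ("audio/*") match by prefix, others must match exactly.
function isFileAllowed(endpoint: string, mimeType: string): boolean {
  return acceptedTypes(endpoint).some((pattern) =>
    pattern.endsWith("/*")
      ? mimeType.startsWith(pattern.slice(0, -1))
      : mimeType === pattern,
  );
}
```

With these assumed tables, `isFileAllowed("google", "audio/mpeg")` passes while the same file is rejected for `"anthropic"`, which mirrors the supported-types list in the summary.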

@alex-torregrosa alex-torregrosa marked this pull request as ready for review December 30, 2024 16:51
Reused the image handling functions, as the mime type was already being
set correctly.

For now the upload only happens if RAG is disabled. A better approach
would be to re-use the agents menu for deciding context vs RAG.

Will be generalized for any file type, not just images.

Removed as many references to images as possible.

Sets mime type filters for the Google and Anthropic endpoints according
to the documents that can be attached in the chat context. Also adds
support for uploading audio files directly to Gemini.

Added support for attaching documents to Claude 3.5.

AttachFileMenu is now used for all endpoints, and RAG file uploads are
sent with the `file_search` tool.

Based on the tool, the upload handler can now select between vectorDB or
local files, without needing to check for an empty `RAG_API_URL`.
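
As a rough illustration of the tool-based routing described in the last commit message, the sketch below picks a storage target from the tool attached to the upload. `ToolResource`, `UploadRequest`, `StorageTarget`, and `selectStorage` are hypothetical names for this example; only the `file_search` marker comes from the discussion above.

```typescript
// Hypothetical stand-in for EToolResources; only `file_search` is taken from the PR discussion.
const ToolResource = { file_search: "file_search" } as const;

type StorageTarget = "vectordb" | "local";

interface UploadRequest {
  filename: string;
  // Assumed field: set when the user picked the RAG option, absent for in-context uploads.
  tool_resource?: string;
}

// Route on the tool marker instead of checking whether RAG_API_URL is empty.
function selectStorage(req: UploadRequest): StorageTarget {
  return req.tool_resource === ToolResource.file_search ? "vectordb" : "local";
}
```

Keying the decision on the request itself means both upload paths can coexist on one deployment, which is the point raised in review about not tying the feature to an unset `RAG_API_URL`.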